The XEN30 node began showing issues, with dom0 under stress from arcconf requests stalling on the RAID card.
12.40am: The server was rebooted but has not come back online. We may need to replace the RAID card.
1.10am: The data centre remote hands have been asked to hook up a recovery session so we can see why the server is not booting from the drive. The RAID array itself reports as optimal and healthy.
1.55am: We're still waiting for the data centre to remove the KVM and convert it to a recovery session. We'll begin working on the server the moment they do.
3.00am: Our senior systems administrator has been working on this server continuously. There appears to be no partition table on the drive, and we are trying to recover it. Please bear with us. As a precaution, for those clients on this node who have backups, we have started building new servers and have begun restores, to speed things up should the server prove unrecoverable.
7.00am: Our senior systems administrator has been working all night to bring the array back online. We are currently having the data centre attach a Debian Live CD to the server, and we will try one last time to mount the Logical Volumes so we can salvage data from the array.
There are 7 clients on this node.
Three are managed clients with backups, which we are restoring now.
One is a SolusVM control machine for another node, so it is not service-affecting; it will be rebuilt later today (we can handle its requests via SSH commands in the meantime).
One is a web server from a high-availability cluster. This is lower priority, as the same data exists on multiple other servers, so it is not service-affecting.
One is a mail server with no backups purchased from us. We'll help this client restore any local backups they have and will work hard to retrieve any data we can for them.
At this point, no further updates will be posted here; we are opening individual tickets for the 7 affected clients.