Hello at 4.10pm
This is a courtesy update. We have been working all day to resolve the lingering issues caused by the new cPanel update overnight. We believe all issues are resolved. For any servers that still have issues, we simply need to force a cPanel upgrade.
We have also been speaking to the management of our remote support company, and other clients they support have had similar issues. Hopefully we are over the worst of it now and all services are running normally. We have had a few recent reports that cPanel File Manager is not working but Legacy File Manager is. If you notice File Manager is not working and you need it, please open a ticket so we can troubleshoot, and use Legacy File Manager in the meantime.
============================================
Hello at 10.09am
We have a policy where we are always up front and totally honest with clients. Here is what happened this morning.
We are working through these issues one by one and most have now been resolved. There is still an elevated load across many servers, but average load and I/O wait are reducing at this time.
This issue appears to have been caused by a two-fold process that made things progressively worse over time. First of all, our backups run overnight, and this in itself is not an issue. The system is designed to allow backups to run overnight, when server loads are lower, without issue. Last night the cPanel automated update kicked off, apparently at the same time across multiple servers. We also noticed this morning that a number of servers were running yum update processes. We believe the combination of cPanel updates and off-server backups running at the same time caused the load on a number of servers to spiral out of control.
The issue appeared worse on servers on the Cloud. This is to be expected, as all our cloud servers share the same network SAN. When multiple servers were updating cPanel, copying backups off the SAN to backup servers, and running OnApp cluster backups all at the same time, things just spiralled out of control. The SAN is designed to handle a lot of I/O traffic, but it appears the combination of many cPanel updates with backups caused these issues. Things are becoming stable now.
Moving forward, we will disable automatic cPanel updates on these servers and plan manual upgrades for times when nothing else is happening on the server. Critically, we will do them one at a time, not 35 at a time as happened today across many shared servers on the cloud and client cloud servers.
Please open a ticket if you have any questions or concerns.
============================================
Hello at 9.35am
We are seeing multiple issues across multiple servers this morning. At the moment we are unsure what is happening. There are NO hardware faults, and the issues all seem to be related to cPanel performing an automatic update overnight on multiple servers. Here is what we are seeing: