We had a power trip that took out the switch to one cabinet and several servers. That also resulted in other servers behind that switch becoming inaccessible to network traffic. Most servers were not affected by the issue.
Originally posted by msgeek
So lemme get this straight...when Zeus goes down, all DNS info is lost????
No, that is not what I posted to our remote forums, and has nothing at all to do with the incident first posted here. What I posted was that the primary (as opposed to secondary) DNS server's named config was wiped out. Why this is being brought up here in this thread is a complete mystery, as it is completely unrelated to power issues.
You guys need REDUNDANT DNS. In fact, one of your DNS servers should be in another place other than where you have your servers co-located. That's just basic BEST PRACTICES.
Any ETA on when this all will be fixed so I can get my email?
Well, if you had contacted us directly, we would have informed you that we do in fact have redundant DNS servers set up, and secondary services are on another server within the same facility - secondary services which, by the by, were operating just fine. The issue with the named config zeroing itself out is directly related to the cPanel clustering function, something I verified this morning and about which we contacted cPanel. This could also be confirmed by other hosts who have seen exactly the same thing happen on occasion. Your "best practices" item is not always true, not always efficient, and certainly not always practical once everything we rely on day to day (like cPanel) is taken into account.
This incident affected a handful of people, not everyone. No need to spread misinformation about our setup when it's simple enough to contact us.
All I know is what I could see from Nagios. If I'm mistaken about your setup, I'm terribly sorry. There was no way to contact you about the problem except through this board, since the only ways of contacting you are the forums and email, and both were down. As I mentioned in my followup post, the problem was indeed resolved.
Perhaps if there was some other way of contacting you, or at least getting a status report, like a recorded message I could call when things went down, I would be less likely to freak out when I can't get my email. I was waiting for an important email from the college I attend. Luckily it eventually got through to me.
There is an off-network forum set up on the same system as the Nagios status page for precisely the instances when the main forums or helpdesk are unavailable. It is linked in my initial response to this thread:
These forums are also linked in the left navigation bar of the main Nagios status page, via the item "Emergency Forums".
Almost every mail server in existence automatically retries email that can't be delivered immediately. It is therefore unsurprising that it was properly delivered.
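For anyone curious how that retry behaviour plays out, here is a rough sketch of a sending server's retry queue. The numbers are assumptions for illustration only (a 30-minute first retry, doubling up to an 8-hour cap, giving up after 5 days), loosely in line with the general guidance in RFC 5321; real mail servers use their own configurable schedules, and the function names here are hypothetical.

```python
from datetime import timedelta


def retry_schedule(first_retry=timedelta(minutes=30),
                   give_up_after=timedelta(days=5),
                   factor=2.0,
                   max_interval=timedelta(hours=8)):
    """Return the elapsed times at which a deferred message is retried.

    All defaults are illustrative assumptions, not any particular
    mail server's real configuration.
    """
    schedule = []
    elapsed = timedelta(0)
    interval = first_retry
    # Keep scheduling retries until the next one would pass the give-up time.
    while elapsed + interval <= give_up_after:
        elapsed += interval
        schedule.append(elapsed)
        # Back off: grow the wait between attempts, up to a cap.
        interval = min(interval * factor, max_interval)
    return schedule


def delivered_at(outage, schedule):
    """Return the first retry at or after the outage ends, or None if the
    sender gives up (and would bounce the message) before then."""
    for attempt in schedule:
        if attempt >= outage:
            return attempt
    return None


if __name__ == "__main__":
    schedule = retry_schedule()
    # A message deferred at the start of a 3-hour outage.
    print(delivered_at(timedelta(hours=3), schedule))
```

Under these assumed settings, a message deferred during an outage of a few hours is delivered on a later retry rather than lost; only an outage longer than the give-up time would result in a bounce.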
WHT is not a support area for hosts, although some do make it appear that way. We have remote forums for a reason, just as we have remote status areas for a reason. The one thing we certainly do not lack is communication with our clients.