pixd
09-08-2007, 03:59 PM
I hate to rag on a company, but I am extremely frustrated with HostGator and how they handled my major server outage. I decided to take my sites to HostGator about 3 months ago when looking for a high bandwidth dedicated server for the right specs and price. I found out about them through the many recommendations on WebHostingTalk and rarely see anything negative posted about them other than they oversell.
The first little bit was absolutely perfect service, but the last 2 Fridays were a nightmare. 2 weeks ago on Friday, August 24th, my entire server went down. I immediately issued a reboot request and it took roughly 1-2 hours to sort itself out and come back up. I was fine with that, but the issues I experienced on Friday, September 7th taught me a valuable lesson.
Lesson being: Don't host websites/forums that are somewhat popular on dedicated server plans that are sold through resellers unless you want to deal with extreme delays and poor technical support quality when there are serious issues. Dealing with the middleman when your business is offline is a very frustrating and stressful experience.
HostGator has a really active support staff; the problem is that most of their ticket updates lack substance, and they keep you totally out of the loop unless you nag them for progress reports. Below is a timeline of the course of events. A few times not recorded by the trouble ticket system are `best guess`.
04:19 PM - Server becomes inaccessible / crashes
04:29 PM - I request a reboot from hostgator.com/reboot.php
04:34 PM - Reboot request acknowledged by HostGator
04:45 PM - Server comes back online
04:49 PM - Server becomes inaccessible / crashes for the 2nd time
04:55 PM - Reboot requested from hostgator.com/reboot.php for the 2nd time
04:56 PM - [Billy Harrison] replies to the original reboot request saying my site is online (it's not).
05:10 PM - Server still offline, Live Support contacted to assist with the matter, they tell me to open a trouble ticket.
05:15 PM - Trouble ticket opened detailing the above events, and requesting an investigation / resolution.
06:29 PM - [Matthew Feinberg] replies to the ticket saying he's issued a reboot request.
07:05 PM - Server is still offline. I reply to the ticket requesting an update.
07:14 PM - [David Ibarra] replies saying he rebooted again, and my server's load was high on reboot.
07:25 PM - [David Ibarra] replies saying that the server was undergoing a small DDOS attack and that it was routed and the server's back online.
07:54 PM - Server becomes inaccessible / crashes for the 3rd time.
08:00 PM - I reply to the ticket saying it's down again, and since it was a "ddos" not much could be done to prevent it but I would appreciate their assistance in bringing it back up.
09:32 PM - [David Ibarra] replies that the DDOS might have been a coincidence (aka: he was wrong and we just lost 2 hours) and that he now suspects hard disk failure. He wants to schedule a hard disk scan at "midnight".
- At this point it's been over 5 hours of running in circles trying to reboot the server over and over, and he now finally suspects a hardware issue and wants to wait 3 hours to do a scan. -
10:03 PM - I acknowledge the scan and request to be updated when it starts.
10:45 PM - [David Ibarra] informs me he's scheduled the scan and that I'll be kept up to date.
01:01 AM - It's been over an hour since the scan was supposed to start and there wasn't an update, so I replied requesting one.
01:35 AM - [Nate Custer] replies saying that their timezone is CST (so the scan actually took 4 hours to start). Since there were no time zones listed by David, I assumed he meant EST, the timezone followed by the ticket system. Fair enough though.
04:04 AM - I reply requesting an update as it's been 2 hours 30 minutes since the last one.
04:34 AM - [Shashank Wagh] replies saying that it should be done any time now.
05:31 AM - "any time now" has come and passed, I reply requesting an update again.
06:39 AM - [Shashank Wagh] replies a couple minutes after I go back into Live Chat to ask what's going on. He says my diagnostics scan is now testing the hard drives.
08:11 AM - There are no updates to the ticket. I reply requesting an update yet again.
08:30 AM - I decide to take a 3 hour nap here because I've been awake ever since the issue started, trying to get this server back up.
08:50 AM - My server is online
08:56 AM - [Matthew Lee] reports to the ticket that it's online.
12:14 PM - I respond to the ticket acknowledging that the server is online. I ask for specifics on the issue, and what HostGator plans to do to keep my business after the 16 hour outage (many companies offer a monthly credit or a prorated discount on next month's fee to compensate for extended outages).
12:28 PM - [David Ibarra] responds with the following:
"There were some hardware issues that were corrected on your server. However even though the server is a managed server, it's your responsibility to keep it running; we don't monitor dedicated servers since the server belongs to you. We will fix the server if you need help or if there's a hardware problem but hardware issues happen in any hosting environment."
12:44 PM - At this point I'm absolutely livid. Not only did he dance around the extremely important question of what exactly went wrong - he had the audacity to try and put the responsibility of this outage on my shoulders. I responded in kind,
"David,
Don't give me that. It's your responsibility to identify the problem as a hardware issue after it's reported to you, and take prompt action to resolve it as my dedicated hosting provider that lives hundreds of miles away. Since I can't physically walk to the server and run tests it's obviously impossible for me to say "Oh yes, this is a hard disk failure".
I came here with the problem, and it was after about 5 hours of tail chasing doing "reboot, reboot, reboot" that you realized it might be hardware. A downtime of 15+ hours for a hardware issue is a really poor level of service, telling me "it happens", is highly unprofessional and ignorant. Your team acted uninterested in my concerns and kept me so far out of the loop that I had to nag all afternoon, evening and night long to get updates. At the end of the day all I get is "Oh there were some hardware issues that were corrected". Throw me a bone man, I wanted to know exactly what was wrong with the server and what was done to remedy it so I have confidence that something was actually done and not a hardware diagnostics that came back clean, and the guy just plugged my server back in."
His initial reply was 14 minutes after mine, and it's now been around 4 hours since my last reply and they haven't touched on it yet, so I'm done with it, and I'm guessing they are too. I know things like this are common in the hosting industry. I'm just 1 man with a couple of big websites that were down for almost an entire day and didn't even get an acceptable reason why. I hope this log helps you, reader.
