Web Hosting Talk

View Full Version : Dynadot Problems


RossH
07-14-2006, 01:30 AM
Dear Ross ******,

Last night the company that colocates our servers had a hardware failure. Thousands of websites were affected, including ours unfortunately. Here is what was affected:


Dynadot Website (offline from 2am - 10am PST)
- no orders could be placed
- domains could not be modified

Dynadot Forwarding & DNS (offline from 2am - 10am PST)
- forwarding was not working
- stealth was not working
- Dynadot DNS was not working
- parking was not working
(domains with third party name servers were not affected)

Dynadot Webhosting (offline from 2am - 2pm PST)
- offline, no webpages displayed
- no email received (most servers will re-deliver later)


We are very sorry for the inconvenience. I assure you that we spare no expense when it comes to your domains and hosting. Our servers are located in a state-of-the-art data center (Market Post Tower in San Jose). Our colocation company had almost perfect uptime until now.

We are working hard to make sure this does not happen again.

Best Regards,
Dynadot Staff
(account ross******)

Did anyone else get this? It worries me a lot as a customer (luckily I have very few domains there).

Stan Marsh
07-14-2006, 02:22 AM
Ross, I think you've been in the hosting business long enough to know that this could happen with ANY company. A *HUGE* plus for Dynadot for admitting the problem; some companies, even after that kind of failure, would tell you that they had no failures...

laydee
07-14-2006, 03:15 AM
I got that last night but my domain is working fine now :)

RossH
07-14-2006, 07:18 AM
I've been in it long enough to know that hosts with proper backups and planning don't have 8-12 hour downtimes... :)

dynadot
07-14-2006, 01:51 PM
We are very disappointed that we had such a long outage last night, and we apologize to all our customers. In the 4 years we have been in business, our servers have been online almost continuously. We have duplicates of all our hardware ready in case of hardware failure. We use RAID on all our drives and do daily backups to protect our data. But when something happens upstream of us, at our colo provider, there is not much we can do. Until now our colo provider had almost perfect uptime.

Here are the gory details of what happened. This email was from our colo provider:

===============================================

Well, we had quite a wild ride last night.

Unfortunately, it wasn't the type of ride we particularly enjoyed.

At 1:16AM PST, our link to AboveNet crashed. (We also have links to Peer1 and InfoRelay, which did not go down and have remained online this entire time.)

After examining our switch, router, and fiber port that were linked to AboveNet, we could not find anything wrong, so we called AboveNet's 24x7 NOC.

The next 11 1/2 hours were mainly a blur to our weary support team, but we will sum it up quickly as follows:

* We had a complete "carbon copy" of our networking equipment as spares. We plugged it all in. It did not work.

* AboveNet insisted it was our fiber module (also known as an SFP) that had died. We spent hours on the phone with Cisco TAC and finally sourced a new one for $2200. (Mind you, we had two at the datacenter that did not work, but AboveNet had convinced us that that was the problem.) We received the new SFP via express courier at 8:30AM. We plugged it in. It did not work.

* More hours of phone support with AboveNet, who reset our switch configuration (which was working just fine before the link dropped -- we had changed nothing on our end) and tried many hacks on our end to get the link to work. No go. We also had a Cisco-certified tech called in to look at our configuration. He said it was fine.

* At 11:30AM, we received a phone call from AboveNet that they were sending out a tech to investigate in person. At around 12:15PM, he arrived. He examined our switch configuration and immediately assessed that our end was fine. He left and went to AboveNet's SJC2 datacenter (where the other end of our fiber terminates).

* At 12:45PM, he called and said that our link to AboveNet had been plugged in to a switch port on AboveNet's end that had died. He changed us to another switch port (on a different switch) and we were online at 12:57PM PST.

Essentially, it took AboveNet 11+ hours to figure out the problem was on their end after all, was not our fault and, in fact, had nothing to do with our specific configuration -- it was just a bad port on their end.

We will be making quick, firm, and decisive changes to our infrastructure to alleviate this problem in the future:

* We will be re-IPing, as we have mentioned earlier. All of your IP addresses will change. This is to get us out of AboveNet's grip and get us OFF of their network as quickly as possible. We will be expediting this IP address change and will do it sooner than originally planned to alleviate further unplanned downtime. You will see more emails about this by the end of this month.

* Once we re-IP, if ANY carrier (AboveNet/Peer1/InfoRelay) goes down, our network connection remains up, since our IP addresses are no longer tied to our upstream provider. In a matter of a few seconds, if AboveNet goes down, your connection will AUTOMATICALLY fail over to our other providers. This is why re-IPing is so critical.

* We will be pursuing serious financial compensation and damages from AboveNet. We may pursue legal action if it is determined that it is necessary in order to have damages awarded.

* As soon as all of our customers have re-IPed, we will be dropping AboveNet as a carrier due to their negligence.
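
The failover point in the list above can be illustrated with a toy model. This is only a rough sketch under stated assumptions, not the colo provider's actual routing setup: it assumes the current addresses are reachable only through the carrier they are tied to, and that after re-IPing the same prefix is announced through every carrier that is up. The names `Upstream` and `reachable_via` are hypothetical and exist only for this illustration.

```python
from dataclasses import dataclass


@dataclass
class Upstream:
    """One carrier link; `up` is False while that link is down."""
    name: str
    up: bool = True


def reachable_via(upstreams, provider_independent, address_owner):
    """Return the carriers that can still deliver traffic to our addresses.

    Assumption: addresses tied to one carrier are reachable only through
    that carrier; provider-independent addresses are reachable through
    every carrier whose link is up.
    """
    live = [u.name for u in upstreams if u.up]
    if provider_independent:
        return live
    return [name for name in live if name == address_owner]


# The outage scenario: AboveNet link down, Peer1 and InfoRelay still up.
carriers = [Upstream("AboveNet", up=False), Upstream("Peer1"), Upstream("InfoRelay")]

before = reachable_via(carriers, provider_independent=False, address_owner="AboveNet")
after = reachable_via(carriers, provider_independent=True, address_owner="AboveNet")

print(before)  # [] : addresses tied to AboveNet are unreachable
print(after)   # ['Peer1', 'InfoRelay'] : traffic can use the remaining carriers
```

In this simplified model, the same link failure takes everything offline before the re-IP and leaves two working paths after it, which is the reason given above for treating the re-IP as critical.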

Make no mistake. This has cost us dearly in terms of reputation, time, money, and customers. While I can't say I blame you for being angry, as I too am angry, please do communicate with us and we will do our best to help you in whatever way we can. Assuming AboveNet does decide to award us damages, we will be in turn sending this compensation on to our customers. If AboveNet does not award us damages, or a drawn-out court battle ensues, we will do our absolute best (within our power) to help you. Since this outage was not directly our fault, we will need to see how things shake out with AboveNet before announcing any particular financial compensation. While I cannot promise you that we will be able to compensate you for this downtime, I can promise you that we will be fighting with every breath we have to make sure that this is the last outage like this we see, and that we will prove once again our willingness to go the extra mile with our customer service and make sure that you are satisfied with our service.

Will this happen again? After we re-IP, no, it won't. That's why re-IPing will become our top priority over the next 30 days. You have my promise -- my word -- that we will do everything in our power to prevent any outages like this from happening again.

RossH
07-14-2006, 01:59 PM
Thank you for the very detailed response. I personally am not angry, just worried by your long outage, but now I understand why it happened. It really amazes me that it took 11+ hours to figure out that a port on a switch went bad....

Why is your colo provider single homed?

dynadot
07-14-2006, 05:26 PM
They are in the process of changing to an unhomed IP allocation. They are a young company like us, and maybe they made a bad decision early on. But their service overall is excellent.

avythe
07-14-2006, 05:36 PM
I got the email too but I didn't notice any issues. I have one (maybe two, can't remember :p) very important domain there, and that's all I keep there, so I don't need to log in very often. Props to them for taking care of it so well though.

Dynadot has been great :)