Web Hosting Talk : Web Hosting Main Forums : Web Hosting

Report on viability of DNS failover solution
  #1  
06-22-2006, 11:15 AM
jeanphil
Newbie
Join Date: Jun 2006
Location: Montreal, Canada
Posts: 5



I run a site with about 1,000,000 unique visitors per month, and recent server failures made me decide to set up a failover server to minimize downtime. My goal wasn't 99.999% uptime but being able to get back on track after a failure in a "reasonable" amount of time. After evaluating several solutions, I decided to go with DNS failover. Here's how the setup works:

1) mydomain.com points to the main server with a very low TTL (time to live)
2) the failover server replicates data from the main server
3) when the main server goes down, mydomain.com is changed to point to the failover server
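A minimal Python sketch of the decision behind steps 1-3, assuming a health check runs periodically and that three consecutive failures (a hypothetical threshold) justify switching. The addresses and the `FailoverState` class are illustrative, and pushing the new address to the DNS provider is left out because that depends on the provider.

```python
from dataclasses import dataclass

# Hypothetical addresses (documentation ranges), not taken from the post.
MAIN_IP = "192.0.2.10"
FAILOVER_IP = "198.51.100.20"


@dataclass
class FailoverState:
    """Tracks which IP the A record for mydomain.com should point at."""
    failures_needed: int = 3       # consecutive failed checks before switching
    consecutive_failures: int = 0
    active_ip: str = MAIN_IP

    def record_check(self, main_is_up: bool) -> str:
        """Feed one health-check result; return the IP the record should serve."""
        if main_is_up:
            # A single success resets the counter, so brief blips don't trigger a switch.
            self.consecutive_failures = 0
        else:
            self.consecutive_failures += 1
            if self.consecutive_failures >= self.failures_needed:
                self.active_ip = FAILOVER_IP
        # Once switched, stay on the failover IP until someone fails back manually.
        return self.active_ip
```

Switching back is deliberately left manual: once the failover server has taken writes, the main server needs its data resynchronized before it should receive traffic again.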

The drawback is DNS propagation time: some DNS servers don't honor the TTL, and there is additional caching on the user's machine and in the browser. I looked for empirical data to gauge the extent of the problem but couldn't find any, so I decided to set up my own experiment.

The Experiment
==============

I started with mydomain.com pointing to the main server with a TTL of 1800 seconds (1/2 hour). I then changed it to point to the failover server, which simply forwards the traffic to the main server. On the main server, I periodically computed the percentage of requests coming from the failover server, which gives the percentage of users for whom the DNS change has propagated.
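The measurement can be reproduced from the main server's access log. A minimal sketch, assuming Apache common/combined log format (client IP first, timestamp in brackets) and a forwarder that rewrites the source address (a userspace relay or NAT with masquerading does), so forwarded requests show the failover server's IP. The address `198.51.100.20` and the function `propagation_by_bucket` are hypothetical.

```python
import re
from collections import defaultdict
from datetime import datetime

FAILOVER_IP = "198.51.100.20"  # hypothetical address of the forwarding server

# Client IP is the first field; the timestamp sits inside [...] (Apache common/combined format).
LOG_RE = re.compile(r'^(\S+) \S+ \S+ \[([^\]]+)\]')


def propagation_by_bucket(lines, bucket_minutes=5):
    """Return {bucket_start: percent of requests that arrived via the failover server}."""
    totals = defaultdict(int)
    via_failover = defaultdict(int)
    for line in lines:
        m = LOG_RE.match(line)
        if not m:
            continue  # skip malformed lines
        client_ip, stamp = m.groups()
        ts = datetime.strptime(stamp, "%d/%b/%Y:%H:%M:%S %z")
        # Round down to the start of the bucket (e.g. 16:07:30 -> 16:05:00).
        bucket = ts.replace(minute=ts.minute - ts.minute % bucket_minutes,
                            second=0, microsecond=0)
        totals[bucket] += 1
        if client_ip == FAILOVER_IP:
            via_failover[bucket] += 1
    return {b: 100.0 * via_failover[b] / totals[b] for b in sorted(totals)}
```

Requests that still reach the main server directly count as "not propagated", which matches the percentages reported in the table.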

I made the DNS change at exactly 16:04 on 06/21/06, and here are the percentages of propagated users:

Date      Time   Propagated
06/21/06  16:00   0 %
06/21/06  16:05   3 %
06/21/06  16:10  20 %
06/21/06  16:15  37 %
06/21/06  16:20  59 %
06/21/06  16:25  69 %
06/21/06  16:30  76 %
06/21/06  16:35  80 %
06/21/06  16:40  86 %
06/21/06  16:45  90 %
06/21/06  16:50  91 %
06/21/06  16:55  92 %
06/21/06  17:00  93 %
06/21/06  17:05  94 %
06/21/06  17:10  94 %
06/21/06  17:15  95 %
06/21/06  17:35  95 %
06/21/06  17:40  96 %
06/21/06  17:45  97 %
...
06/22/06  10:40  99 %

So even after 18 hours, a small percentage of users (about 1%) were still going to the old server, so DNS failover is obviously not a 99.999% uptime solution. However, since more than 90% of the users were propagated within the first hour, the solution works well enough for me.
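The "more than 90% in the first hour" figure can be checked directly against the table above. This sketch copies the post-switch measurements and reports how many minutes after the 16:04 change each level was first observed; with 5-minute sampling these are upper bounds, not exact crossing times.

```python
from datetime import datetime

SWITCH_TIME = datetime(2006, 6, 21, 16, 4)

# Measurements taken after the switch, copied from the table in the post.
SAMPLES = [
    ("06/21/06 16:05", 3), ("06/21/06 16:10", 20), ("06/21/06 16:15", 37),
    ("06/21/06 16:20", 59), ("06/21/06 16:25", 69), ("06/21/06 16:30", 76),
    ("06/21/06 16:35", 80), ("06/21/06 16:40", 86), ("06/21/06 16:45", 90),
    ("06/21/06 16:50", 91), ("06/21/06 16:55", 92), ("06/21/06 17:00", 93),
    ("06/21/06 17:05", 94), ("06/21/06 17:10", 94), ("06/21/06 17:15", 95),
    ("06/21/06 17:35", 95), ("06/21/06 17:40", 96), ("06/21/06 17:45", 97),
    ("06/22/06 10:40", 99),  # rows between 17:45 and 10:40 were elided in the post
]


def minutes_to_reach(threshold_percent):
    """Minutes after the switch at which threshold_percent was first observed, or None."""
    for stamp, percent in SAMPLES:
        if percent >= threshold_percent:
            seen = datetime.strptime(stamp, "%m/%d/%y %H:%M")
            return (seen - SWITCH_TIME).total_seconds() / 60
    return None  # never observed in the published data
```

For example, 50% was reached by 16:20 (16 minutes), 90% by 16:45 (41 minutes) and 99% only by the next morning (1,116 minutes).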

Regards
Jean-Philippe Bouchard



  #2  
06-22-2006, 11:27 AM
siliconcowboy73
Junior Guru Wannabe
Join Date: Jun 2006
Posts: 67
You are hosting your own DNS servers?

__________________
My shameless plug here:
http://www.FishingConnection.net
Let's go FISHING!

  #3  
06-22-2006, 11:39 AM
jeanphil
Newbie
Join Date: Jun 2006
Location: Montreal, Canada
Posts: 5
Quote:
Originally Posted by siliconcowboy73
You are hosting your own DNS servers?
No, I'm using DNS Made Easy. Also, my main server is hosted at Layered Technologies and my failover server is at 1&1.

  #4  
06-22-2006, 11:58 AM
siliconcowboy73
Junior Guru Wannabe
Join Date: Jun 2006
Posts: 67
Quote:
Originally Posted by jeanphil
No, I'm using DNS Made Easy. Also, my main server is hosted at Layered Technologies and my failover server is at 1&1.

Ok. What if you were to just set up phonydomain.com and host it with a 99.999% uptime company like Network Solutions or Register.com, then have phonydomain.com redirect to your IP?

The world should cache phonydomain.com. Since it will "never" go down, just have phonydomain.com be the traffic redirector. You could log in to Network Solutions/Register.com and toggle it between your two servers. Would that work?


  #5  
06-22-2006, 12:06 PM
jeanphil
Newbie
Join Date: Jun 2006
Location: Montreal, Canada
Posts: 5
Quote:
Originally Posted by siliconcowboy73
Ok. What if you were to just set up phonydomain.com and host it with a 99.999% uptime company like Network Solutions or Register.com, then have phonydomain.com redirect to your IP?

The world should cache phonydomain.com. Since it will "never" go down, just have phonydomain.com be the traffic redirector. You could log in to Network Solutions/Register.com and toggle it between your two servers. Would that work?
That's a good point. Actually, when I was doing my research, that's the first thing I looked for: a proxy provider that would let me point my domain at one of their IPs and control where that IP forwards traffic. A "hosted" load balancer/router service, if you will. However, I wasn't able to find any company offering that service, so I decided to go with DNS failover.
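The "hosted router" idea boils down to a relay that clients reach at a fixed address and that forwards each connection to whichever server is currently active; the failover server in the experiment did the same kind of forwarding. A minimal asyncio sketch, assuming plain TCP relaying; `SwitchableForwarder` and its backend list are hypothetical. Note that the relay itself must be hosted somewhere highly available, or it simply becomes the new single point of failure.

```python
import asyncio


class SwitchableForwarder:
    """Accept TCP connections and relay each one to the currently active backend."""

    def __init__(self, backends, active=0):
        self.backends = list(backends)  # hypothetical (host, port) pairs
        self.active = active            # index used for new connections

    def switch_to(self, index):
        """Point future connections at another backend; open ones are left alone."""
        if not 0 <= index < len(self.backends):
            raise IndexError("no such backend")
        self.active = index

    @staticmethod
    async def _pipe(reader, writer):
        """Copy bytes until EOF or a reset, then close the destination."""
        try:
            while True:
                data = await reader.read(65536)
                if not data:
                    break
                writer.write(data)
                await writer.drain()
        except ConnectionError:
            pass  # peer reset the connection; fall through to cleanup
        finally:
            # Simplification: a production relay would half-close with write_eof().
            writer.close()

    async def handle(self, client_reader, client_writer):
        host, port = self.backends[self.active]
        try:
            up_reader, up_writer = await asyncio.open_connection(host, port)
        except OSError:
            client_writer.close()  # backend unreachable: drop the client
            return
        # Relay both directions concurrently until both sides are done.
        await asyncio.gather(
            self._pipe(client_reader, up_writer),
            self._pipe(up_reader, client_writer),
        )
```

It would be started with `asyncio.start_server(forwarder.handle, host, port)`; calling `switch_to` is the equivalent of changing the DNS record, except that it takes effect for the next connection instead of waiting for caches to expire.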

  #6  
06-22-2006, 12:21 PM
siliconcowboy73
Junior Guru Wannabe
Join Date: Jun 2006
Posts: 67
Quote:
Originally Posted by jeanphil
That's a good point. Actually, when I was doing my research, that's the first thing I looked for: a proxy provider that would let me point my domain at one of their IPs and control where that IP forwards traffic. A "hosted" load balancer/router service, if you will. However, I wasn't able to find any company offering that service, so I decided to go with DNS failover.
Try Akamai. I think they do that caching and redirect hosting stuff.


  #7  
06-22-2006, 04:26 PM
MDHjeff
Newbie
Join Date: Jun 2006
Posts: 17
...yeah... assuming BGP routing + BGP anycasting... not 100%, but very, VERY close to it, I guess.

  #8  
06-23-2006, 06:04 AM
Nature-Talk
Web Hosting Guru
Join Date: May 2004
Posts: 300
Jean-Philippe, many thanks for your report. Your research is much appreciated and addresses one of the questions I had about the DNSMadeEasy service. 93% of users finding your backup server within an hour sounds great.

I've been investigating this subject recently as well, and hope to duplicate your success.

The missing piece for me is learning how to create the mirror server and keep the data up to date.

I'd be grateful for any remarks you may care to share about your procedure there. Thanks again.

  #9  
06-23-2006, 02:17 PM
jeanphil
Newbie
Join Date: Jun 2006
Location: Montreal, Canada
Posts: 5
Quote:
Originally Posted by squirreldog
Jean-Philippe, many thanks for your report. Your research is much appreciated and addresses one of the questions I had about the DNSMadeEasy service. 93% of users finding your backup server within an hour sounds great.

I've been investigating this subject recently as well, and hope to duplicate your success.

The missing piece for me is learning how to create the mirror server and keep the data up to date.

I'd be grateful for any remarks you may care to share about your procedure there. Thanks again.
You're welcome!

The 2 servers are running Debian 3.1 with MySQL 4.1 as the backend.

The MySQL database is replicated as described in this article (http://www.onlamp.com/pub/a/onlamp/2005/06/16/MySQLian.html). Moreover, the replication takes place over a virtual private network: I use OpenVPN (http://openvpn.net/), configured with a static key as described here (http://openvpn.net/static.html).
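Before trusting the failover box it helps to confirm that replication is actually running. A minimal sketch that judges one row of MySQL's `SHOW SLAVE STATUS` output, passed in as a dict (however your MySQL client library returns it). The column names are MySQL's; the 60-second lag limit and the function name are hypothetical. Newer MySQL releases also offer `SHOW REPLICA STATUS` with renamed columns (for example `Replica_IO_Running`).

```python
def replication_healthy(status, max_lag_seconds=60):
    """Judge a replica from one SHOW SLAVE STATUS row given as a dict.

    Fetching the row is left to whatever MySQL driver is in use; only the
    column names below come from MySQL itself.
    """
    if not status:
        return False  # empty result: replication is not configured
    if status.get("Slave_IO_Running") != "Yes":
        return False  # not receiving binary log events from the master
    if status.get("Slave_SQL_Running") != "Yes":
        return False  # not applying the events it received
    lag = status.get("Seconds_Behind_Master")
    if lag is None:
        return False  # NULL lag means MySQL cannot tell; treat as unhealthy
    return int(lag) <= max_lag_seconds
```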

The web data is replicated using rsync (http://samba.anu.edu.au/rsync/) with SSH as the transport.
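The rsync step typically runs from cron on the main server. A minimal sketch that builds the command line; the host, user and paths are hypothetical, while `-a`, `-z`, `-e ssh` and `--delete` are standard rsync options. The resulting list can be passed to `subprocess.run(cmd, check=True)`.

```python
def rsync_over_ssh_cmd(src_dir, dest_host, dest_dir, ssh_user="backup", delete=True):
    """Build an rsync command that mirrors src_dir to dest_host:dest_dir over SSH."""
    # -a: archive mode (recursive; keeps permissions, times, symlinks)
    # -z: compress data in transit
    # -e ssh: use SSH as the remote shell/transport
    cmd = ["rsync", "-az", "-e", "ssh"]
    if delete:
        # Remove files on the mirror that no longer exist on the source.
        cmd.append("--delete")
    # Trailing slash on the source copies the directory's contents, not the directory itself.
    cmd.append(src_dir.rstrip("/") + "/")
    cmd.append(f"{ssh_user}@{dest_host}:{dest_dir}")
    return cmd
```

Keep in mind that `--delete` also propagates accidental deletions on the main server to the mirror.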

I don't use the automatic failover feature of DNS Made Easy, but I do use their server monitoring feature. When the main server is down, I receive an SMS message and manually set the IP to the failover server.
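For a home-grown second opinion alongside DNS Made Easy's monitoring, the core of a check is a TCP connect with a timeout. A minimal sketch; `port_is_up` is a hypothetical helper, and a successful connect only shows that the port accepts connections, not that the site renders correctly. Sending the SMS alert is left out.

```python
import socket


def port_is_up(host, port, timeout=5.0):
    """Return True if a TCP connection to host:port succeeds within timeout seconds."""
    try:
        with socket.create_connection((host, port), timeout=timeout):
            return True
    except OSError:
        # Refused, unreachable, DNS failure or timeout (socket.timeout is an OSError).
        return False
```

Requiring a few consecutive failures before alerting avoids waking someone up for a single dropped packet.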

If you need more details, let me know.

  #10  
06-23-2006, 04:20 PM
csavery
WHT Addict
Join Date: Aug 2005
Posts: 126
I'd be very curious how the response is with much shorter TTL values. Your data seems to show that 80% of your users got routed even though they arrived within the TTL of 30 minutes. That's pretty good. It seems to imply that 80% of users had not visited recently, so their first hit was not cached. The other 20% may have been on and then "lost you" and had to wait some time to get the right IP again. For those users it looks like downtime, so it's not so good.

But what if you set it up more like dynamic DNS, where the record gets updated with the IP whenever it changes and the TTL is shorter? I wonder how much you could reduce the downtime for users who were on at the time of failure.

  #11  
06-23-2006, 04:42 PM
jeanphil
Newbie
Join Date: Jun 2006
Location: Montreal, Canada
Posts: 5
Quote:
Originally Posted by csavery
I'd be very curious how the response is with much shorter TTL values.
From what I read, some DNS servers will not honor very short TTLs and will fall back to a default value. 1/2 hour seems to be the "optimal" value.

Also, very short TTLs result in more hits on the DNS servers, which means more bandwidth cost or, in my case, a more expensive DNS Made Easy package (they charge on a per-request basis).
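The cost side can be estimated with back-of-the-envelope arithmetic: a resolver that keeps being asked for the name re-queries the authoritative server at most once per TTL, so query volume scales roughly with 1/TTL. A minimal sketch; the resolver count used below is hypothetical, and no provider's pricing is modeled.

```python
SECONDS_PER_DAY = 86_400


def max_queries_per_day(active_resolvers, ttl_seconds):
    """Upper bound on authoritative queries per day for one record.

    Assumes every resolver serving your visitors re-asks as soon as its cached
    copy expires, which only happens for names in constant use.
    """
    if ttl_seconds <= 0:
        raise ValueError("TTL must be positive")
    return active_resolvers * SECONDS_PER_DAY / ttl_seconds
```

With a hypothetical 2,000 resolvers querying around the clock, a 1800-second TTL gives at most 96,000 queries per day and a 300-second TTL at most 576,000, i.e. six times as many. Real traffic is lower, because a resolver only re-queries when one of its users asks for the name after the cached copy has expired.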

However, I may do another test with a TTL of 5 minutes, just to see if that theory of too short a TTL holds.

Quote:
Originally Posted by csavery
Your data seems to show that 80% of your users got routed even though they arrived within the TTL of 30 minutes. That's pretty good. It seems to imply that 80% of users had not visited recently, so their first hit was not cached.
Not necessarily. When I switched the IP, DNS servers and/or users had the old IP in their cache with different expiration times: some were due to refresh it 30 minutes later, some in 15 minutes, some in 1 minute, etc. Also, based on the usage pattern of the site (length of visits, etc.), I doubt that 80% of the visitors at any given time have been on the site for less than 30 minutes.
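The staggered-expiration argument can be turned into a simple model. Assume (an assumption, not a measurement) that each cache honoring the TTL last fetched the record at a uniformly random moment during the previous TTL, so its remaining lifetime at the switch is uniform on [0, TTL], and that a hypothetical `stale_fraction` of caches ignores the TTL entirely. The expected share switched then grows linearly until one TTL has passed.

```python
def expected_switched(t_seconds, ttl_seconds, stale_fraction=0.0):
    """Share of visitors expected to reach the new IP t seconds after the change.

    Model assumptions: remaining cache lifetime is uniform on [0, TTL] for
    caches that honor the TTL; stale_fraction of caches never switch within
    the window being considered.
    """
    if t_seconds <= 0:
        return 0.0
    honoring = 1.0 - stale_fraction
    return honoring * min(t_seconds / ttl_seconds, 1.0)
```

With the 1800-second TTL from the experiment, t = 6 minutes gives 20% and t = 11 minutes about 37%, close to the measured values at 16:10 and 16:15. The measurements stay within about six percentage points of the model through 16:25 and fall clearly below it from 16:30 on, which is consistent with some caches holding the record longer than the TTL.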

Quote:
Originally Posted by csavery
The other 20% may have been on and then "lost you" and had to wait some time to get the right IP again. For those users it looks like downtime, so it's not so good.
That's correct. It's a drawback of the solution. However, I'm willing to live with it since I only have a major server crash once every few years.

Quote:
Originally Posted by csavery
But what if you set it up more like dynamic DNS, where the record gets updated with the IP whenever it changes and the TTL is shorter? I wonder how much you could reduce the downtime for users who were on at the time of failure.
If I decide to run the experiment again with a shorter TTL, I'll definitely publish the results here.

  #12  
06-15-2008, 11:07 AM
ihsol
New Member
Join Date: May 2008
Posts: 2
Nice topic, I have been thinking about this too, although there is still considerable downtime.
Looking forward to hearing from you when you have completed the test with the TTL set to 5 minutes.

  #13  
06-15-2008, 08:39 PM
nettiapina
Web Monkey
Join Date: Dec 2005
Location: Finland
Posts: 1,469
Thanks for posting. This is interesting. The results are much better than I expected.

  #14  
06-16-2008, 02:34 PM
RossH
Away
Join Date: Jun 2002
Posts: 5,278
Depending on what you are hosting, why not just use a CDN that proxies your website?

  #15  
08-20-2008, 07:17 AM
plumsauce
******* Unleaded
Join Date: Feb 2004
Posts: 3,802
I first ran into this data last year on jeanphil's blog and found it useful as a reference.

Earlier today, I ran into some interesting data to add. Quite by accident.

A zone that was running about 100K queries per day was pointed at another NS. The TTLs have always been very short, less than 300 seconds.

For the next 30-odd days, until the zone was repointed at the original NS, roughly 50+ queries kept arriving at the original NS. This caused no problems because that NS still had the zone data.

Compared to the usual traffic, 50+ is clearly tiny, approaching zero. But it is a number that showed no sign of declining over the 30 days.

So, while there is clearly some small number of clients or caches out there that do not respect the TTL, it is so small as to be insignificant. However, those few are very stubborn about sticking to the original NS, for reasons unknown.

Trivia? Maybe, but at least you have a clear set of numbers to draw your own conclusions from.

__________________
edgedirector.com
managed dns global failover and load balance (gslb)
exactstate.com
uptime report for webhostingtalk.com
