06-22-2006, 11:15 AM #1 Newbie
Join Date: Jun 2006 | Location: Montreal, Canada | Posts: 5
Report on viability of DNS failover solution
I run a site with about 1,000,000 unique visitors per month, and recent server failures made me decide to get a failover server to minimize downtime. My goal wasn't 99.999% uptime but being able to get back on track within a "reasonable" amount of time after a failure. After evaluating several solutions, I decided to go with DNS failover. Here's how the setup works:
1) mydomain.com points to the main server with a very low TTL (time to live)
2) the failover server replicates data from the main server
3) when the main server goes down, mydomain.com is changed to point to the failover server
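Reduced to its core, the three steps above are a single decision: publish the main server's address while it is healthy, and the failover address otherwise. A minimal sketch, assuming hypothetical addresses from the documentation ranges and a health flag supplied by some external monitor. It does not talk to any real DNS provider's API.

```python
from dataclasses import dataclass


@dataclass
class FailoverRecord:
    """Hypothetical A record for mydomain.com with a primary and a standby address."""
    primary_ip: str
    failover_ip: str
    ttl_seconds: int = 1800  # TTL used in the experiment below (half an hour)

    def target(self, primary_healthy: bool) -> str:
        """Return the address the record should point at right now."""
        return self.primary_ip if primary_healthy else self.failover_ip


record = FailoverRecord(primary_ip="192.0.2.10", failover_ip="198.51.100.20")
print(record.target(primary_healthy=True))   # 192.0.2.10
print(record.target(primary_healthy=False))  # 198.51.100.20
```

The hard part in practice is not this choice but how quickly resolvers pick up the new answer, which is what the rest of the post measures.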
The drawback is DNS propagation time: some DNS servers don't honor the TTL, and there is additional caching on the user's machine and in the browser. I looked for empirical data to gauge the extent of the problem but couldn't find any, so I decided to set up my own experiment.
The Experiment
==============
I started with mydomain.com pointing to the main server with a TTL of 1800 seconds (half an hour). I then changed it to point to the failover server, which simply port-forwards to the main server. On the main server, I periodically computed the percentage of requests coming from the failover server, which gives the percentage of users for whom the DNS change has propagated.
I made the DNS change at exactly 16:04 on 06/21/06, and here are the percentages of propagated users:
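The measurement amounts to counting, per time window, how many requests on the main server arrive from the failover server's address (as in the experiment, forwarded requests show up as coming from the failover host). A sketch, assuming an Apache-style access log whose first field is the client IP; the addresses and log lines are hypothetical.

```python
def propagated_percentage(log_lines, failover_ip):
    """Percentage of requests in log_lines whose client IP equals failover_ip.

    Assumes the client address is the first whitespace-separated field,
    as in Apache's common/combined log formats.
    """
    total = 0
    via_failover = 0
    for line in log_lines:
        fields = line.split()
        if not fields:
            continue  # skip blank lines
        total += 1
        if fields[0] == failover_ip:
            via_failover += 1
    return 100.0 * via_failover / total if total else 0.0


sample = [
    '198.51.100.20 - - [21/Jun/2006:16:10:01 -0400] "GET / HTTP/1.1" 200 512',
    '203.0.113.7 - - [21/Jun/2006:16:10:02 -0400] "GET / HTTP/1.1" 200 512',
    '198.51.100.20 - - [21/Jun/2006:16:10:03 -0400] "GET /a HTTP/1.1" 200 128',
    '203.0.113.8 - - [21/Jun/2006:16:10:04 -0400] "GET / HTTP/1.1" 200 512',
]
print(propagated_percentage(sample, "198.51.100.20"))  # 50.0
```

Running it over each five-minute slice of the log would give one row of a table like the one that follows.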
06/21/06 16:00 0 %
06/21/06 16:05 3 %
06/21/06 16:10 20 %
06/21/06 16:15 37 %
06/21/06 16:20 59 %
06/21/06 16:25 69 %
06/21/06 16:30 76 %
06/21/06 16:35 80 %
06/21/06 16:40 86 %
06/21/06 16:45 90 %
06/21/06 16:50 91 %
06/21/06 16:55 92 %
06/21/06 17:00 93 %
06/21/06 17:05 94 %
06/21/06 17:10 94 %
06/21/06 17:15 95 %
06/21/06 17:35 95 %
06/21/06 17:40 96 %
06/21/06 17:45 97 %
...
06/22/06 10:40 99 %
So even after 18 hours, roughly 1% of users are still going to the old server, so DNS failover is obviously not a 99.999% uptime solution. However, since more than 90% of users are propagated within the first hour, the solution works well enough for me.
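The table can also be read as "time to reach X% propagation". A small sketch using the samples above (the 16:00 row, taken before the change, is omitted); because samples are five minutes apart, each result is only as precise as that sampling interval.

```python
from datetime import datetime

CHANGE = datetime(2006, 6, 21, 16, 4)  # moment the DNS record was changed

# (timestamp, percent propagated), copied from the table above
SAMPLES = [
    ("06/21/06 16:05", 3), ("06/21/06 16:10", 20), ("06/21/06 16:15", 37),
    ("06/21/06 16:20", 59), ("06/21/06 16:25", 69), ("06/21/06 16:30", 76),
    ("06/21/06 16:35", 80), ("06/21/06 16:40", 86), ("06/21/06 16:45", 90),
    ("06/21/06 16:50", 91), ("06/21/06 16:55", 92), ("06/21/06 17:00", 93),
    ("06/21/06 17:05", 94), ("06/21/06 17:10", 94), ("06/21/06 17:15", 95),
    ("06/21/06 17:35", 95), ("06/21/06 17:40", 96), ("06/21/06 17:45", 97),
    ("06/22/06 10:40", 99),
]


def minutes_to_reach(threshold):
    """Minutes after the change until the first sample at or above threshold %."""
    for stamp, pct in SAMPLES:
        if pct >= threshold:
            when = datetime.strptime(stamp, "%m/%d/%y %H:%M")
            return (when - CHANGE).total_seconds() / 60
    return None  # threshold never reached in the data


for t in (50, 90, 95, 99):
    print(f"{t}%: {minutes_to_reach(t)} min")
```

By this reading, 90% was first observed 41 minutes after the change, 95% after 71 minutes, and 99% only after 1116 minutes (18.6 hours).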
Regards
Jean-Philippe Bouchard
-
06-22-2006, 11:27 AM #2 Junior Guru Wannabe
Join Date: Jun 2006 | Posts: 67
Are you hosting your own DNS servers?
-
06-22-2006, 11:39 AM #3 Newbie
Join Date: Jun 2006 | Location: Montreal, Canada | Posts: 5
Originally Posted by siliconcowboy73
-
06-22-2006, 11:58 AM #4 Junior Guru Wannabe
Join Date: Jun 2006 | Posts: 67
Originally Posted by jeanphil
OK. What if you were to just set up phonydomain.com, host it with a 99.999% uptime company like Network Solutions or Register.com, and then have phonydomain.com redirect to your IP?
The world should cache phonydomain.com. Since it will "never" go down, just have phonydomain.com be the traffic redirector. You could log in to Network Solutions/Register.com and toggle it between your two servers. Would that work?
-
06-22-2006, 12:06 PM #5 Newbie
Join Date: Jun 2006 | Location: Montreal, Canada | Posts: 5
Originally Posted by siliconcowboy73
-
06-22-2006, 12:21 PM #6 Junior Guru Wannabe
Join Date: Jun 2006 | Posts: 67
Originally Posted by jeanphil
-
06-22-2006, 04:26 PM #7 Newbie
Join Date: Jun 2006 | Posts: 17
Yeah... assuming BGP routing plus BGP anycasting. Not 100%, but very, VERY close to it, I guess.
-
06-23-2006, 06:04 AM #8 Web Hosting Guru
Join Date: May 2004 | Posts: 305
Jean-Philippe, many thanks for your report. Your research is much appreciated and answers one of the questions I had about the DNSMadeEasy service. 93% of users finding your backup server within an hour sounds great.
I've been investigating this subject recently as well, and hope to duplicate your success.
The missing piece for me is learning how to create the mirror server and keep the data up to date.
I'd be grateful for any remarks you may care to share about your procedure there. Thanks again.
-
06-23-2006, 02:17 PM #9 Newbie
Join Date: Jun 2006 | Location: Montreal, Canada | Posts: 5
Originally Posted by squirreldog
The two servers run Debian 3.1 with MySQL 4.1 as the backend.
MySQL is replicated as described in this article (http://www.onlamp.com/pub/a/onlamp/2005/06/16/MySQLian.html). Moreover, the replication takes place over a virtual private network: I use OpenVPN (http://openvpn.net/), configured with a static key as described here (http://openvpn.net/static.html).
The web data is replicated with rsync (http://samba.anu.edu.au/rsync/), using ssh as the transport.
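The post doesn't give the exact rsync options, so here is a sketch of one plausible invocation for mirroring a web root over ssh. The user name, host name, paths, and the choice of `--delete` are assumptions; the flags themselves (`-a`, `-z`, `--delete`, `-e ssh`) are standard rsync options.

```python
import subprocess


def rsync_command(src_dir, dest_host, dest_dir, ssh_user="backup"):
    """Build an rsync command that mirrors src_dir to dest_host over ssh.

    -a        archive mode: recursive, preserves permissions, times, symlinks
    -z        compress data in transit
    --delete  remove files on the mirror that no longer exist on the source
    -e ssh    use ssh as the remote shell (the transport)
    """
    # A trailing slash on the source copies the directory's contents,
    # not the directory itself.
    src = src_dir.rstrip("/") + "/"
    return ["rsync", "-az", "--delete", "-e", "ssh",
            src, f"{ssh_user}@{dest_host}:{dest_dir}"]


def mirror(src_dir, dest_host, dest_dir):
    """Run one mirroring pass; raises CalledProcessError if rsync fails."""
    subprocess.run(rsync_command(src_dir, dest_host, dest_dir), check=True)


print(" ".join(rsync_command("/var/www", "failover.example.com", "/var/www")))
```

A pass like this would typically be scheduled (for example from cron) on the main server. `--delete` keeps the mirror exact, but it also propagates accidental deletions, so it is a deliberate trade-off rather than a requirement.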
I don't use the automatic failover feature of DNS Made Easy, but I do use their server monitoring feature. When the main server goes down, I receive an SMS message and manually set the IP to the failover server.
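The monitor-then-alert approach described above (automatic detection, manual DNS switch) can be sketched with the standard library. This is not how DNS Made Easy's monitoring works internally. The consecutive-failure threshold is an added assumption to avoid alerting on a single dropped request, and `alert` is a hypothetical hook standing in for the SMS notification.

```python
import http.client
import time
import urllib.request


def is_up(url: str, timeout: float = 5.0) -> bool:
    """Return True if url answers with a non-error HTTP response within timeout.

    urlopen raises HTTPError (an OSError subclass) for 4xx/5xx responses,
    and OSError or HTTPException for network-level failures.
    """
    try:
        with urllib.request.urlopen(url, timeout=timeout):
            return True
    except (OSError, http.client.HTTPException):
        return False


class DownDetector:
    """Declare the server down only after `threshold` consecutive failed checks."""

    def __init__(self, threshold: int = 3):
        self.threshold = threshold
        self.failures = 0

    def record(self, ok: bool) -> bool:
        """Feed one check result; return True while the server counts as down."""
        self.failures = 0 if ok else self.failures + 1
        return self.failures >= self.threshold


def monitor(url, alert, interval=60.0, threshold=3):
    """Poll url forever and call alert(message) once per outage.

    Switching the DNS record is left to a human, as in the post above.
    """
    detector = DownDetector(threshold)
    alerted = False
    while True:
        down = detector.record(is_up(url))
        if down and not alerted:
            alert(f"{url} appears to be down")
        alerted = down
        time.sleep(interval)
```

With a 60-second interval and a threshold of 3, an outage is reported after roughly three minutes. That delay adds to the DNS propagation time measured above.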
If you need more details, let me know.
-
06-23-2006, 04:20 PM #10 WHT Addict
Join Date: Aug 2005 | Posts: 126
I'd be very curious how the response is with much shorter TTL values. Your data seems to show that 80% of your users were routed to the new server even though they arrived within the 30-minute TTL window. That's pretty good. It seems to imply that 80% of users had not visited recently, so their first hit was not cached. The other 20% may have been on the site, then "lost you", and had to wait some time to get the right IP again. For those users it looks like downtime, so that part is not so good.
But what if you set it up more like dynamic DNS, where the record is updated with the new IP whenever it changes and the TTL is shorter? I wonder how much you could reduce the downtime for users who were on the site at the time of failure.
-
06-23-2006, 04:42 PM #11 Newbie
Join Date: Jun 2006 | Location: Montreal, Canada | Posts: 5
Originally Posted by csavery
Also, a very short TTL results in more hits on the DNS servers, which means higher bandwidth costs or, in my case, a more expensive DNS Made Easy package (they charge on a per-request basis).
However, I may do another test with a TTL of 5 minutes, just to see if that theory of too short a TTL holds.
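The cost side of this trade-off can be made concrete with a back-of-the-envelope bound. A resolver that honors the TTL re-queries a name at most once per TTL, so each such resolver sends at most this many queries per day for the name. Actual volume also depends on how many resolvers have active users, so these are upper bounds, not predictions:

```latex
q_{\max} = \frac{86\,400\ \text{s/day}}{\mathrm{TTL}}, \qquad
\frac{86\,400}{1800} = 48, \qquad
\frac{86\,400}{300} = 288
```

Dropping the TTL from 30 to 5 minutes can therefore raise per-resolver query volume up to sixfold, while cutting the worst-case stale period for well-behaved resolvers from 30 to 5 minutes.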
-
06-15-2008, 11:07 AM #12 New Member
Join Date: May 2008 | Posts: 2
Nice topic. I have been thinking about this too, although there is still considerable downtime.
Looking forward to hearing from you when you have completed the test with the TTL set to 5 minutes.
-
06-15-2008, 08:39 PM #13 Web Monkey
Join Date: Dec 2005 | Location: Finland | Posts: 1,471
Thanks for posting. This is interesting. The results are much better than I expected.
-
06-16-2008, 02:34 PM #14 Away
Join Date: Jun 2002 | Posts: 5,278
Depending on what you are hosting, why not just use a CDN that proxies your website?
-
08-20-2008, 07:17 AM #15 ******* Unleaded
Join Date: Feb 2004 | Posts: 3,849
I first ran into this data last year on jeanphil's blog and found it useful as a reference.
Earlier today, I ran into some interesting data to add, quite by accident.
A zone that was receiving about 100K queries per day was pointed at another NS. Its TTLs have always been very short, less than 300 seconds.
For the next 30-odd days, until the zone was pointed back at the original NS, approximately 50+ queries were still arriving at the original NS. This caused no problems because that NS still had the zone data.
Compared with the usual traffic, 50+ is vanishingly small, approaching zero. But it is a number that showed no sign of declining over the 30 days.
So, while there are clearly a small number of clients or caches out there that do not respect the TTL, the number is so small as to be insignificant. However, those few are very stubborn about sticking to the original NS, for reasons unknown.
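For scale, assuming the 50+ stray queries are counted over the same daily period as the 100K figure (the post does not state this explicitly):

```latex
\frac{50}{100\,000} = 5 \times 10^{-4} = 0.05\%
```

Under that assumption, about 99.95% of the query traffic followed the change.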
Trivia? Maybe, but at least you have a clear set of numbers to draw your own conclusions from.
edgedirector.com
managed dns global failover and load balance (gslb)
exactstate.com
uptime report for webhostingtalk.com
-
04-27-2010, 12:48 AM #16 New Member
Join Date: Apr 2010 | Location: Virginia | Posts: 1
Website PHP Host Failover Script
I have a slightly different solution.
I want my clients’ sites to be up all the time, but I don’t want to mess around with DNS; it’s too far out of my control.
I *DO* like the idea of using a 99.99999% uptime host to hold a pointer, only I want that pointer to change dynamically. I've been able to do this by using no-ip.com and forwarding my @ DNS record to a dynamically allocated no-ip.com domain. However, this was only part of my failover solution.
I have written a PHP script that can be placed on a “free webhost”, such as the free credit webhost that comes with a GoDaddy domain name. It tests a website with a cURL request and, if that succeeds, forwards the visitor to that site; otherwise it forwards the visitor to a second, third, or further host, simply going down the list until it finds one that responds. You could even have the page reload until a successful connection is made.
Obviously, this will do nothing when GoDaddy goes down, but it gives me more control over what happens when my own servers go down.
This first version will check each time to make sure the host is active before forwarding the visitor to the verified host.
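The script itself isn't posted, so here is a sketch of the same "walk the list, forward to the first live host" logic, written in Python rather than PHP. The host URLs are hypothetical, and the 503 + `Retry-After` fallback when every host fails is an assumption standing in for the self-reloading page mentioned above. The health check is passed in as a parameter so the selection logic can be tested without network access.

```python
import http.client
import urllib.request
from typing import Callable, Iterable, Optional


def http_check(url: str, timeout: float = 3.0) -> bool:
    """True if url returns a non-error HTTP response within timeout seconds."""
    try:
        with urllib.request.urlopen(url, timeout=timeout):
            return True
    except (OSError, http.client.HTTPException):
        return False


def first_live_host(hosts: Iterable[str],
                    check: Callable[[str], bool]) -> Optional[str]:
    """Walk the host list in order and return the first one that passes check."""
    for host in hosts:
        if check(host):
            return host
    return None


def redirect_response(hosts, check=http_check):
    """Return (status, headers) for a visitor: a 302 to the first live host,
    or a 503 asking the client to retry if every host failed."""
    target = first_live_host(hosts, check)
    if target is None:
        return 503, {"Retry-After": "30"}
    return 302, {"Location": target}


# Hypothetical usage inside a request handler:
# status, headers = redirect_response(
#     ["http://www1.example.com/", "http://www2.example.com/"])
```

Like the first version described above, this checks the hosts on every visit. The catch is that each down host adds up to `timeout` seconds before the visitor is forwarded, so caching the last good host briefly would be a natural next step.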
I’d be interested in hearing suggestions for features.
It’s available as a free download from my website; I hope some of you find it useful!
You can download it from my site; this bulletin board won't let me post a link.