Thread: DNS Failover?

  1. #1
    Join Date
    Sep 2003
    Posts
    1,305

    DNS Failover?

    I'm working on a product that will require impressive uptime.

    I'm looking to create several static servers to relay content, but I will obviously need to have them set up with failover. Round robin isn't an option.

    I heard systems like zoneedit let you enter a list of IPs to serve from, but ten minutes for detection seems kind of steep. Does anybody know of any other services that offer this? How does it work, and would it make sense to run your own?

  2. #2
    Join Date
    Sep 2002
    Location
    Top Secret
    Posts
    11,686
    Doing this is pretty simple really. You don't need any "additional" software or services.

    Firstly, you need redundancy for your DNS servers.
    Secondly, you need to copy the sites from server A to server B (C, D, E, etc).
    Thirdly, you need multiple entries for things, such as:

    www 14400 IN A ip2
    www 14400 IN A ip1
    keep on going just like this
    This works better when you're hosting your servers in different parts of the world (or country), but it also works as a failover method, PROVIDED, of course, that you've got everything synced up.

    Corporations like Microsoft, Google, eBay, PayPal, and numerous others (including myself, now) are doing this. The only problem you'll end up having is when you get into dynamic content like SQL. Then you need a centralized server and feed off of that just for the SQL data.
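    A rough sketch of what multiple A records do on the wire, assuming a name server that rotates the record order on every query (some shuffle randomly instead). The function name and the 192.0.2.x documentation addresses are placeholders, not anything from a real setup.

```python
from collections import deque


def round_robin_answers(addresses, queries):
    """Return the answer list a rotating name server would give for each query."""
    ring = deque(addresses)
    answers = []
    for _ in range(queries):
        answers.append(list(ring))
        ring.rotate(-1)  # the next query starts with the next address
    return answers


# Placeholder addresses from the documentation range (not real servers).
for answer in round_robin_answers(["192.0.2.1", "192.0.2.2"], 3):
    print(answer)
```

    Every answer still contains every address, which is why this spreads load but does not by itself hide a dead server.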

    HTH and good luck!
    <edit>
    Sorry, forgot to add one thing.
    In order for this to work efficiently, you should have at least as many mx records as you do servers, with the appropriate priorities. For example, mine are :
    linux-tech.net. 14400 IN MX 0 mail.linux-tech.net. (primary)
    linux-tech.net. 14400 IN MX 10 jersey.linux-tech.net. (secondary)
    And WITH these mx servers, you need to have identical entries for the mail accounts, in order to prevent problems.
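    To illustrate how the priorities are used, here is a minimal sketch of the order a sending mail server would try the hosts in: lowest preference number first, falling back to the next one if a host is unreachable. The function names and the reachability check are made up for the example.

```python
def delivery_order(mx_records):
    """Sort (preference, host) pairs so the lowest preference is tried first."""
    return [host for _, host in sorted(mx_records)]


def deliver(mx_records, is_reachable):
    """Return the first reachable mail host in preference order, or None."""
    for host in delivery_order(mx_records):
        if is_reachable(host):
            return host
    return None


records = [(10, "jersey.linux-tech.net."), (0, "mail.linux-tech.net.")]
# Pretend the primary is down: the secondary should be picked.
print(deliver(records, lambda host: host != "mail.linux-tech.net."))
```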
    </edit>
    Last edited by whmcsguru; 07-21-2005 at 04:48 PM.
    WHMCS Guru - WHMCS addons, management, support and more.
    WHMCS Notifications Extended - Add slack, hipchat, SMS, pushover to WHMCS !!
    Always looking for Linux, WHMCS, Support Desk work. PM for details

  3. #3
    Join Date
    Aug 2004
    Location
    South Daytona, FL
    Posts
    2,476
    Are you going to make the servers geographically diverse as well? If they are all going to be in the same facility a load balancing "appliance" is an easy solution. I've used http://www.coyotepoint.com/e350.htm for a large intranet.
    "Arms discourage and keep the invader and plunderer in awe, and preserve order in the world as well as property... Horrid mischief would ensue were the law-abiding deprived of the use of them." - Thomas Paine

  4. #4
    Join Date
    Sep 2003
    Posts
    1,305
    Oh no, diverse locations.


    Linux Tech : Does your company do this, or do you know any other company that does this? Preferably I want everything in house =)

  5. #5
    Join Date
    Sep 2002
    Location
    Top Secret
    Posts
    11,686
    AEA:
    Yes, I do this now, as a matter of fact. I started a couple of weeks ago, due to some issues with one DC. It's not really that complicated to do, and since I had the extra servers @ hand, I figured it'd help out a bit.

  6. #6
    Join Date
    Apr 2004
    Location
    San Jose
    Posts
    902
    As I see it, linux-tech just described DNS round robin. Also, his domain seems to be implementing round robin. I can't tell if he's doing anything behind the scenes to improve things.

    The problem with that is with a long TTL, when one of the servers dies, the IP of the dead server is still going to be out there for all to see. Most applications, including web browsers, only use the first IP returned by an A record lookup, so you're going to see failures for that portion of the internet that sees the dead server's IP first.

    When you do DNS round robin, a "randomly" shuffled list of your IPs is presented when the A record is looked up. Web browsers are only going to use the first IP, and if that happens to be the down one, get a failure to connect, and for those users, your site appears to be down. They do not continue to the next IP if the first one fails, as some telnet implementations do. If they did, things would be much simpler.
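    For comparison, this is roughly what a client that does walk the list looks like. It is only a sketch: `connect_any` and `fake_connect` are names invented here, and the fake connector stands in for a network where one of two placeholder addresses is down.

```python
import socket


def connect_any(addresses, port, connect=socket.create_connection, timeout=3.0):
    """Try each address in order; return (address, connection) for the first success."""
    last_error = None
    for address in addresses:
        try:
            return address, connect((address, port), timeout)
        except OSError as error:
            last_error = error  # remember why this one failed, then move on
    raise ConnectionError(f"all {len(addresses)} addresses failed") from last_error


def fake_connect(target, timeout):
    # Hypothetical stand-in for the network: only 192.0.2.2 is "up".
    if target[0] != "192.0.2.2":
        raise OSError("connection refused")
    return "connected"


print(connect_any(["192.0.2.1", "192.0.2.2"], 80, connect=fake_connect))
```

    A client written this way survives a dead first address at the cost of one connection timeout.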

    DNS round robin is good for distributing load, but it doesn't help with failures, by itself.

    One technique is to put one authoritative DNS server on each of your redundant servers, only advertising itself for the A record, with a 1 second TTL.

    Then, if that machine goes down, DNS requests will find one of your up servers, which will tell the browser only about itself.

    You will still get a 1 second outage when a machine goes down for all end users that are going through properly acting DNS caching servers, but for those that cache longer than they are supposed to, a portion of them will still see a failure.

    The downside to this technique is that during normal operations there is more latency to get to your site because the DNS can't be cached for very long.

    Doing it right involves BGP and your own IP space, which isn't feasible for a small company. The company I work at spends $xxx,xxx per month on bandwidth and rents space in datacenters in California, Chicago, London, and Amsterdam to get network reliability. When our main networks in California and London go out, we have monitoring scripts that change our BGP information to switch to Chicago and Amsterdam. With an operations team of 20 and all the resources we have, we still have reduced functionality when the main network is down due to the difficulty of two-way data replication.

    This is not an easy problem to solve.

    For a reasonable price, you can limit your outages.

    The first thing you want to look at is the reliability of the networks you rent from. The second thing is to set up a pair of machines that can take over from each other on the same subnet. Here's an article that talks about that: http://hacks.oreilly.com/pub/h/79

    This will limit your outages to the DC and network, not your machines. Combining this with DNS round robin will limit your outages to a portion of your end users.

  7. #7
    Join Date
    Sep 2003
    Posts
    1,305
    Okay, round robin isn't what I really want. I have servers at Ashburn, Virginia; they're pretty reliable, but there are other factors in the mix as well.

  8. #8
    Join Date
    Mar 2002
    Location
    UK
    Posts
    458
    Check out SimpleFailover at http://www.simplefailover.com/ which seems to do what you want.

    Round robin DNS does not provide failover.
    Chris at TDMWeb.com
    Windows & Linux hosting and fully managed dedicated servers with great customer service!
    UK-based but serving the world...

  9. #9
    Join Date
    Apr 2004
    Location
    San Jose
    Posts
    902
    The simplefailover system will have the TTL issue, only runs on Windows, and presumably costs money.

    Running a DNS server on each machine pointing to itself doesn't cost you anything extra, and gives you just as much protection.

  10. #10
    What about using an external central DNS service such as dnsmadeeasy.com? They have a failover product that I will be testing shortly.
    Does anybody have experience with it?

    thanks
    Martin

  11. #11
    Join Date
    Sep 2003
    Posts
    1,305
    Originally posted by sailorFred
    The simplefailover system will have the TTL issue, only runs on Windows, and presumably costs money.

    Running a DNS server on each machine pointing to itself doesn't cost you anything extra, and gives you just as much protection.

    How would that give you failover though?

  12. #12
    Join Date
    Apr 2004
    Location
    San Jose
    Posts
    902
    When the DNS entries have expired out of the cache (such as when the Time To Live [TTL] is 1 second), the authoritative DNS servers for the domain, as specified by whois, are queried for the IP addresses of the domain name.

    Say you have two redundant machines, each of which is running an authoritative DNS server, pointing your domain to itself. Let's say a.domain.com is at 192.168.0.101, and b.domain.com is at 192.168.0.102, and that's what your whois record states.

    If querying 192.168.0.101 for www.domain.com points only to 192.168.0.101, and 192.168.0.102 points only to 192.168.0.102, then when the DNS query is made, it goes to both DNS servers, and whichever answers first wins. If one is down, then that one won't answer. The up one will answer, and get the traffic.

    You start out with a balanced load between your two servers, and if one goes down, as soon as the DNS caches expire, your remaining server gets all the traffic.
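    The behaviour described above can be modelled in a few lines. This is a simplified sketch, not a real resolver: it assumes the name servers are simply asked in list order, whereas real resolvers pick among them by their own policy, and `resolve` is a name made up for the example.

```python
def resolve(name_servers, is_up):
    """Ask each name server in turn; a live server answers with its own address only."""
    for server_ip in name_servers:
        if is_up(server_ip):
            return [server_ip]  # the machine advertises nothing but itself
    return []  # every name server is down, so the lookup fails


servers = ["192.168.0.101", "192.168.0.102"]
print(resolve(servers, lambda ip: True))                   # both machines up
print(resolve(servers, lambda ip: ip != "192.168.0.101"))  # first machine down
```

    Because a dead machine cannot answer DNS queries either, its address never gets handed out once the caches expire.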

    The practical issue with all this is that Netscape/Mozilla/Firefox at least, cache the DNS entry, ignoring the TTL. So if someone has visited your site recently, and the machine it was talking to goes down, they won't be able to access it again until either they restart Firefox, or it finally decides to purge that DNS entry.

    That's why I was suggesting using clustered servers, so that if the network is still up, your backup server will get the traffic for that IP.

  13. #13
    Join Date
    Dec 2004
    Location
    New York City, NY, USA
    Posts
    735
    Just wondering, what is wrong with round-robin DNS?

  14. #14
    Join Date
    Apr 2004
    Location
    San Jose
    Posts
    902
    No effective failover capability. Part of your userbase will see you as down when one of your machines is down or unreachable.

  15. #15
    Join Date
    Apr 2001
    Location
    Denmark, Europe
    Posts
    614
    Hi guys,

    As I understood Aea, he needs to set up fail-over between a set of servers at one location. We have done this many times in the past for several clients using a setup similar to the one I'll describe here:

    First off, you start by acquiring two internet connections. Preferably with two different providers, but if that can't be managed - then just two connections from the same ISP. If you have two different providers, a setup with BGP and PI addresses is the easiest to manage.

    Then you want each internet connection to be handled by a separate router on your side. This can either be regular routers like the Cisco 7200/7300 series (if you want to run BGP), or a routing switch such as the Cisco 3750G. The latter also provides BGP if you buy the EMI model, but you can't run a full BGP table with two providers from it. On the other hand, it handles a lot of bandwidth without problems. On the routers you run a protocol such as HSRP or VRRP to get one IP address which is shared between the two routers.

    Then you set up two load-balancers. We normally use "self-built" load-balancers based on, for example, an IBM x336 1U server with Linux. If you have to deal with large amounts of bandwidth, you can set this up so that only incoming traffic goes through the load-balancer, i.e. each web server sends its output directly to the routers.

    Load balancer 1 has a direct connection to router 1, and similarly load-balancer 2 has a direct connection to router 2. The load-balancer uses the shared IP of the two routers as a default gateway. Then the two load balancers also run VRRP together (or perhaps CARP if you're running OpenBSD or FreeBSD). This means that you will have a second shared IP -- this IP is then the IP that you advertise in DNS for your sites.
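    As a very simplified model of how the shared IP moves between the two load-balancers: in VRRP the live node with the highest priority holds the address, and a tie goes to the higher IP. The sketch below only models that election; real VRRP works with advertisements and timers, and the names, priorities and addresses here are invented for illustration.

```python
import ipaddress


def elect_master(routers):
    """Pick the live router with the highest priority; break ties on the higher IP."""
    live = [r for r in routers if r["up"]]
    if not live:
        return None
    winner = max(live, key=lambda r: (r["priority"], ipaddress.ip_address(r["ip"])))
    return winner["name"]


# Hypothetical pair of load-balancers sharing one virtual IP.
pair = [
    {"name": "lb1", "ip": "192.0.2.11", "priority": 200, "up": True},
    {"name": "lb2", "ip": "192.0.2.12", "priority": 100, "up": True},
]
print(elect_master(pair))  # lb1 holds the shared IP
pair[0]["up"] = False
print(elect_master(pair))  # lb2 takes over
```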

    Then behind the load-balancers you have the web servers, mail servers, database servers, etc. These can be connected to the load-balancers in different ways. For example you can have a switch which load-balancer 1 connects to, that then has a connection to every web server. And similarly, you have a different switch that load-balancer 2 connects to, which also has a connection to every web-server. I.e. each web server is connected to two different switches.

    This way you can take a server down for maintenance without any downtime. A server failure will only affect the connections currently established to that server (connections to the other servers and new connections will not be affected). Similarly, the loss of an uplink, router, switch or load-balancer does not affect things.

    You might want to consider making the link from each load-balancer to the switch with the web servers a fibre optic link. Similarly the link between the two routers could be fibre optic. If you then simply have two sets of web servers (instead of having each web server connect to both load-balancers), you can have two systems that are galvanically separated. They could be put in two different rooms, etc. You obviously have to take care that the electricals are done correctly.

    In addition to this, you can look at improving the stability of each server. For example with the IBM x336 server you can have two, hot-swap PSUs, hot-swap RAID, hot-swap fans, etc.

    I haven't even scratched the surface here. If you want to run anything more exciting than a static web site, you'll have to start looking at clustering databases, file servers, etc. to ensure that you haven't got any single point of failure there.

  16. #16
    Join Date
    Apr 2004
    Location
    Singapore
    Posts
    617
    www 14400 IN A ip1
    www 14400 IN A ip2
    keep on going just like this

    I realise that sometimes you will see ip2 and the next time you will see ip1.

    Is there a way to have the setup stick to ip1, and only use ip2 if the ip1 server goes down?
    Linux System admin (since 2001)
    * cPanel/WHM, Directadmin, Apache, DNS, PHP, HyperVM, Lxadmin, Openvz*

  17. #17
    Join Date
    Apr 2004
    Location
    San Jose
    Posts
    902
    See my post from 7/22. It's not perfect, though.

  18. #18
    Join Date
    Apr 2004
    Location
    Singapore
    Posts
    617
    Hi Fred,

    Thanks for your reply.

    With your suggestion, did I set it up correctly now?
    The user will see ip1 99% of the time until it goes down.


    www 1 IN A ip1
    www 14400 IN A ip2

  19. #19
    Join Date
    Mar 2004
    Location
    London, UK
    Posts
    285
    In regards to dynamic sites, is clustering servers in different DC's a viable solution?

    Thanks,

    - Vince

  20. #20
    Join Date
    Sep 2003
    Posts
    1,305
    Just to clarify.

    It will be several machines in different locations; some will be in the same DC, others will not.

    Having a configuration that requires so much involvement of the DC won't really work.

    I want the workload distributed between the servers, but when one of them goes down, I want that server taken out of the list and not accessible until it's back up.

  21. #21
    Originally posted by jayzee
    Hi Fred,

    Thanks for your reply.

    With your suggestion, did I set it up correctly now?
    The user will see ip1 99% of the time until it goes down.

    www 1 IN A ip1
    www 14400 IN A ip2

    That will not work at all. The TTLs will be the same once entered (if your name server is RFC compliant) or when it is read (most resolvers will fix this or return an error).
    All A records with the same name must have the same TTL.
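    A quick way to catch this before loading a zone is to group the records by name and type and flag any group with more than one TTL. A minimal sketch, with records written as (name, ttl, type, value) tuples; the function name is made up.

```python
from collections import defaultdict


def mismatched_ttls(records):
    """Return {(name, type): sorted TTLs} for every record set with more than one TTL."""
    ttls = defaultdict(set)
    for name, ttl, rtype, _value in records:
        ttls[(name, rtype)].add(ttl)
    return {key: sorted(values) for key, values in ttls.items() if len(values) > 1}


zone = [("www", 1, "A", "ip1"), ("www", 14400, "A", "ip2")]
print(mismatched_ttls(zone))
```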

  22. #22
    Join Date
    Mar 2002
    Location
    UK
    Posts
    458
    Aea: If you have machines in different DCs then a software-based solution, like SimpleFailover, is in my view your best bet. Combine this with round-robin DNS to give you load balancing.

    The TTL issues are not significant in my view: if they are then you need a hardware solution (= $$$). You could probably find a third-party to provide the failover and DNS service for you. Or you could code it yourself on a Linux box (the concepts are quite simple).
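    To give an idea of the "code it yourself" route, here is a rough sketch of the core: check each server with a plain TCP connect and only emit A records for the ones that answer. The function names, port 80, the 60 second TTL and the addresses are all assumptions for the example; writing the zone file and reloading the name server are left out.

```python
import socket


def is_alive(address, port=80, timeout=2.0):
    """Treat a server as healthy if a TCP connection to it succeeds."""
    try:
        with socket.create_connection((address, port), timeout=timeout):
            return True
    except OSError:
        return False


def healthy_records(name, addresses, ttl=60, check=is_alive):
    """Build A record lines for the servers that pass the health check."""
    return [f"{name} {ttl} IN A {address}" for address in addresses if check(address)]


# Placeholder addresses with a fake check so the example runs without a network.
pool = ["192.0.2.1", "192.0.2.2"]
print(healthy_records("www", pool, check=lambda address: address != "192.0.2.1"))
```

    Run from cron or a loop, something like this gives round robin across the live servers only, subject to the TTL caveats discussed earlier in the thread.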
