Web Hosting Talk







About BGP and multihomed - when they still go down...


panopticon
03-23-2002, 10:57 PM
I was reading the RackShack forums today, and I found this thread very interesting: http://forum.rackshack.net/showthread.php?s=&threadid=4472

When I was looking at both tranxactglobal.com and rackshack.net, one of the major things I thought would be an advantage of Rackshack was their large number of different connections and the ample spare capacity on each: http://www.rackshack.net/aboutus/networks.asp

So I was surprised that they are still expecting an outage because of the re-routing of their two Time Warner links. Now I know those are two big links, but looking at the graphs it would seem they could easily pick up that capacity on their many other lines. But I guess multihoming and BGP don't actually allow them to switch as easily and automatically as I thought. I thought that if someone said their network was BGP and multihomed, it would mean that if one link went down the others would pick up the slack instantly and without manual intervention: data would seek out the fastest working link. But it sounds like their re-routing will be done manually before the outage or else not at all. So I guess a multihomed BGP network really doesn't mean you have an instant backup if one provider fails?

Also, there was the topic of the difficulty of re-routing incoming traffic, which I hadn't thought of when considering a provider. I never thought about the other side being just as critical a sticking point, and I still don't quite understand all the difficulties involved, but it appears I need to learn a lot more than simply looking for the terms BGP and multihomed.

sigma
03-24-2002, 10:33 AM
Originally posted by panopticon
So I guess a multihomed BGP network really doesn't mean you have an instant backup if one provider fails?


A multihomed network using BGP4 will indeed provide nearly instant fail-over to alternate routes for many failure scenarios. Specifically, when a line goes down, the BGP session goes down with it, and the routes being advertised on that session disappear, so other routes are chosen instead.
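To make the mechanism above concrete, here is a minimal toy sketch in Python. It is not how a real BGP implementation works: the class name ToyRib, the peer names, the prefix, and the AS numbers are all made up for illustration, and best-path selection is reduced to "shortest AS path wins". It only shows the idea that routes learned on a session disappear when the session drops, so another route gets chosen.

```python
# Toy model of BGP fail-over (illustration only, not a real BGP stack).
# Peer names, prefix, and AS numbers are hypothetical.

class ToyRib:
    def __init__(self):
        # prefix -> {peer name: AS path learned from that peer}
        self.routes = {}

    def announce(self, peer, prefix, as_path):
        self.routes.setdefault(prefix, {})[peer] = list(as_path)

    def session_down(self, peer):
        # Losing the session removes every route that peer advertised.
        for paths in self.routes.values():
            paths.pop(peer, None)

    def best(self, prefix):
        # Simplification: shortest AS path wins, ties broken by peer name.
        paths = self.routes.get(prefix, {})
        if not paths:
            return None
        return min(paths, key=lambda peer: (len(paths[peer]), peer))


rib = ToyRib()
rib.announce("provider_a", "198.51.100.0/24", [64500, 64510])
rib.announce("provider_b", "198.51.100.0/24", [64501, 64520, 64510])

before = rib.best("198.51.100.0/24")   # provider_a has the shorter path
rib.session_down("provider_a")         # line fails, session drops
after = rib.best("198.51.100.0/24")    # only provider_b remains
print(before, "->", after)
```

No manual step is involved in the switch here: the alternate route is already in the table and is picked as soon as the failed peer's routes are gone.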

I didn't read the Rackshack article so I can't comment on that. However, there are certainly failure modes that BGP4 cannot cover. For example, if you have a line to UUnet and a line to Sprint, but UUnet's POP has a major problem sending data out to other parts of UUnet's network, you'll still have your line and BGP session up with UUnet, but the data you send them might have trouble getting anywhere beyond their POP. BGP is not a cure-all.
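A rough sketch of that failure mode, again as a toy Python model under stated assumptions: the Uplink fields, the function names, and the AS path lengths are hypothetical, and the backbone_ok flag stands in for a condition the local router cannot observe from session state alone.

```python
# Toy illustration: route selection driven only by session state cannot
# see a problem deeper inside a provider's network. All names are hypothetical.
from dataclasses import dataclass


@dataclass
class Uplink:
    name: str
    session_up: bool    # what the local router can observe
    backbone_ok: bool   # whether the provider's POP can reach the rest of its network
    as_path_len: int


def pick_uplink(uplinks):
    # Only uplinks with a live session contribute routes; shortest AS path wins.
    candidates = [u for u in uplinks if u.session_up]
    if not candidates:
        return None
    return min(candidates, key=lambda u: (u.as_path_len, u.name))


def traffic_delivered(uplink):
    # Traffic gets through only if the provider can actually forward it.
    return uplink is not None and uplink.session_up and uplink.backbone_ok


uplinks = [
    Uplink("uunet", session_up=True, backbone_ok=False, as_path_len=2),
    Uplink("sprint", session_up=True, backbone_ok=True, as_path_len=3),
]
chosen = pick_uplink(uplinks)
print(chosen.name, traffic_delivered(chosen))
```

In this sketch the uunet uplink is still selected because its session is up, even though traffic sent that way goes nowhere.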

Kevin

allan
03-24-2002, 10:21 PM
Originally posted by sigma

A multihomed network using BGP4 will indeed provide nearly instant fail-over to alternate routes for many failure scenarios. Specifically, when a line goes down, the BGP session goes down with it, and the routes being advertised on that session disappear, so other routes are chosen instead.


As Kevin has pointed out, there should be no reason to have more than a second or two of downtime with BGP routing in place.

In addition to the external scenario that Kevin described, the possibility exists that RackShack has sub-optimal routing on their internal network, and that could be the reason for the extended outage (note: I do not know that they have sub-optimal routing, but that could account for the lengthy downtime). In a data center the size of RackShack's, undoubtedly there are a couple of network layers before the routers, and they may not be set up to take the best routing path.

Of course, the possibility also exists that this was a CYA notice. RackShack may not expect any downtime, but whenever you lose part of your connectivity the possibility exists... better to be safe than sorry.