There are tons of threads about this here on the forums. It is possible to achieve 100% uptime, but it will cost quite a bit for that redundancy. You will need HA software and at least 2 servers in the cluster.
Originally posted by WebAfrica Because I believe a distributed, fault-tolerant network is a superior hosting structure to the current "daily backup and hope your server doesn't die" method.
All I've seen is, yes it can be done = $$$. I want technical information.
Well, if you are getting into the shared hosting market, it would be very hard to get clients to cover the costs of such a solution just because you offer a 100% uptime SLA instead of the 99.95% that can be had a lot cheaper.
I would first look at a simple load-balanced system at a provider that offers a 100% uptime SLA. Look at rackspace.net (hosts WHT), or maybe even Internap if you want top-quality hosting. Internap has gone down twice that I know of, both times for incredibly dumb reasons, so take that into consideration. If you are going to offer shared hosting, I would rent a small portion of a rack to start, co-locate two dual Xeon machines with 2GB of RAM each, and put a SCSI RAID array in both of them. I would also invest in a Cisco or BIG-IP load balancer, and of course your own switch and firewall to sit in front of that.
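To make that concrete, here is a rough Python sketch of the core job the Cisco/BIG-IP box does for you: health-check each backend and hand connections to whichever one is alive. The backend addresses are made up, and a real appliance adds connection tracking, session persistence, and hardware speed, so read it as an illustration, not an implementation.

[code]
# Toy illustration of load-balancer logic: round-robin across backends,
# skipping any that fail a TCP health check. Addresses are placeholders.
import itertools
import socket

BACKENDS = [("10.0.0.10", 80), ("10.0.0.11", 80)]  # the two colo'd Xeon boxes
_rr = itertools.cycle(range(len(BACKENDS)))

def is_healthy(host, port, timeout=2.0):
    """Health check: can we complete a TCP handshake within the timeout?"""
    try:
        with socket.create_connection((host, port), timeout=timeout):
            return True
    except OSError:
        return False

def pick_backend():
    """Return the next healthy backend, trying each at most once per call."""
    for _ in range(len(BACKENDS)):
        host, port = BACKENDS[next(_rr)]
        if is_healthy(host, port):
            return host, port
    raise RuntimeError("no healthy backends -- the SLA clock is now running")

# print(pick_backend())
[/code]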
If you do not want to go that way, you are going to be very hard pressed to find a solution that offers the type of redundancy you are looking for. There is no way to do such a thing through pure DNS, because even if the updates were made, many providers (such as AOL) disregard the TTL settings sent by nameservers altogether, and there would be downtime. Remember, 5 minutes of downtime in a month would drop you under your 100% margin. Now, if you wanted a truly mission-critical hosting setup, it would require having hardware in at least two datacenters, as I'm sure you already know. You would also need a load-balancing/redundancy device with extremely high availability; if the network went down on that device, your SLA would also be ruined for the month. In the end you would need a cluster of load balancers set up in multiple datacenters (most likely custom built, because I have yet to personally see a commercial and/or GNU product that does such a thing) that constantly monitored and directed traffic to a network of completely mirrored machines spread across many datacenters.
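For reference, the DNS-flip failover people usually try looks something like the sketch below, written against the dnspython library; the zone, nameserver, and addresses are all hypothetical. The comment marks exactly where it breaks: resolvers that ignore your TTL keep serving the old address, so this approach can never get you to 100%.

[code]
# Hypothetical DNS failover monitor using the dnspython library. Zone name,
# nameserver, and addresses are placeholders, not a real setup.
import socket
import time

import dns.query
import dns.update

PRIMARY   = ("203.0.113.10", 80)  # hypothetical primary web server
STANDBY_A = "203.0.113.20"        # hypothetical standby's address
NS_SERVER = "203.0.113.53"        # hypothetical master nameserver

def primary_alive(timeout=2.0):
    try:
        with socket.create_connection(PRIMARY, timeout=timeout):
            return True
    except OSError:
        return False

while True:
    if not primary_alive():
        # Point www at the standby with a 60-second TTL.
        update = dns.update.Update("example.com")
        update.replace("www", 60, "A", STANDBY_A)
        dns.query.tcp(update, NS_SERVER)
        # The catch: resolvers that ignore TTLs (AOL, etc.) keep handing out
        # the dead address anyway, so visitors behind them still see downtime.
        break
    time.sleep(10)
[/code]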
I do not want to sound mean, but in the hosting industry it really is a waste to invest all that money in 100% uptime. Most of the clients willing to pay enough to cover the costs you would incur will have their own system in place, or will build one when they need it. Get a fully managed machine in a datacenter with a great SLA, such as Internap, Rackspace, etc. That would really be your best bet.
Originally posted by WebAfrica Exactly hence my original question.
I don't feel 100% uptime is achievable, although 99.999% is. Since DNS failover is not an option due to TTL caching, look into IP anycasting and portable IP space. IP anycasting is how the big guys do multi-datacenter failover, although I don't know much about it. A portable IP block would allow you to use the same IPs on different networks. Something similar to Linux-HA could be used to reassign the IPs to whichever datacenter was up. There may be a slight delay of a couple of seconds while the routes are re-established, as monaghan mentioned, but this shouldn't be that big of a problem.
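To picture the Linux-HA part of that: within a single network, failover is basically the standby claiming a shared "floating" IP and announcing it with gratuitous ARP. Below is a toy Python sketch of that idea; the interface name, addresses, and timings are invented, real deployments use Heartbeat rather than a hand-rolled loop, and cross-datacenter anycast additionally needs BGP announcements from each site.

[code]
# Toy illustration of Linux-HA style IP takeover: the standby watches the
# primary and, if it stops answering, claims the floating IP and broadcasts
# gratuitous ARP so neighbors update their caches. All values are made up.
import socket
import subprocess
import time

FLOATING_IP = "192.0.2.100/24"    # hypothetical service IP clients hit
PRIMARY     = ("192.0.2.10", 80)  # hypothetical primary we health-check
IFACE       = "eth0"

def primary_alive(timeout=2.0):
    try:
        with socket.create_connection(PRIMARY, timeout=timeout):
            return True
    except OSError:
        return False

def take_over_ip():
    # Bind the floating IP to our interface...
    subprocess.run(["ip", "addr", "add", FLOATING_IP, "dev", IFACE], check=False)
    # ...and send unsolicited ARP so switches and hosts learn the new MAC.
    subprocess.run(["arping", "-U", "-I", IFACE, "-c", "3",
                    FLOATING_IP.split("/")[0]], check=False)

while True:
    if not primary_alive():
        take_over_ip()  # the "couple seconds" of settling happens around here
        break
    time.sleep(5)
[/code]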
I was just thinking there should be an RFC standard that lets you specify www1 and www2 DNS records (just like multiple MX records), so that the client automatically tries www2 if it can't connect to www1.
That would pretty much solve the problem. It is how SMTP delivery achieves effectively 100% uptime: if the first MX host is unreachable, the sending server just tries the next one.
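For what it's worth, a client can approximate that MX-style behavior today when a name resolves to several A records: walk the address list and take the first one that answers, the same way an MTA walks MX records by priority. A small Python sketch, with a placeholder hostname:

[code]
# Client-side fallback over multiple A records, in the spirit of MX retries.
# The hostname below is a placeholder.
import socket

def connect_with_fallback(name, port=80, timeout=5.0):
    """Try each resolved address in order, like an MTA walking MX records."""
    addrs = socket.getaddrinfo(name, port, proto=socket.IPPROTO_TCP)
    last_err = None
    for family, socktype, proto, _, sockaddr in addrs:
        try:
            sock = socket.socket(family, socktype, proto)
            sock.settimeout(timeout)
            sock.connect(sockaddr)
            return sock              # first reachable address wins
        except OSError as err:
            last_err = err           # dead host: fall through to the next
    raise last_err or OSError("no addresses for " + name)

# sock = connect_with_fallback("www.example.com")
[/code]

The catch, of course, is that this only helps if every client does it; browsers don't do it reliably, which is exactly why the www1/www2 idea never took off for HTTP the way MX did for mail.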