  1. #1

    100% Uptime Redux

    I was reading the following thread...

    http://www.webhostingtalk.com/showth...5&pagenumber=1

    ... and I'm a bit surprised. A good number of the posters seem to equate 100% uptime with the second coming of Christ. Is it that hard to pull off?

    This may betray my ignorance (nothing new there), but can't 100% uptime be facilitated by multiple servers (in more than one DC) and round-robin DNS?

    Just for clarification, let me lay down a few notes on my interpretation of 100% uptime...

    1. 100% network/power availability doesn't cut it; that's a feature of a datacenter, not a web host.

    2. 100% uptime means 100% content availability. The server that provides the content is irrelevant as long as it's within the specs the customer signed up for (for my hypothetical question, assume all servers are identical).

    Justin

  2. #2
    Join Date
    Feb 2002
    Location
    New York, NY
    Posts
    4,612
    Downtime can be caused by any number of things. Hard drive failure, complete server failure, switch/router failure, BGP route screwups, etc. Running a single server on a single network perhaps can get you 99.5% to 99.9% uptime if you have really good hardware (RAID storage) and a good network. Having two servers running together in a failover setup might gain you 99.99% uptime on a really good multi-homed network with competent BGP engineering.

    So your solution would be round-robin DNS, with mirrored servers on separate networks? What happens if your DNS configuration gets corrupted somehow? What if the software you use to synchronize the servers does something wrong? What if you're running the same server software (Apache, PHP, etc.) on those servers, and an as-yet-undiscovered bug is set off by a change in your content? Round-robin DNS is a step up, yes, but not to 100% uptime.

    100% uptime is not realistic. You can keep adding nines to 99, but you're adding magnitudes of cost each time you do. When it comes to searching for a level of reliability, you need to be realistic.
    Scott Burns, President
    BQ Internet Corporation
    Remote Rsync and FTP backup solutions
    *** http://www.bqbackup.com/ ***

  3. #3
    Join Date
    Mar 2003
    Location
    Canada
    Posts
    8,910
    I haven't read the other thread, so I don't know if this has been mentioned or not.

    100% uptime is technically possible, but you need to consider that there will be times when you need to reboot the server for hardware and software upgrades.

    Don't get me wrong, I love high server uptimes, but I know that when a security update needs a reboot to take effect, it's more important than a fancy 200+ day uptime.

  4. #4
    Originally posted by bqinternet
    So your solution would be round-robin DNS, with mirrored servers on separate networks? What happens if your DNS configuration gets corrupted somehow? What if the software you use to synchronize the servers does something wrong? What if you're running the same server software (Apache, PHP, etc.) on those servers, and an as-yet-undiscovered bug is set off by a change in your content? Round-robin DNS is a step up, yes, but not to 100% uptime.
    Good points, thanks.

    Justin

  5. #5
    Originally posted by Pat H
    I haven't read the other thread, so I don't know if this has been mentioned or not.

    100% uptime is technically possible, but you need to consider that there will be times when you need to reboot the server for hardware and software upgrades.

    Don't get me wrong, I love high server uptimes, but I know that when a security update needs a reboot to take effect, it's more important than a fancy 200+ day uptime.
    But this is mitigated by using multiple servers. Sure, your hardware utilization levels may suck, but your operation is still "up" overall.

    Justin

  6. #6
    Join Date
    Dec 2000
    Posts
    610

    Re: 100% Uptime Redux

    Originally posted by jbigelow

    This may betray my ignorance (nothing new there), but can't 100% uptime be facilitated by multiple servers (in more than one DC) and round-robin DNS?
    You would have to do something more involved than round-robin DNS. Round-robin DNS is a quick, cheap way to somewhat balance traffic between servers, but with two servers the nameserver will still send about 50% of your traffic to the one that is down. For this to work, you would need a script that checks each server for a proper response and then edits the DNS settings to stop people from going to the down server.
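
    The checking half of such a script could look roughly like the sketch below. It is a minimal Python illustration, not a tested failover system: `http_probe` and `healthy_records` are hypothetical names, the addresses in the usage note are documentation placeholders, and the step that pushes the filtered list to the nameserver is left out because it depends entirely on the DNS software or provider in use.

```python
import http.client
import urllib.request
from typing import Callable, Iterable, List


def http_probe(address: str, timeout: float = 3.0) -> bool:
    """Return True if http://<address>/ answers with a 2xx status."""
    try:
        with urllib.request.urlopen(f"http://{address}/", timeout=timeout) as resp:
            return 200 <= resp.status < 300
    except (OSError, http.client.HTTPException):
        # Timeouts, refused connections, and HTTP errors all count as "down".
        return False


def healthy_records(
    addresses: Iterable[str],
    probe: Callable[[str], bool] = http_probe,
) -> List[str]:
    """Keep only the addresses that pass the probe.

    If every probe fails, return the full list unchanged: the monitor itself
    may be the thing that is broken, and publishing an empty record set
    would take the site down for certain.
    """
    all_addresses = list(addresses)
    alive = [a for a in all_addresses if probe(a)]
    return alive or all_addresses
```

    Run from cron, something like `healthy_records(["192.0.2.1", "192.0.2.2"])` would give the list of A records to publish. Resolvers that already cached the old records will keep using them until the TTL expires, so this narrows the outage window rather than closing it.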

  7. #7
    Join Date
    Oct 2002
    Location
    EU - east side
    Posts
    21,913
    I highly doubt that even mathematically 100% uptime can be achieved. It's all down to probabilities and nothing is 100% in this universe.

  8. #8
    Join Date
    Mar 2001
    Posts
    1,434
    Not to mention the software and hardware needed to keep the data on all of these servers synchronized, never corrupt, and instantly up to date; with a 100% uptime requirement, you cannot afford to lose data for even 1 second. This is where the cost starts to skyrocket. If it's a static HTML page only, I can easily give you 100% uptime over 3 or 4 servers in diverse locations with 5-second-TTL round robin and a few load balancers. Add dynamic content and wham, costs skyrocket, as do points of failure.

    Round-robin DNS is fine, and with low TTLs of 5 seconds you can switch on the fly, but there are still seconds where there could be downtime, not to mention caching proxy servers, etc. that can interfere with DNS switching as a reliable means of uptime.

    - John C.

  9. #9
    Join Date
    Feb 2004
    Location
    USA
    Posts
    1,571
    Originally posted by ldcdc
    I highly doubt that even mathematically 100% uptime can be achieved. It's all down to probabilities and nothing is 100% in this universe.
    Yeah, I also believe that the closest you can get is 99.999%.

    There's always a possibility that something can go wrong in this world.

    cheers

  10. #10
    Join Date
    Oct 2002
    Location
    EU - east side
    Posts
    21,913
    In theory 99.(9)% might be possible, but I doubt that 100% uptime, indefinitely, could be demonstrated - even if only mathematically.

  11. #11
    Join Date
    Feb 2004
    Location
    USA
    Posts
    1,571
    Originally posted by ldcdc
    In theory 99.(9)% might be possible, but I doubt that 100% uptime, indefinitely, could be demonstrated - even if only mathematically.
    correct

  12. #12
    Join Date
    Jan 2003
    Location
    Lake Arrowhead, CA
    Posts
    789
    The server it's providing the content from is irrelevant if it's within the specs the customer signed up for
    But it's not irrelevant, and that's where one of the large problems lies. Very near 100% uptime is relatively easy using the methods you describe (redundant machines, redundant DCs) if you're serving only static content. The moment you put a live database into the mix, it gets quite a bit more difficult.

    We can claim "100% HTML availability" for months on end because we switch all sites over to mirror servers during maintenance reboots. However, we can't claim 100% database synchronicity, because resyncing from server to server invariably takes enough time to potentially deny customers a few transactions. Thus, you need to consider: is "100% content availability" really 100% if the content might be a few seconds older than it should be on occasion?
    http://www.srohosting.com
    Stability, redundancy and peace of mind

  13. #13
    Join Date
    Jun 2003
    Location
    United States of America
    Posts
    1,838
    The only thing that can maintain the closest thing to 100 percent uptime is 127.0.0.1 (but if your OS isn't Linux, don't count on it).
    Computer Steroids - Full service website development solutions since 2001.
    (612)234-2768 - Locally owned and operated in the Minneapolis, Minnesota area.

  14. #14
    It's long, but you should endure. I read the postings in this and other similar threads. Here is how people outside of this forum understand and interpret this subject.

    " General
    An item or system is specified, procured, and designed to a functional requirement, and it is important that it satisfies this requirement. However, it is also desirable that the item or system should be predictably available, and this depends upon its reliability and availability. ......

    Availability
    The ability of an item to be in a state to perform a required function under given conditions at a given instant of time or during a given time interval, assuming that the required external resources are provided.

    At its simplest level..

    Availability = Uptime / (Downtime + Uptime)
    The time units are generally hours and the time base is 1 year. There are 8760 hours in one year.

    From the design area of concern this equation translates to ..

    Availability (Intrinsic): A_i = MTBF / (MTBF + MTTR)
    MTBF = Mean time between failures
    MTTR = Mean time to repair / Mean time to replace

    Operational availability is defined differently

    Availability (Operational): A_o = MTBM / (MTBM + MDT)
    MTBM = Mean time between maintenance
    MDT = Mean Down Time
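
    Both availability formulas above are plain ratios of "time working" to "total time". The following is a minimal Python sketch of them; the function names and the figures in the usage note are made up for illustration and are not part of the quoted material.

```python
def intrinsic_availability(mtbf_hours: float, mttr_hours: float) -> float:
    """A_i = MTBF / (MTBF + MTTR)."""
    if mtbf_hours < 0 or mttr_hours < 0 or mtbf_hours + mttr_hours == 0:
        raise ValueError("times must be non-negative and not both zero")
    return mtbf_hours / (mtbf_hours + mttr_hours)


def operational_availability(mtbm_hours: float, mdt_hours: float) -> float:
    """A_o = MTBM / (MTBM + MDT)."""
    if mtbm_hours < 0 or mdt_hours < 0 or mtbm_hours + mdt_hours == 0:
        raise ValueError("times must be non-negative and not both zero")
    return mtbm_hours / (mtbm_hours + mdt_hours)
```

    For example, a hypothetical server with an MTBF of 999 hours and an MTTR of 1 hour has an intrinsic availability of 0.999, i.e. "three nines". Note that the result only reaches 1.0 when the repair or down time is exactly zero.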
    ---------------------------------------------------------------------------------------------


    Reliability
    The ability of an item to perform a required function under given conditions for a given time interval.

    The reliability is expressed as a probability (0 to 1, or 0 to 100%). Thus the reliability of a component may be expressed as a 99% probability that it will work successfully for one year. The reliability is essentially an indication of the probability that the item will not fail in the given time period.

    A very generalised curve for the failure rates of components over time is the bathtub curve. This shows that in the early period a number of failures result from manufacturing, assembly, commissioning, and setting-to-work problems. When all of the teething problems have been eliminated, the remaining population has a useful life over which the items fail at a relatively low rate. After a long operating time interval the items will fail at an increasing rate due to wear and other time-related functions. This curve applies mostly to electronic components, which is why electronic products are operated continuously for set times (burn-in) prior to delivery to users.

    [Figure: bathtub curve of failure rate over time; not reproduced here]

    The bathtub curve for mass-produced mechanical items is controlled to minimise the initial early failure period by use of quality control to ensure uniformity of production of high-reliability items. Before items are introduced onto the market they are rigorously tested to identify and correct design and manufacturing problems. A prime target of design, manufacturing and operation is to ensure that the useful life is extended by attention to the following factors:

    * Strength/ Life safety factors
    * Tribology considerations (Prevention of wear and lubrication )
    * Corrosion prevention
    * Protection against environment effects (temperature /humidity)
    * Fatigue
    * Vibration
    * Regular servicing (or elimination) of short life components (filters /brakes pads etc)

    For systems with items in series, the overall reliability is the product of the reliabilities of the individual components.

    For systems with active items in parallel, the resulting reliability is improved. For example, if there are two items in parallel, A (reliability Ra) and B (reliability Rb), the overall reliability is 1 - (1 - Ra) * (1 - Rb).
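
    The series and parallel rules can be tried out with a few lines of Python. This is a sketch with hypothetical function names, and it assumes the items fail independently of each other. That assumption does not hold when, for example, both mirrored servers run the same software and hit the same bug, which is the objection raised earlier in this thread.

```python
from math import prod
from typing import Iterable


def series_reliability(reliabilities: Iterable[float]) -> float:
    """All items must work: multiply the individual reliabilities."""
    return prod(reliabilities)


def parallel_reliability(reliabilities: Iterable[float]) -> float:
    """At least one item must work: 1 minus the chance that all fail."""
    return 1.0 - prod(1.0 - r for r in reliabilities)
```

    With two items of reliability 0.99 each, the series arrangement gives 0.9801 and the parallel arrangement gives 0.9999. Adding more parallel items keeps adding nines, but the result stays below 1 for any item reliability below 1.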
    --------------------------------------------------------------------------------------

    Maintainability
    The ability of an item, under given conditions of use, to be retained in, or restored to, a state in which it can perform a required function, when maintenance is performed under given conditions and using stated procedures and resources.

    When a piece of equipment has failed, it is important to get it back into an operating condition as soon as possible; this is known as maintainability. To calculate the maintainability or Mean Time To Repair (MTTR) of an item, the time required to perform each anticipated repair task is multiplied by the relative frequency with which that task is performed (e.g. number of times per year). MTTR data supplied by manufacturers will be purely repair time, which assumes the fault is correctly identified and the required spares and personnel are available. The MTTR to the user will include the logistic delay as shown below. The MTTR should also include factors such as the skill of the maintenance engineers.

    MTTR User factors...

    * Detection of fault
    * Start up maintenance team
    * Diagnose fault
    * Obtain Spare parts
    * Repair (MTTR-Manufacturers information)
    * Test and accept repair
    * Start up equipment
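
    The MTTR calculation described above can be sketched in Python as follows. The function names and the numbers in the usage note are hypothetical, and dividing by the total frequency is one reading of "relative frequency" in the text, not something the quoted material spells out.

```python
from typing import Dict, Iterable, Tuple


def weighted_mttr(tasks: Iterable[Tuple[float, float]]) -> float:
    """Frequency-weighted mean repair time.

    Each task is (repair_hours, occurrences_per_year). Dividing by the total
    number of occurrences turns the counts into relative frequencies.
    """
    task_list = list(tasks)
    total_occurrences = sum(count for _, count in task_list)
    if total_occurrences <= 0:
        raise ValueError("at least one task must have a positive frequency")
    return sum(hours * count for hours, count in task_list) / total_occurrences


def user_mttr(stage_hours: Dict[str, float]) -> float:
    """MTTR as the user sees it: the sum of every stage, not just the repair."""
    return sum(stage_hours.values())
```

    For instance, a 2-hour repair needed three times a year and a 6-hour repair needed once a year give a weighted MTTR of 3 hours. Adding detection, diagnosis, and waiting for spare parts on top of that is what separates the user's figure from the manufacturer's.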
    -----------------------------THE END----------------------------------------------

    Can we get along and use a technical language common for everybody?

    Peter Kinev.
    Open Solution, Inc
    http://opensolution-us.com

  15. #15
    What's the uptime of sites like microsoft.com, Yahoo, or Google? Wouldn't that be extremely close to 100%?

  16. #16
    Join Date
    Mar 2004
    Location
    England
    Posts
    819
    Originally posted by wickedwitch
    What's the uptime of sites like microsoft.com, Yahoo, or Google? Wouldn't that be extremely close to 100%?
    Interesting point. I have never experienced any downtime for any of those, or for sites like bbc.co.uk.

    Andrew
    NetHosted - UK based hosting solutions.

  17. #17
    Originally posted by wickedwitch
    What's the uptime of sites like microsoft.com, Yahoo, or Google? Wouldn't that be extremely close to 100%?
    Probably "five nines" (99.999%), which corresponds to about 5 minutes, 15 seconds of downtime per year.
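
    The arithmetic behind that figure is easy to check. Here is a small Python sketch with hypothetical helper names, assuming a 365-day year (the same 8760-hour year used earlier in the thread):

```python
SECONDS_PER_YEAR = 365 * 24 * 3600  # non-leap year, i.e. 8760 hours


def downtime_seconds_per_year(availability_percent: float) -> float:
    """Allowed downtime per year for a given availability percentage."""
    if not 0.0 <= availability_percent <= 100.0:
        raise ValueError("availability must be between 0 and 100")
    return (1.0 - availability_percent / 100.0) * SECONDS_PER_YEAR


def format_duration(seconds: float) -> str:
    """Render a duration as hours, minutes and whole seconds."""
    total = round(seconds)
    hours, remainder = divmod(total, 3600)
    minutes, secs = divmod(remainder, 60)
    return f"{hours} h {minutes} min {secs} s"


if __name__ == "__main__":
    # Availability levels mentioned in this thread.
    for level in (99.5, 99.9, 99.99, 99.999):
        print(f"{level}% -> {format_duration(downtime_seconds_per_year(level))}")
```

    For 99.999% this gives roughly 315 seconds, which is the 5 minutes, 15 seconds quoted above. Each extra nine cuts the allowed downtime by a factor of ten.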

    Peter.
    Open Solution, Inc
    http://opensolution-us.com

  18. #18
    Join Date
    Jun 2003
    Location
    Indy
    Posts
    379
    People usually offer a 100% SLA; that doesn't guarantee they will be up 100%. It just means they will compensate you if they aren't. I said it in the other thread, and I don't think some people understand the difference between actually giving 100% uptime and having an SLA.
