Web Hosting Talk







99.9% uptime, a myth?


Arsalan
05-30-2001, 12:36 PM
99.9% Up-Time Guarantee (http://www.wizardshosting.com/plans.htm): this host and lots of others offer it. Personally I don't think it's possible, but is it?

0.1% is about 72 min of downtime in a month. They surely must reboot for upgrades etc. once a month.

The question is: can hosts which offer it be trusted? And what is the actual uptime that we should expect?

JayC
05-30-2001, 12:52 PM
Originally posted by Arsalan
0.1% is about 72 min of downtime in a month. They surely must reboot for upgrades etc. once a month.

Why? Right now I have top running in another window, so I can easily see that the particular server I'm logged into has gone 202 days without a reboot. I expect it'll go months more. And many upgrades can be done without rebooting.

More commonly downtime will be because of connectivity problems, not because the server itself is down. Even so, 72 minutes a month is certainly not unachievable. We've had no loss of connectivity detected by qwk.mon in a couple of months.

Further, an uptime guarantee is just that. It's not a sacred bond that there'll never be downtime; it's an agreement to compensate your customers if you don't meet the stated level of service. You should make that guarantee at a level you're confident that you'll usually be able to reach, but also understand that sometimes you might not. Then you make good on your guarantee.

Arsalan
05-30-2001, 12:58 PM
But what about those people who have their servers in California? Or on other backbones, like Fasthosts had with BT? I don't think there are a lot of companies out there who have anywhere around 99.9% uptime.

dektong
05-30-2001, 12:59 PM
99.9% of uptime is possible, I believe. Now, about my server... I have not rebooted the server since I moved it to Site5.com, as you can see:


bash$ uptime
12:44pm up 28 days, 20:10, 1 user, load average: 0.07, 0.02, 0.00
bash$


And I have monitored the server using qwkmon.com, and here is the list of brief periods of "downtime" as recorded (well, the network might not necessarily have been down; I think qwkmon just takes an extreme network slowdown as a sign that the network is down):


Log begins May 2 01:51
Host returned error May 19 05:35
Host resumed status normal May 19 05:39
Host returned error May 20 21:20
Host resumed status normal May 20 21:24
Host returned error May 25 22:23
Host resumed status normal May 25 22:27


So, the total of "downtime" for my server this month was around 12 minutes, which is a little bit more than 99.97% network uptime (and 100% hardware uptime).

Note: depending on how qwkmon.com determines whether a network is down, my server might actually have had 100% network uptime and 100% hardware uptime. Also, for reasons that I do not understand, my server was monitored every hour, so how was qwkmon.com able to determine that my server was down for a period of only 4 minutes? Another thing I do not understand: with these 12 minutes of "downtime", qwkmon.com claims that my server uptime was 99.56%, way less than it should be if the "downtime" was really only 12 minutes.
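
The 12-minute total and the "a little bit more than 99.97%" figure can be reproduced from the log lines above. This is a minimal sketch, not qwkmon.com's own calculation: the year and the end of the measurement window (taken from the `uptime` output earlier in this post) are assumptions, and the helper names are made up for illustration.

```python
from datetime import datetime

LOG = """\
Log begins May 2 01:51
Host returned error May 19 05:35
Host resumed status normal May 19 05:39
Host returned error May 20 21:20
Host resumed status normal May 20 21:24
Host returned error May 25 22:23
Host resumed status normal May 25 22:27
"""

YEAR = 2001  # assumption: the log lines carry no year; the thread is dated 2001


def stamp(line: str) -> datetime:
    """Parse the trailing 'Mon D HH:MM' part of a log line."""
    month, day, clock = line.split()[-3:]
    return datetime.strptime(f"{YEAR} {month} {day} {clock}", "%Y %b %d %H:%M")


def summarize(log: str, window_end: datetime):
    """Return (downtime in minutes, uptime percent) for the logged window."""
    lines = log.strip().splitlines()
    start = stamp(lines[0])
    down_seconds = 0.0
    went_down = None
    for line in lines[1:]:
        if "returned error" in line:
            went_down = stamp(line)
        elif "resumed" in line and went_down is not None:
            down_seconds += (stamp(line) - went_down).total_seconds()
            went_down = None
    window_seconds = (window_end - start).total_seconds()
    return down_seconds / 60.0, 100.0 * (1.0 - down_seconds / window_seconds)


# Assumption: the window ends at the time shown in the uptime output (12:44pm).
END = datetime(YEAR, 5, 30, 12, 44)
minutes, uptime = summarize(LOG, END)
print(f"{minutes:.0f} min down, {uptime:.3f}% uptime")

# Guess only: charge each of the three failed hourly checks a full hour.
window_hours = (END - stamp(LOG.splitlines()[0])).total_seconds() / 3600
hourly_guess = 100.0 * (1.0 - 3 / window_hours)
print(f"{hourly_guess:.2f}% if each failure counted as one hour")
```

Under these assumptions the logged outages give roughly 99.97%. The last two lines are pure speculation about the monitor: charging a full hour per failed check happens to land on 99.56%, but nothing in the thread confirms that this is how the reported figure was computed.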

Anyway, my point is: 99.9% (hardware and network connectivity) uptime is possible. 99.99% or 99.999% is kind of pushing it to the limit, and honestly I do not think I am capable of maintaining such a high uptime guarantee. In fact, although I know that my server/connectivity is capable of maintaining higher than 99.5% uptime, I only offer my clients 99.5% uptime to be realistic... I think it's better this way than to offer them 99.99% uptime that I can't maintain... Just my personal opinion :)

cheers,
:beer: -> will any of you mods give me this "beer" image back? :-)

Planet Z
05-30-2001, 01:03 PM
99.9% really isn't that uncommon or hard to achieve. It's actually about 43 minutes of downtime a month, maximum. You really shouldn't have to put up with more than that.

99.99%, however, is much harder to offer. That's a maximum of 4.3 minutes of downtime a month. Most hosts can't legitimately guarantee 99.99%. For instance, we hit 99.99% most of the time. But if our connection goes out for even 5 minutes, we've broken our promise. Or if a server reboots slowly. Etc. It's simply too risky.
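
The downtime allowances quoted here are plain arithmetic. A minimal sketch, assuming a 30-day month (the function name is illustrative and not taken from any tool mentioned in the thread):

```python
def downtime_budget_minutes(uptime_percent: float, days: int = 30) -> float:
    """Maximum downtime, in minutes, allowed by an uptime percentage."""
    total_minutes = days * 24 * 60  # 43,200 minutes in a 30-day month
    return total_minutes * (100.0 - uptime_percent) / 100.0


for level in (99.5, 99.9, 99.99, 99.999):
    print(f"{level}% -> {downtime_budget_minutes(level):.2f} min/month")
```

With these assumptions 99.9% allows about 43 minutes and 99.99% about 4.3 minutes. Seen the other way round, 72 minutes of downtime in a 30-day month works out to roughly 99.83% uptime.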

There are hosts that do specialize in super high-uptime guarantees, but expect to pay a fair bit more for it.

dektong
05-30-2001, 01:04 PM
Originally posted by JayC
I'm logged into has gone 202 days without a reboot. I expect it'll go months more.

JayC, I hope one day my server will be able to beat your uptime period :) :) Does anybody have longer than 202 days of server uptime? Let's have another race here.... :D

cheers,
:beer:

Alan - Vox
05-30-2001, 01:10 PM
I had a server up for 183 days. I'm sure it would still be going now if we hadn't closed madhosts.

ckizer
05-30-2001, 01:12 PM
This may not seem fair, but legally in the state of California, if a host wants to claim a certain uptime they can do so without counting server reboots. Uptime is defined as time not down due to system error. I was reading somewhere about this because a client sued a host over it.

Planet Z
05-30-2001, 01:12 PM
Our max was just over 300 days on a server running FreeBSD 2.2.8

Our current best uptime is:

1:19pm up 106 days, 17:08, 1 user, load average: 0.00, 0.00, 0.00

On an active BSD webserver.

JayC
05-30-2001, 01:36 PM
Planet Z: Most hosts can't legitimately guarantee 99.99%. For instance, we hit 99.99% most of the time. But if our connection goes out for even 5 minutes, we've broken our promise. Or if a server reboots slowly. Etc. It's simply too risky.
I'd still say you could "legitimately" guarantee it -- though I wouldn't, and I wouldn't recommend it. Again, all the guarantee does is state how you'll compensate customers if you don't meet your stated level of service. You could guarantee no more than 30 seconds of downtime, legitimately, as long as you always made good however you stated you would anytime you exceeded that downtime. Which, of course, would probably be a lot of times!

ckizer: legally in the state of California if a host wants to say a certain uptime they can without counting server reboots. Uptime is defined as time not down due to system error.
I don't know, but I'd be surprised to find that it's defined anywhere in the state's codes. Most likely the basis of that suit was that the host's TOS defined it that way, not state law. Could be wrong, though. Do you have a reference for that, or remember where you read it?

vizi
05-30-2001, 02:01 PM
Not bad, Planet Z. My longest was about 340 days. It was my Linux box at home. It would have gone longer, but there was a power problem at my house.


My best live server at the moment is our database server:

2:02pm up 173 day(s), 23:04, 1 user, load average: 0.21, 0.30, 0.32

It's a Sun box running Solaris. The uptime isn't too impressive; what's impressive about the box is MySQL:

Threads: 10 Questions: 86885886 Slow queries: 23 Opens: 555 Flush tables: 1 Open tables: 63


86+ million queries. Can't wait to hit 100 :D

Planet Z
05-30-2001, 02:33 PM
Nice. We have a new server up that's running a big MySQL database. It's doing about a million queries a day and still has relatively low load. This is on a PC server (Athlon 1000 + 512MB RAM). I was actually quite surprised at the performance. It's also running Apache, sendmail, BIND, etc.

Threads: 37 Questions: 12808423 Slow queries: 0 Opens: 891 Flush tables: 1 Open tables: 64 Queries per second avg: 11.572

2:32PM up 12 days, 19:29, 1 user, load averages: 0.10, 0.21, 0.23
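
The "about a million queries a day" figure can be checked against the two lines above. This is a rough sketch that assumes the MySQL process has been running for about as long as the server itself (12 days, 19:29), which is why it comes out slightly below the 11.572 average MySQL reports; the function name is illustrative.

```python
def query_rates(questions: int, days: int, hours: int, minutes: int):
    """Return (queries per second, queries per day) over an uptime span."""
    seconds = days * 86400 + hours * 3600 + minutes * 60
    per_second = questions / seconds
    return per_second, per_second * 86400


# Questions counter and server uptime as posted above.
qps, per_day = query_rates(12808423, days=12, hours=19, minutes=29)
print(f"{qps:.2f} queries/s, about {per_day:,.0f} queries/day")
```

Both numbers agree with the post: a little over 11.5 queries per second, which is very close to one million queries per day.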

vizi
05-30-2001, 02:38 PM
Very nice, you'll beat me in no time :)

I wonder how many queries this server has.

bteeter
05-30-2001, 02:53 PM
Impressive numbers guys. :-) There is actually a site which tracks uptime records. The address is:

http://uptime.netcraft.com/up/today/top.max.html

I know there are more such sites, but I cannot find my other links...

It's always interesting to read that and then look at my desktop PC running Windows ME that never seems to exceed 1 week of uptime. But then again, I didn't see any MS OSes in the uptime list...

Take care,

Brian

Travis
05-30-2001, 02:59 PM
JayC:

When QWK.Mon can't reach a site, it schedules a re-check again in 3-4 minutes. That's what you're seeing in the logs. You're correct in that slow network performance is treated as downtime. (If QWK.Mon can't get a response in 12 seconds, it flags a failure.) Those short "down" periods you see probably weren't actual downtime, just slow performance.
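
The scheduling described above explains why a single slow response shows up in the log as a four-minute outage. The simulation below is a sketch of that behaviour, not QWK.Mon's actual code: the 12-second timeout and the re-check delay come from this post, the hourly interval is taken from dektong's description, and `probe` is a hypothetical stand-in for a real network check.

```python
from typing import Callable, List, Optional, Tuple

TIMEOUT_S = 12             # from the post: no response within 12 s is a failure
RECHECK_S = 4 * 60         # from the post: re-check 3-4 minutes after a failure
NORMAL_INTERVAL_S = 3600   # assumption: hourly checks, as dektong reports


def run_monitor(probe: Callable[[int], Optional[float]],
                end_s: int) -> List[Tuple[int, str]]:
    """Simulate the check schedule and return (time_s, event) log entries.

    probe(t) returns the response time in seconds at time t,
    or None if there was no response at all.
    """
    log: List[Tuple[int, str]] = []
    t, down = 0, False
    while t <= end_s:
        response = probe(t)
        failed = response is None or response > TIMEOUT_S
        if failed and not down:
            log.append((t, "Host returned error"))
        elif not failed and down:
            log.append((t, "Host resumed status normal"))
        down = failed
        # After a failure, look again soon; otherwise wait for the next slot.
        t += RECHECK_S if failed else NORMAL_INTERVAL_S
    return log


def slow_once(t: int) -> float:
    """Hypothetical host: one 20 s response at the second check, fast otherwise."""
    return 20.0 if t == 3600 else 0.3


events = run_monitor(slow_once, end_s=3 * 3600)
for when, event in events:
    print(f"{when:>6}s  {event}")
```

One slow check produces an error entry followed by a "resumed" entry exactly one re-check interval later, so the logged "downtime" reflects the re-check delay and not the real length of the problem.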

On uptime - I have an old Micron P60 that runs (with the help of a T1 WAN card) one end of the office T1. It's been up for almost 600 days!

Phoenix
05-30-2001, 03:02 PM
Most standard SLAs for uptime exclude 'scheduled downtime' for things such as upgrades, maintenance, etc.

That can be a big loophole. We ran across one with a provider that we get a particular type of connectivity service from: their scheduled outage for an upgrade was only supposed to last about 10 minutes in the wee hours of Monday morning, but it was down for three days when things went wrong during the upgrade. Because it was a 'scheduled outage', no credit was available for the downtime. They told me that giving us a credit for the outage would 'set a bad example'. We ended up eating the cost of crediting the affected customers for the duration of the outage, although the affected service was not a guaranteed one.
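
To see how much the exclusion matters, here is a small sketch of the two ways of counting. The 30-day period and the `Outage` type are assumptions made for illustration; real SLAs define scheduled maintenance in their own terms, and the single flag used here is a simplification.

```python
from dataclasses import dataclass
from typing import Iterable


@dataclass
class Outage:
    minutes: float
    scheduled: bool


def sla_uptime(outages: Iterable[Outage], period_minutes: float,
               exclude_scheduled: bool = True) -> float:
    """Uptime percentage, optionally ignoring outages flagged as scheduled."""
    counted = sum(o.minutes for o in outages
                  if not (exclude_scheduled and o.scheduled))
    return 100.0 * (1.0 - counted / period_minutes)


MONTH = 30 * 24 * 60  # assumption: 30-day billing period
upgrade = [Outage(minutes=3 * 24 * 60, scheduled=True)]  # the three-day outage

print(sla_uptime(upgrade, MONTH))                           # scheduled time excluded
print(sla_uptime(upgrade, MONTH, exclude_scheduled=False))  # every minute counted
```

With the exclusion the month still counts as 100% uptime on paper; counting every minute, the same month comes to about 90%.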

JayC
05-30-2001, 04:35 PM
Originally posted by Travis
JayC:

When QWK.Mon can't reach a site, it schedules a re-check again in 3-4 minutes. That's what you're seeing in the logs.

Thanks, but that wasn't my question and those weren't my results... :)

On uptime - I have an old Micron P60 that runs (with the help of a T1 WAN card) one end of the office T1. It's been up for almost 600 days!

Of course there are good reasons for a reboot, and I wasn't implying that there's anything necessarily better about a server or its administrators that can be measured by how long it's been since the last reboot. The point was just that there's no need for a reboot so regular that it would make a 99.5% uptime unrealistic -- even if one did choose to count such scheduled maintenance against the uptime stats.

Now I'm going to reboot that 202-day server, just on principle.

dektong
05-30-2001, 04:48 PM
Originally posted by Travis
JayC:


you mean 'dektong'? ;)


When QWK.Mon can't reach a site, it schedules a re-check again in 3-4 minutes. That's what you're seeing in the logs.

OK.. I had been wondering about that too, since there is very little possibility that on three different occasions the network was down for the same 4 minutes each time :) I was wondering about this magic number of 4 minutes :)
But.... why is my server uptime only 99.56%? How do you get this number?


You're correct in that slow network performance is treated as downtime. (If QWK.Mon can't get a response in 12 seconds, it flags a failure.) Those short "down" periods you see probably weren't actual downtime, just slow performance.


I have also been wondering: so you do this by looking at the ping response? But depending on where you start this ping measurement, you may get different answers, since different locations will take different routes to the same server. I am pretty sure you can't tell where, or at what hop, the network slowdown starts to occur. Hence, how accurate is qwkmon.com in telling me whether the problem is within the NOC or an external problem unrelated to the NOC?

Anyway, thanks for the explanation!

cheers,
:beer:

Arsalan
05-30-2001, 04:49 PM
How can one perform kernel upgrades without a reboot? Anyone have any idea?

Planet Z
05-30-2001, 05:16 PM
Originally posted by dektong

I have also been wondering: so you do this by looking at the ping response? But depending on where you start this ping measurement, you may get different answers, since different locations will take different routes to the same server. I am pretty sure you can't tell where, or at what hop, the network slowdown starts to occur. Hence, how accurate is qwkmon.com in telling me whether the problem is within the NOC or an external problem unrelated to the NOC?



I personally don't know anything about QWKMON, but so far all the remote monitoring software I've seen doesn't distinguish where an outage is occurring. If the monitoring server can't ping your server, your server is considered down.

It would be quite difficult to have it determine where the problem was occurring. And then taking that data and figuring out if it's a problem on the monitoring server's end, at the user's server/NOC end, or somewhere in between would be even more complex.

Félix C.Courtemanche
05-30-2001, 05:28 PM
Originally posted by Arsalan
How can one perform kernel upgrades without a reboot? Anyone have any idea?

There is work in progress somewhere in the Linux world. However, there is nothing ready yet, and the tools they would provide would give a downtime of about 30 sec when installing a new kernel (if it works!!)

dektong
05-30-2001, 08:30 PM
Originally posted by Planet Z

so far all the remote monitoring software I've seen doesn't distinguish where an outage is occurring.

Try http://www.pingplotter.com/ (credit goes back to Carlos, who introduced this software to me :) )... Here is an example of the software's capability: http://64.21.152.247/images/vdi.net.png . The site monitored happened to be VDI when it was still having problems (the graph was taken on May 13th, 11:01pm). As you can see, the problem was with their core router p4-0.core1.cftnnj.inet.vdi.net, at hop 11 (relative to my computer, of course).

[Note: that example is not meant to discredit VDI; they don't have this problem anymore, I believe...]
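
The idea behind reading a per-hop graph like that one can be sketched in a few lines. This is not how PingPlotter works internally, just one common heuristic: look for the first hop from which packet loss carries through to the end of the route. All loss figures and the `example.net` hostnames below are made up for illustration; only the hop 11 router name comes from the post.

```python
from typing import List, NamedTuple, Optional


class Hop(NamedTuple):
    number: int
    host: str
    loss_percent: float  # share of probes to this hop that went unanswered


def first_lossy_hop(hops: List[Hop], threshold: float = 5.0) -> Optional[Hop]:
    """Return the first hop from which loss persists to the end of the route.

    Loss at one intermediate hop that clears up further along often just
    means that router deprioritizes probe replies, so it is not blamed.
    """
    culprit: Optional[Hop] = None
    for hop in hops:
        if hop.loss_percent >= threshold:
            if culprit is None:
                culprit = hop
        else:
            culprit = None  # loss did not carry through; start over
    return culprit


# Hypothetical sample data; only the hop 11 hostname is from the thread.
route = [
    Hop(8, "hop8.example.net", 0.0),
    Hop(9, "hop9.example.net", 12.0),   # isolated loss, clears at hop 10
    Hop(10, "hop10.example.net", 0.0),
    Hop(11, "p4-0.core1.cftnnj.inet.vdi.net", 35.0),
    Hop(12, "hop12.example.net", 33.0),
    Hop(13, "destination.example.net", 36.0),
]

suspect = first_lossy_hop(route)
print(suspect.number, suspect.host)
```

Like any measurement of this kind, the result is relative to where the probes start: a trace from a different network may take a different route and point at a different hop.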

cheers,
:beer: