Web Hosting Talk

Server health..


Aussie Bob
09-23-2002, 02:37 AM
Is there a way to compile data from sources such as load, CPU idle etc, to come up with a figure that could be representative of the server's real health?

coight
09-23-2002, 03:52 AM
I'm interested too. :D

ShockHost
09-23-2002, 03:55 AM
Interested x 2

Haze
09-23-2002, 03:57 AM
What exactly do you mean by the server's health?

UH-Matt
09-23-2002, 04:51 AM
Maybe use MRTG graphs of the info needed? The MRTG logs could then be used to create some sort of value.

chrisb
09-23-2002, 05:19 AM
Um, health means condition... i.e., poor, good, excellent.

Aussie Bob
09-23-2002, 05:31 AM
My request is kind of a continuation of chrisb's thread (http://webhostingtalk.com/showthread.php?s=&threadid=75172) about server loads etc., and folks saying that's not a good indication of the server's performance. I was just wondering if there was a formula for gathering different pieces of data into one figure and thus getting a picture of server performance....

Haze
09-23-2002, 05:52 AM
Hmm, I'm not sure a program can really give you an accurate view of your server's health. It ultimately depends on various factors, and those factors differ from one moment to the next.

Take Chris' thread for instance. His server's load was high as heck, yet there was enough free RAM available to keep the server under control, and it seemed things were going just fine even with a load that high. That's not to say I would be comfortable hosting a site with such a high load, but it's really not a determination of the system's health.

Then we have anile8's thread here (http://www.webhostingtalk.com/showthread.php?s=&threadid=76293), where it is apparent to me that he is running short on RAM.

My point is, there are a number of factors that can contribute to a server slowing down or becoming "unhealthy". Now I'm no programmer, but I think such an idea would be next to impossible to put into place because of all the different variables that go into the final value. I think the best method to date is a competent system admin.

chrisb
09-23-2002, 06:22 AM
Just for the record, my server was NOT running fine. I thought I had problems at first, but later said that pages loaded fast and without a problem, because I didn't remember for sure, recent tests had run fine, and I wanted to give the host the benefit of the doubt.

Within the last few days, I noticed connection problems that were returning "server busy" errors. The host also admitted they were overloaded: some people were running scripts that were too intensive and would be given a week to remove them, and some of the accounts on that server had grown. In combination with other things, that led me to cancel my account there.

Now, the first indication of that problem for me was the server load averages, and since every host that posted their load avgs was <1, it seems to me that load avgs do shed light. Though they may not tell the entire story, they seem to be a pointer to other problems.

Haze
09-23-2002, 06:25 AM
Originally posted by chrisb
Now, the first indication of that problem for me was the server load averages, and since every host that posted their load avgs was <1, it seems to me that load avgs do shed light. Though they may not tell the entire story, they seem to be a pointer to other problems.
I thought that would cause a problem sooner or later.

coight
09-23-2002, 06:30 AM
Beau, that's why Bob said CPU idle time, free RAM, etc.

chrisb
09-23-2002, 06:38 AM
If I knew what things to compare, I could write a script. Maybe someone smarter than me like umBillyCord or bitserver will drop in and help us out.

cperciva
09-23-2002, 01:09 PM
Originally posted by Aussie Bob
Is there a way to compile data from sources such as load, CPU idle etc, to come up with a figure that could be representative of the server's real health?

As a *very* rough benchmark:

$loadaverage is your 15 minute load average (the third value) as reported by `uptime`.
$numprocs is the number of processors in your machine.
$tmem is the total amount of memory in your machine.
$memu is the "used" + "shrd" memory, as reported by `top` (for linux), or the "active" memory, as reported by `top` (for *BSD).
$swpu is the "used" swap space, as reported by `top`.
$dide is the number of IDE drives in your machine.
$dscs is the number of SCSI drives in your machine.
$dtps is the sum of the "tps" values shown by `iostat -d`.

1. If your $loadaverage/$numprocs is less than 1.0, score 25 points.
2. If your $loadaverage/$numprocs is between 1.0 and 2.0, score 20 points.
3. If your $loadaverage/$numprocs is between 2.0 and 5.0, score 12 points.
4. If your $loadaverage/$numprocs is above 5.0, score 0 points.
5. If your $memu/$tmem is below 0.5, score 25 points.
6. If your $memu/$tmem is between 0.5 and 0.75, score 20 points.
7. If your $memu/$tmem is between 0.75 and 0.9, score 12 points.
8. If your $memu/$tmem is above 0.9, score 0 points.
9. If your $swpu/$tmem is below 0.25, score 25 points.
10. If your $swpu/$tmem is between 0.25 and 0.5, score 20 points.
11. If your $swpu/$tmem is between 0.5 and 1.5, score 12 points.
12. If your $swpu/$tmem is above 1.5, score 0 points.
13. If your $dtps/(100*$dide+150*$dscs) is below 0.25, score 25 points.
14. If your $dtps/(100*$dide+150*$dscs) is between 0.25 and 0.5, score 20 points.
15. If your $dtps/(100*$dide+150*$dscs) is between 0.5 and 1.0, score 12 points.
16. If your $dtps/(100*$dide+150*$dscs) is above 1.0, score 0 points.

You should now have a total score between 0 and 100:
If your score is 100: Your server seems to be in excellent condition.
If your score is 90-99: Your server is doing pretty well.
If your score is 76-89: Not too bad, but you should probably look at where you lost points and see if you can do anything about that.
If your score is 50-75: This is getting pretty bad. You should definitely look at upgrading the system or reducing the load on it.
If your score is 0-49: That server is SICK.
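
For anyone who wants to try the arithmetic above without doing it by hand, here is a minimal sketch in Python. It only does the scoring; you still have to collect the numbers yourself from `uptime`, `top` and `iostat -d`. The function names (`band`, `health_score`, `verdict`) are made up for this example, and the handling of values that land exactly on a cutoff (1.0 counts as the 20-point band, 2.0 as 20, 5.0 as 12) is an assumption, since the list above does not say which side a boundary falls on.

```python
def band(ratio, cutoffs):
    """Score one ratio as 25, 20, 12 or 0 points.

    cutoffs is (low, mid, high), e.g. (1.0, 2.0, 5.0) for load per CPU.
    Assumption: a value exactly on a cutoff gets the lower-scoring band
    only for `low`; on `mid` and `high` it keeps the better score.
    """
    low, mid, high = cutoffs
    if ratio < low:
        return 25
    if ratio <= mid:
        return 20
    if ratio <= high:
        return 12
    return 0


def health_score(loadaverage, numprocs, tmem, memu, swpu, dide, dscs, dtps):
    """Add up the four 25-point parts of the rough benchmark."""
    # Rough transfers-per-second budget: 100 per IDE drive, 150 per SCSI drive.
    disk_budget = 100 * dide + 150 * dscs
    if numprocs <= 0 or tmem <= 0 or disk_budget <= 0:
        raise ValueError("need at least one CPU, some memory and one drive")
    return (
        band(loadaverage / numprocs, (1.0, 2.0, 5.0))
        + band(memu / tmem, (0.5, 0.75, 0.9))
        + band(swpu / tmem, (0.25, 0.5, 1.5))
        + band(dtps / disk_budget, (0.25, 0.5, 1.0))
    )


def verdict(score):
    """Map a 0-100 score to the wording used in the post."""
    if score == 100:
        return "excellent condition"
    if score >= 90:
        return "doing pretty well"
    if score >= 76:
        return "not too bad"
    if score >= 50:
        return "getting pretty bad"
    return "SICK"


if __name__ == "__main__":
    # Hypothetical box: 1 CPU, 512 MB RAM, 300 MB used, 50 MB swap used,
    # one IDE drive doing 20 transfers per second, 15-minute load of 0.4.
    score = health_score(0.4, 1, 512, 300, 50, 1, 0, 20)
    print(score, verdict(score))
```

With those made-up inputs the memory part loses 5 points (300/512 is a bit over 0.5) and everything else scores 25, so the total is 95. Memory and swap just need to be in the same unit as the total, since only the ratios are used.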

UH-Matt
09-23-2002, 01:33 PM
Let's make the 4 parts out of 25.. then you get a nice clean score out of 100 :)

cperciva
09-23-2002, 02:18 PM
Originally posted by UH-Matt
Let's make the 4 parts out of 25.. then you get a nice clean score out of 100 :)

Happy now?

chrisb
09-24-2002, 05:02 PM
Hey Bob,
I haven't forgotten. I am writing the script.
-chris

<EDIT> Just wanted to let you know that I started writing it, but can't get access to some of the info since I don't have root privileges. I also don't know how valid the results would be either, so I may not continue... especially since I'm not getting paid to write it. :)

chrisb
09-28-2002, 04:05 AM
Hey Bob,
Like I stated above, I'm not sure how valid the other poster's equation is or how helpful the results will be. I started on it, but it seems to me that writing further would be a waste of time since you can see most of that info at a glance from either ps aux or top. Anyway, I'll share with you what I wrote:

#!/usr/bin/perl -w
use strict;
print "Content-type: text/html\n\n";
### written by chrisb

$|=1; # Flush the buffer

### Number of processors on your machine
my $numprocs = 1;

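### Load average over the last 15 minutes (the last number uptime prints)
my $uptime = `uptime 2>/dev/null`;
### Indexing a comma-split list at [5] only works when uptime shows "N days,"
my ($loadavg) = $uptime =~ /([\d.]+)\s*$/;
die "Could not parse uptime output\n" unless defined $loadavg;
print "$loadavg<BR><BR>";
my $la = $loadavg/$numprocs;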

### Numeric comparisons (<=), not the string operators le/gt
if ( $la <= 1 )
{
$loadavg = 25;
}
elsif ( $la <= 2 )
{
$loadavg = 20;
}
elsif ( $la <= 5 ) # cperciva's formula puts this cutoff at 5.0, not 3
{
$loadavg = 12;
}
else
{
$loadavg = 0;
}

print "$loadavg<BR><BR>\n";

print `top -bn 1`;
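
One fragile spot in a script like this is pulling the 15-minute figure out of the `uptime` line, because the number of comma-separated fields changes with how long the box has been up. Here is a rough sketch in Python of a more forgiving way to do it. The function name `fifteen_minute_load` and the two sample lines are made up for illustration; real `uptime` output varies between systems, and this does not handle locales that print decimals with a comma.

```python
import re


def fifteen_minute_load(uptime_output):
    """Return the last load figure on an uptime line as a float, or None.

    Accepts both "load average: 0.10, 0.20, 0.30" (comma separated)
    and "load averages: 0.10 0.20 0.30" (space separated).
    """
    match = re.search(r"load averages?:\s*(.*)$", uptime_output.strip())
    if match is None:
        return None
    figures = re.findall(r"\d+(?:\.\d+)?", match.group(1))
    if not figures:
        return None
    # The 15-minute average is the last of the three figures.
    return float(figures[-1])


if __name__ == "__main__":
    # Hypothetical sample lines, not captured from a real server.
    long_uptime = " 02:37:00 up 10 days,  3:02,  2 users,  load average: 0.10, 0.20, 0.30"
    short_uptime = " 4:05am  up 12 min,  1 user,  load average: 1.50, 1.20, 0.90"
    print(fifteen_minute_load(long_uptime))
    print(fifteen_minute_load(short_uptime))
```

The second sample has only five comma-separated pieces, so picking the sixth one (index 5) would come back empty there, while anchoring on the "load average" label works for both. If the script itself is written in Python on a Unix box, `os.getloadavg()` returns the three averages directly and skips the parsing altogether.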

cperciva
09-28-2002, 04:22 AM
Originally posted by chrisb
Like I stated above, I'm not sure how valid the other poster's equation is or how helpful the results will be.

I think I remarked that it was a very rough formula. If you understand what the input variables are, and why they matter, you'll be able to work out for yourself how a server is doing, much better than the formula I gave -- it's only going to be at all useful for the people who come in with questions like "My load average is 123.45, is that bad?"

chrisb
09-28-2002, 04:37 AM
Originally posted by cperciva


I think I remarked that it was a very rough formula. If you understand what the input variables are, and why they matter, you'll be able to work out for yourself how a server is doing, much better than the formula I gave -- it's only going to be at all useful for the people who come in with questions like "My load average is 123.45, is that bad?"

I didn't mean it as a criticism. Yes, you stated it was just a rough estimate. Anyhow, I just posted what I did in case anyone wants to build on it. BTW, there's no error checking or taint checking because this was written for a root user on a suEXEC-enabled server, though it can also be run as non-root. Darn it, my indentation on those elsif statements didn't come out in the post. Oh well.