Web Hosting Talk

Why is this dedicated server overloaded?


pmak0
05-04-2001, 11:11 PM
Can anyone tell me why this machine's load average is so high? I don't see anything significant in "top"...

11:09pm up 7 days, 10:53, 6 users, load average: 7.56, 6.94, 5.90
183 processes: 179 sleeping, 2 running, 1 zombie, 1 stopped
CPU states: 48.7% user, 7.0% system, 0.0% nice, 44.1% idle
Mem: 386504K av, 329076K used, 57428K free, 338052K shrd, 39460K buff
Swap: 265032K av, 24928K used, 240104K free 130484K cached

PID USER PRI NI SIZE RSS SHARE STAT LIB %CPU %MEM TIME COMMAND
4287 qd 7 0 952 952 672 R 0 2.8 0.2 0:00 top
4166 nobody 2 0 3832 3832 3620 S 0 0.7 0.9 0:00 httpd
767 nobody 0 0 8924 7056 2612 S 0 0.5 1.8 0:01 perlhttpd
3965 nobody 1 0 8968 7084 2572 S 0 0.5 1.8 0:00 perlhttpd
3966 nobody 0 0 8808 6924 2564 S 0 0.5 1.7 0:00 perlhttpd
22245 nobody 1 0 10652 8864 2736 S 0 0.3 2.2 0:04 perlhttpd
25523 nobody 0 0 10484 8700 2732 S 0 0.3 2.2 0:03 perlhttpd
27837 nobody 0 0 10588 8784 2712 S 0 0.3 2.2 0:02 perlhttpd
31260 nobody 0 0 3880 3880 3636 S 0 0.3 1.0 0:01 httpd
3658 nobody 0 0 3832 3832 3624 S 0 0.3 0.9 0:00 httpd
630 root 0 0 248 220 176 S 0 0.1 0.0 2:20 syslogd
31195 nobody 0 0 3888 3888 3648 S 0 0.1 1.0 0:00 httpd
1759 nobody 0 0 8956 7072 2572 S 0 0.1 1.8 0:01 perlhttpd
3366 nobody 0 0 3888 3888 3640 S 0 0.1 1.0 0:00 httpd
3664 nobody 6 0 3800 3800 3620 R 0 0.1 0.9 0:00 httpd
3968 mysql 0 0 5544 5048 1208 S 0 0.1 1.3 0:00 mysqld
4030 nobody 0 0 3800 3800 3620 S 0 0.1 0.9 0:00 httpd

Annette
05-04-2001, 11:54 PM
I'm guessing the server is pretty busy, given all those httpd processes running. They're contributing to your load.

cperciva
05-05-2001, 12:30 AM
top doesn't show everything, and load average is a notorious misbehaver. Can you show us the output of "vmstat 2 5" executed at the same time as "top"?

pmak0
05-05-2001, 01:01 AM
Well, the machine's not as badly overloaded now (load average 1.59), but I guess that's still high (I read somewhere that a "healthy" load average is below the number of CPUs I have, i.e. 1.00?).
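
That rule of thumb can be written down as a small check. This is a minimal sketch in Python, assuming a Unix-like system where os.getloadavg() is available. The helper names are made up for illustration, and the threshold of 1.0 per CPU is only the rule of thumb quoted above, not a hard limit:

```python
import os


def load_per_cpu(load_avg, cpu_count):
    """Return the load average divided by the number of CPUs."""
    if cpu_count < 1:
        raise ValueError("cpu_count must be at least 1")
    return load_avg / cpu_count


def exceeds_rule_of_thumb(load_avg, cpu_count):
    """True when the load average is above one runnable process per CPU."""
    return load_per_cpu(load_avg, cpu_count) > 1.0


def check_this_machine():
    """Apply the check to this machine's current 1-minute load average."""
    one_minute, _, _ = os.getloadavg()   # raises OSError if unavailable
    cpus = os.cpu_count() or 1           # cpu_count() may return None
    return one_minute, cpus, exceeds_rule_of_thumb(one_minute, cpus)
```

With the figures from this thread, exceeds_rule_of_thumb(1.59, 1) is true for the single-CPU box, while the same load on a hypothetical two-CPU machine would pass.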

Here's the top and vmstat output from the same moment in time:

qd@animeglobe [~]# top

12:59am up 7 days, 12:44, 4 users, load average: 1.59, 1.51, 1.81
140 processes: 136 sleeping, 1 running, 1 zombie, 2 stopped
CPU states: 3.2% user, 3.4% system, 0.1% nice, 0.7% idle
Mem: 386504K av, 365260K used, 21244K free, 172452K shrd, 8268K buff
Swap: 265032K av, 23352K used, 241680K free 159332K cached

PID USER PRI NI SIZE RSS SHARE STAT LIB %CPU %MEM TIME COMMAND
17822 qd 12 0 908 908 656 R 0 9.6 0.2 0:00 top
17766 nobody 1 0 8696 6796 2528 S 0 1.7 1.7 0:00 perlhttpd
17767 mysql 0 0 5504 5020 1208 S 0 0.8 1.2 0:00 mysqld
17795 nobody 0 0 0 0 0 Z 0 0.8 0.0 0:00 perlhttpd <d
1 root 0 0 120 68 52 S 0 0.0 0.0 0:10 init
2 root 0 0 0 0 0 SW 0 0.0 0.0 0:03 kflushd
3 root 0 0 0 0 0 SW 0 0.0 0.0 0:15 kupdate
4 root 0 0 0 0 0 SW 0 0.0 0.0 0:00 kpiod
5 root 0 0 0 0 0 SW 0 0.0 0.0 1:52 kswapd
6 root -20 -20 0 0 0 SW< 0 0.0 0.0 0:00 mdrecoveryd
630 root 0 0 248 220 176 S 0 0.0 0.0 2:22 syslogd
639 root 0 0 380 0 0 SW 0 0.0 0.0 0:00 klogd
653 root 0 0 196 136 80 S 0 0.0 0.0 0:01 crond
667 root 0 0 240 220 180 S 0 0.0 0.0 0:00 inetd
681 nobody 0 0 1456 1436 712 S 0 0.0 0.3 0:03 proftpd
690 root 0 0 640 440 364 S 0 0.0 0.1 0:10 sshd
1221 root 0 0 460 448 372 S 0 0.0 0.1 0:00 portsentry
qd@animeglobe [~]# vmstat 2 5
procs memory swap io system cpu
r b w swpd free buff cache si so bi bo in cs us sy id
0 0 0 23352 21864 8292 159372 17 4 20 14 50 17 63 3 34
0 0 0 23352 22048 8308 159368 0 0 2 35 200 62 14 3 83
4 0 0 23352 20140 8316 159768 0 0 51 0 175 74 20 3 77
9 0 0 23352 12816 8336 160340 0 0 74 23 222 125 80 10 10
1 0 0 23352 19280 8336 160340 0 0 0 0 201 123 64 6 30
qd@animeglobe [~]#

cperciva
05-05-2001, 01:13 AM
Here's the problem:

procs memory swap io system cpu
r b w swpd free buff cache si so bi bo in cs us sy id
4 0 0 23352 20140 8316 159768 0 0 51 0 175 74 20 3 77
9 0 0 23352 12816 8336 160340 0 0 74 23 222 125 80 10 10


When your system is reading from block devices (hard disk, presumably) there's a dramatic spike in hardware interrupts, a sharp increase in cpu usage, and processes are lining up in the run queue.

It might just be an inevitable result of receiving requests in bursts, but I'd check whether you have UDMA enabled... this is the sort of thing I'd expect to see from a server that was using programmed I/O (PIO) for everything.
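
Picking out samples like the two quoted above can be automated. This is a minimal sketch, assuming the 16-column vmstat layout shown in this thread (other vmstat versions print different columns). parse_vmstat, busy_samples and the max_runnable threshold are hypothetical names chosen for illustration:

```python
SAMPLE = """\
procs memory swap io system cpu
r b w swpd free buff cache si so bi bo in cs us sy id
0 0 0 23352 21864 8292 159372 17 4 20 14 50 17 63 3 34
0 0 0 23352 22048 8308 159368 0 0 2 35 200 62 14 3 83
4 0 0 23352 20140 8316 159768 0 0 51 0 175 74 20 3 77
9 0 0 23352 12816 8336 160340 0 0 74 23 222 125 80 10 10
1 0 0 23352 19280 8336 160340 0 0 0 0 201 123 64 6 30
"""


def parse_vmstat(text):
    """Turn vmstat output into a list of dicts keyed by column name."""
    columns = None
    rows = []
    for line in text.splitlines():
        fields = line.split()
        if not fields:
            continue
        if fields[0] == "r":
            # The line starting with "r" names the columns.
            columns = fields
        elif (columns and len(fields) == len(columns)
              and all(f.isdigit() for f in fields)):
            rows.append(dict(zip(columns, (int(f) for f in fields))))
    return rows


def busy_samples(rows, max_runnable=1):
    """Return the samples whose run queue (r) exceeds max_runnable."""
    return [row for row in rows if row["r"] > max_runnable]


if __name__ == "__main__":
    for row in busy_samples(parse_vmstat(SAMPLE)):
        print(row["r"], row["bi"], row["in"], row["us"], row["id"])
```

Setting max_runnable to the number of CPUs (1 here) flags exactly the two samples with r = 4 and r = 9, which are also the ones with the highest block reads (bi).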

pmak0
05-05-2001, 01:21 AM
root@animeglobe [/etc/rc.d]# hdparm -d /dev/hda

/dev/hda:
using_dma = 1 (on)

That means UDMA is on, right?

BTW, I have the line "/sbin/hdparm -c3d1 /dev/hda" in my /etc/rc.d/rc.local.

A while ago I noticed these processes pop up in "top" (but only intermittently; a few seconds later they were gone):

1:18am up 7 days, 13:02, 3 users, load average: 1.88, 2.08, 1.90
155 processes: 129 sleeping, 25 running, 0 zombie, 1 stopped
CPU states: 56.6% user, 13.0% system, 0.0% nice, 30.3% idle
Mem: 386504K av, 359476K used, 27028K free, 202176K shrd, 14724K buff
Swap: 265032K av, 23352K used, 241680K free 139692K cached

PID USER PRI NI SIZE RSS SHARE STAT LIB %CPU %MEM TIME COMMAND
19671 nobody 20 0 2488 2488 952 R 0 3.8 0.6 0:00 ims2.cgi
19686 nobody 20 0 2452 2452 956 R 0 3.6 0.6 0:00 ims.cgi
19680 nobody 19 0 2444 2444 948 R 0 3.5 0.6 0:00 ims2.cgi
19679 nobody 18 0 2208 2208 944 R 0 3.1 0.5 0:00 ims2.cgi
19668 nobody 18 0 2144 2144 944 R 0 2.9 0.5 0:00 ims2.cgi
19675 nobody 18 0 2076 2076 944 R 0 2.9 0.5 0:00 ims2.cgi
19687 nobody 20 0 2156 2156 944 R 0 2.9 0.5 0:00 ims2.cgi
19474 root 10 0 940 940 672 R 0 2.5 0.2 0:03 top
19685 nobody 17 0 2080 2080 876 R 0 2.5 0.5 0:00 forumdisplay
19688 nobody 17 0 1960 1960 956 R 0 2.5 0.5 0:00 slashubb.cgi
19669 nobody 16 0 1728 1728 940 R 0 2.1 0.4 0:00 ims2.cgi
19670 nobody 16 0 1864 1864 944 R 0 2.1 0.4 0:00 ims2.cgi
19672 nobody 16 0 1884 1884 944 R 0 2.1 0.4 0:00 ims2.cgi
19673 nobody 16 0 1696 1696 940 R 0 2.1 0.4 0:00 ims2.cgi
19674 nobody 16 0 1792 1792 940 R 0 2.1 0.4 0:00 ims2.cgi

ims2.cgi is the instant messaging part of the flat-file-based bulletin board (Ikonboard) that tuxedomask.com uses. Maybe that's what's eating up all the CPU? What puzzles me, though, is how ims2.cgi can dominate the "top" list one moment, and then a few seconds later all the processes are gone.

cperciva
05-05-2001, 01:42 AM
OK, I take back what I said; UDMA evidently isn't at fault. Yes, those CGI scripts look like a likely cause. Remember that the load average is the *average length of the run queue*, so if lots of short-lived processes are started simultaneously they will have a significant impact on the load average while using little total CPU time (starting 25 one-second processes simultaneously once each minute will increase the load average by about 7 while only using about 40% of the CPU time). These particular processes are running at a low priority (high number), so they shouldn't interfere with anything.
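
The burst figures can be sanity-checked with a little arithmetic. This is a rough sketch, assuming a single CPU, exactly one CPU-second per process, and a plain time average over the period (the kernel actually reports an exponentially damped average, so real readings will wander around these values). burst_load is a hypothetical helper, and the two scheduling models are simplifications:

```python
def burst_load(n_procs, cpu_seconds_each, period_seconds, round_robin=False):
    """Average run-queue length and CPU utilisation for a periodic burst.

    Models one CPU and counts the running process as part of the queue.
    """
    busy_seconds = n_procs * cpu_seconds_each
    if busy_seconds > period_seconds:
        raise ValueError("burst does not finish within one period")
    if round_robin:
        # Every process shares the CPU and all finish together, so the
        # queue holds n_procs for the whole busy interval.
        queue_seconds = n_procs * busy_seconds
    else:
        # One process runs to completion at a time, so the queue shrinks
        # n, n-1, ..., 1, each step lasting cpu_seconds_each.
        queue_seconds = cpu_seconds_each * n_procs * (n_procs + 1) / 2
    return queue_seconds / period_seconds, busy_seconds / period_seconds


if __name__ == "__main__":
    for rr in (False, True):
        load, util = burst_load(25, 1, 60, round_robin=rr)
        print(f"round_robin={rr}: load {load:.2f}, cpu {util:.0%}")
```

For 25 one-second processes per minute this gives an average queue of about 5.4 if they run one after another and about 10.4 if they share the CPU evenly, which brackets the "about 7" quoted above, with CPU utilisation of roughly 42% in both cases.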

Unless you're actually seeing poor performance, I'd disregard this, but remember that the load average isn't going to give you very useful numbers. (Personally, I use the CPU and memory usage reported by vmstat to keep track of the machine state.)

Annette
05-05-2001, 01:43 AM
If a process goes zombie (defunct), it is normally cleaned up automatically once its parent collects its exit status, or by init if the parent exits first. And of course, if processes exit gracefully, they'll go away on their own. Keep in mind that CGIs especially can fill up the run queue if they all kick off at the same time. Since they are generally transient, you'll see spikes like that. As long as they aren't continually running simultaneously, it's likely nothing to worry about unless you notice a real degradation in performance for extended periods of time.