Web Hosting Talk

*yet another LOAD thread*


papi
07-06-2004, 02:29 AM
Any way to find out what's causing a constantly high load on Linux? No procs show up in top that are taking more than normal CPU time, no swap is being used... what else can I check? The load's been constantly 2+ for a few hours now when it's normally under 0.20.

This is a dual Xeon box with 1 GB of RAM and very low-traffic sites (50 GB/month for the whole box). There's absolutely nothing showing up in top that could cause these loads, such as mailman (qrunner) or log parsing. Nothing unusual, just a few httpd and mysql processes here and there (nothing suspicious in netstat either).

Any help? I'd just like to find out what could possibly be causing a load of 2 to 3.00 for 5 hours when the server is almost idle and normally during idle the loads are between 0.00 and 0.30

vantage255
07-06-2004, 02:34 AM
Is the system in swap? If so, you could look to see which processes are using a lot of memory.
Generally, either swapping or some other disk-intensive activity is what will raise the load if you don't see anything running away with CPU cycles.
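
On Linux the load average counts tasks in uninterruptible sleep (state D, typically waiting on disk or network filesystem I/O) as well as runnable ones, so a task stuck in D adds to the load while using no CPU. The following is a minimal sketch for spotting such tasks, assuming a Linux /proc filesystem; the function names are made up for illustration.

```python
import os

def parse_stat(text):
    """Return (pid, command, state) from the contents of /proc/<pid>/stat."""
    # The command sits in parentheses and may itself contain spaces or
    # parentheses, so split around the last ')'.
    left, right = text.rsplit(")", 1)
    pid, comm = left.split("(", 1)
    return int(pid), comm, right.split()[0]

def load_contributors(proc="/proc"):
    """List processes that are runnable (R) or in uninterruptible sleep (D)."""
    if not os.path.isdir(proc):
        return []  # no Linux-style /proc available
    found = []
    for entry in os.listdir(proc):
        if not entry.isdigit():
            continue  # not a process directory
        try:
            with open(os.path.join(proc, entry, "stat")) as fh:
                pid, comm, state = parse_stat(fh.read())
        except (OSError, ValueError):
            continue  # process exited while we were scanning
        if state in ("R", "D"):
            found.append((pid, comm, state))
    return found

if __name__ == "__main__":
    for pid, comm, state in load_contributors():
        print(pid, state, comm)
```

Running it a few times in a row and watching for the same PID staying in D is more telling than a single snapshot. Only processes are scanned here, not individual threads.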

papi
07-06-2004, 02:40 AM
swap and hdd activity seem normal (compared to my other servers):

  4:38pm  up 73 days, 13:00,  1 user,  load average: 2.66, 2.55, 2.32
USER     TTY      FROM              LOGIN@   IDLE   JCPU   PCPU  WHAT
root     pts/0    202.1.2.3         3:46pm  0.00s  1.72s  0.02s  w


vmstat:

vmstat
   procs                      memory    swap          io     system         cpu
 r  b  w   swpd   free   buff  cache  si  so    bi    bo   in    cs  us  sy  id
 0  0  0   2816  35060  88832 529488   0   0     0     1    0     0   0   1   1

iostat:

avg-cpu:  %user   %nice    %sys   %idle
           1.93    0.16    0.80   97.10

Device:    tps   Blk_read/s   Blk_wrtn/s    Blk_read     Blk_wrtn
dev8-0   16.38        95.40       308.77   606184566   1961963196
dev8-1    0.46        23.32        90.24   148194938    573396128


Do you see anything wrong with those numbers? I don't... this is driving me nuts! :( 5 hours now the load's been around 3.00 with no logical explanation of any sort!

vantage255
07-06-2004, 02:53 AM
If it were BSD I would recommend running "systat -vmstat".


Is systat maybe installed on the system?

Sitting and watching that output for a while might help you out.

I have also seen cases where top itself is the issue, but only on a dual-processor machine where top would randomly come up in single-processor mode and then read incorrectly.

papi
07-06-2004, 03:07 AM
No systat here :(

I doubt it's a problem with top as it hasn't been changed recently (or ever), and this is a sample of the current output:

5:07pm up 73 days, 13:28, 1 user, load average: 2.30, 2.61, 2.54
104 processes: 102 sleeping, 1 running, 1 zombie, 0 stopped
CPU0 states: 0.1% user, 1.0% system, 0.0% nice, 97.0% idle
CPU1 states: 0.1% user, 0.0% system, 0.0% nice, 99.0% idle
CPU2 states: 0.0% user, 0.0% system, 0.0% nice, 100.0% idle
CPU3 states: 0.0% user, 0.0% system, 0.0% nice, 100.0% idle
Mem: 1031576K av, 990732K used, 40844K free, 0K shrd, 91588K buff
Swap: 1052216K av, 2816K used, 1049400K free 523156K cached

  PID USER     PRI  NI  SIZE   RSS SHARE STAT %CPU %MEM   TIME COMMAND
 5064 root      19   0  1064  1064   836 R     1.9  0.1   0:00 top c
20382 root      11   0  1404  1404  1136 S     0.9  0.1   0:08 antirelayd
    1 root       8   0   464   420   408 S     0.0  0.0   1:01 init [3]
    2 root       9   0     0     0     0 SW    0.0  0.0   0:00 keventd
    3 root      19  19     0     0     0 SWN   0.0  0.0   0:00 ksoftirqd_CPU0
    4 root      19  19     0     0     0 SWN   0.0  0.0   0:00 ksoftirqd_CPU1
    5 root      19  19     0     0     0 SWN   0.0  0.0   0:02 ksoftirqd_CPU2
    6 root      19  19     0     0     0 SWN   0.0  0.0   0:00 ksoftirqd_CPU3
    7 root       9   0     0     0     0 SW    0.0  0.0   8:00 kswapd
    8 root       9   0     0     0     0 SW    0.0  0.0   0:00 bdflush
    9 root       9   0     0     0     0 SW    0.0  0.0   4:48 kupdated
   10 root       9   0     0     0     0 SW    0.0  0.0   0:00 ahd_dv_0
   11 root       9   0     0     0     0 SW    0.0  0.0   0:00 ahd_dv_1
   12 root       9   0     0     0     0 SW    0.0  0.0   0:00 scsi_eh_0
   13 root       9   0     0     0     0 SW    0.0  0.0   0:00 scsi_eh_1
   14 root      -1 -20     0     0     0 SW<   0.0  0.0   0:00 mdrecoveryd
   15 root       9   0     0     0     0 SW    0.0  0.0  13:32 kjournald
  125 root       9   0     0     0     0 SW    0.0  0.0   0:00 kjournald
  126 root       9   0     0     0     0 SW    0.0  0.0  10:32 kjournald
  127 root       9   0     0     0     0 SW    0.0  0.0   5:01 kjournald
  128 root       9   0     0     0     0 SW    0.0  0.0  20:56 kjournald
  129 root       9   0     0     0     0 SW    0.0  0.0   5:35 kjournald
 1313 nobody     9   0  1544   924   780 S     0.0  0.0   8:26 proftpd: (accepting connections)

..which of course appears absolutely normal - except for the loads at the top which have been like that for way too many hours now :/
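
The load averages in the first line of top are also exposed in /proc/loadavg, together with a count of tasks that are runnable at that instant. Below is a rough sketch for reading them, assuming Linux; the function names are illustrative only. If the averages stay high while the runnable count stays near 1, the load is probably coming from tasks blocked in uninterruptible sleep rather than from CPU work.

```python
import os

def parse_loadavg(text):
    """Parse the contents of /proc/loadavg into a dict."""
    fields = text.split()
    running, total = fields[3].split("/")
    return {
        "load1": float(fields[0]),   # 1-minute average
        "load5": float(fields[1]),   # 5-minute average
        "load15": float(fields[2]),  # 15-minute average
        "runnable": int(running),    # tasks runnable right now
        "total": int(total),         # tasks on the system
    }

def per_cpu(load, cpus):
    """Load average divided by the number of logical CPUs."""
    return load / cpus

if __name__ == "__main__":
    path = "/proc/loadavg"
    if os.path.exists(path):
        with open(path) as fh:
            info = parse_loadavg(fh.read())
        cpus = os.cpu_count() or 1
        print(info)
        print("per-CPU 1-min load: %.2f" % per_cpu(info["load1"], cpus))
```

Logging this once a minute from cron would also show exactly when the load starts to climb, which can then be matched against scheduled jobs and log entries.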

goldenplanet
07-06-2004, 03:19 AM
This is a *very* long shot but I have seen similar things when the following criteria were met:

* Quotas are enabled on at least one filesystem
* An NFS mount has become unavailable

It looks like the quota system is trying to check if quotas are enabled on the inaccessible NFS mount and that creates a queue of unfinished requests that shows up as load in top.

Yes, I know - but like I said: A *very* long shot! ;)
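
To rule out the NFS case, the mounted filesystems can be listed from /proc/mounts. This is a sketch only, assuming Linux and the usual "device mountpoint fstype options ..." layout of that file; the names are made up for illustration. It only reads the mount table and never touches the mount points themselves, since accessing a dead NFS mount can itself hang.

```python
import os

# Filesystem types treated as NFS client mounts (an assumption; extend as needed).
NFS_TYPES = ("nfs", "nfs4")

def nfs_mounts(mounts_text):
    """Return (device, mountpoint, fstype) for each NFS entry in the text."""
    result = []
    for line in mounts_text.splitlines():
        fields = line.split()
        if len(fields) >= 3 and fields[2] in NFS_TYPES:
            result.append((fields[0], fields[1], fields[2]))
    return result

if __name__ == "__main__":
    if os.path.exists("/proc/mounts"):
        with open("/proc/mounts") as fh:
            for device, mountpoint, fstype in nfs_mounts(fh.read()):
                print(fstype, device, "on", mountpoint)
```

No output means the kernel has no NFS client mounts in its table.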

papi
07-06-2004, 03:37 AM
no NFS mounts here afaik :/

goldenplanet
07-06-2004, 03:45 AM
Bummer. Anything else installed recently? I know that, for example, Dell OpenManage Server Administrator has a bug that causes the load to sit at one instead of zero on a totally unloaded server, and of course all other load figures are bumped relative to that. I don't remember the technical explanation for this, but I think they mentioned that other load monitors could cause something similar.

papi
07-06-2004, 03:53 AM
Nah nothing has been installed recently (last week or so) ..and this only started happening about 5 hours ago.

goldenplanet
07-06-2004, 04:02 AM
Hmmm - I'm more or less out of ideas, then. Could be a spinning process moving data around in RAM, but that usually shows up with at least some CPU load as well. I take it that you already tried restarting the different services running on the server (and checking with ps that no threads were staying alive during the restart)? And how about ... do I dare say the word ..... reboot?

papi
07-06-2004, 04:44 AM
yes tried restarting all individual services - didn't help.

However I rebooted 30 mins ago and all is fine now (loads steadily under 0.20)

What gives ?!? I think I'll need to do a full audit because none of this makes any sense.

I've already run chkrootkit with no results. Does anyone know if that new rootkit tool called 'rootkithunter' can be trusted ?

BudWay
07-06-2004, 11:18 AM
It's been well recommended and is used by many veteran admins.