Web Hosting Talk

server problem -- help needed.


patchwork
03-03-2002, 09:44 PM
Hi

I am trying to figure out a weird server load problem. I've been using top, vmstat, "watch -n1 pstree" and iostat to try to find the source of the problem. After a couple of weeks of watching the output of these commands and reading everything I can find on system load problems, I am still not much closer to working out what's going on.

Some server load graphs, and command results.
http://64.21.181.70/tmp/load.php

The load problem often happens during off-peak hours, e.g. 5pm GMT or 6am GMT.

I've only had this server for a couple of weeks. My old server was pretty much the same spec as this machine, and the high loads never happened before the move. I've disabled as many non-essential services as possible (analog, webalizer etc.) to see what effect it has on the load problem, but it's still happening.


Server Hardware:

Processors:      1
Model:           Pentium III (Coppermine)
Chip MHz:        999.77 MHz
Cache Size:      256 KB
System Bogomips: 1992.29
PCI Devices:     ATI Technologies Inc 3D Rage IIC 215IIC [Mach64 GT IIC]
                 Intel Corp. 82557 [Ethernet Pro 100]
                 ServerWorks OSB4 IDE Controller
                 Adaptec 7899P
                 Adaptec 7899P (#2)
IDE Devices:     hda: CD-ROM 52X L
SCSI Devices:    IBM DDYS-T09170N ( Direct-Access )
Motherboard:     Unknown (Dual CPU Ready)


Software:

Linux 7.2
Kernel 2.4
Apache 1.3
MySQL 3.23
WHM 4.5.0
Cpanel 4.4.0-196



I've noticed errors (between 250 and 350 per day) just like the one below in the error_log file.
[Sun Mar 3 23:56:12 2002] [notice] child pid 21976 exit signal Segmentation fault (11)
These errors seem to happen most hours of the day and not just during the weird high load periods.
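
To check whether the segmentation faults really are spread across the day rather than clustered around the high-load periods, the error_log entries can be tallied per hour. Below is a minimal sketch in Python, assuming every entry follows the format of the line quoted above; the function name is made up for illustration.

```python
import re
from collections import Counter
from datetime import datetime

# Matches error_log lines shaped like the one quoted in the post.
SEGFAULT_RE = re.compile(
    r"^\[(?P<ts>[^\]]+)\] \[notice\] child pid \d+ exit signal "
    r"Segmentation fault \(11\)"
)

def segfaults_per_hour(lines):
    """Return a Counter mapping (ISO date, hour) to the number of segfault notices."""
    counts = Counter()
    for line in lines:
        match = SEGFAULT_RE.match(line)
        if not match:
            continue  # not a segfault notice
        # Timestamp looks like "Sun Mar 3 23:56:12 2002".
        stamp = datetime.strptime(match.group("ts"), "%a %b %d %H:%M:%S %Y")
        counts[(stamp.date().isoformat(), stamp.hour)] += 1
    return counts
```

Feeding it the lines of the error_log and comparing the busiest hours with the load graphs would show whether the two line up.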



I also noticed the following thread http://www.webhostingtalk.com/showthread.php?s=&threadid=33463
talking about the "Intel Corp. 82557 [Ethernet Pro 100]" card and the eepro100 driver. I have looked in the /etc/modules.conf file and my system is using this driver.


So what do you think?
Could the segmentation faults be the cause?
Is the driver the problem?
Is it something else?

Help

Pete Kelly :confused:

bitserve
03-05-2002, 11:29 PM
The errors in your error_log where the children are seg faulting unexpectedly are probably caused by a badly compiled apache daemon or apache module. Did you find any core files in your apache log directory, or anywhere?

When starting top, the CPU states are usually going to be off until you let it refresh once. If you didn't let it refresh, and your CPU states don't equal 100%, then I'd suspect you've been rooted, and top isn't showing you the processes that are causing the high load average. Of course, I'm paranoid. Anyway, it's not very helpful when they don't equal 100%.
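
The "do the CPU states add up to 100%" check can be done mechanically on a copied line of top output. This is only a sketch, assuming the line has the form "CPU states: N% user, N% system, N% nice, N% idle"; the function name is hypothetical.

```python
import re

def cpu_states_total(line):
    """Sum every percentage that appears on a top 'CPU states:' line."""
    values = re.findall(r"(\d+(?:\.\d+)?)%", line)
    return sum(float(v) for v in values)
```

A total that is well short of 100 after top has refreshed at least once is the situation described above; a few tenths of a percent either way is just rounding.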

One sure way to see if apache is the cause of your problems, is to stop apache when the load goes up. I would suspect that it's the cause.

patchwork
03-06-2002, 10:41 PM
>>Did you find any core files in your apache log directory, or anywhere? <<

Nope

>>One sure way to see if apache is the cause of your problems, is to stop apache when the load goes up. I would suspect that it's the cause.<<

Stopping apache did reduce the load, so I started searching the forum for apache related problems and found the following thread.

http://www.webhostingtalk.com/showthread.php?s=&threadid=35337

I am hoping my problem is the same as the problem discussed in this thread. Basically it looks like an httpd.conf default settings problem. Let's just hope this sorts out my weird server load problem. I will know in a day or two.

Thanks for your input :-)
Pete

spender
03-06-2002, 10:49 PM
also make sure none of your logs are 2 gigs in size...apache will coredump when they get that big.
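
A quick way to check for logs approaching that size is sketched below in Python. The 2 GiB limit and the 80% warning ratio are assumptions for illustration, and the function name is made up; pass it whatever log paths your server actually uses.

```python
import os

TWO_GIB = 2 ** 31  # assumed limit, in bytes

def oversized_logs(paths, limit=TWO_GIB, warn_ratio=0.8):
    """Return (path, size) pairs for files at or above warn_ratio * limit bytes."""
    threshold = limit * warn_ratio
    flagged = []
    for path in paths:
        try:
            size = os.path.getsize(path)
        except OSError:
            continue  # missing or unreadable file, skip it
        if size >= threshold:
            flagged.append((path, size))
    return flagged
```

Running this from cron before the nightly rotation would flag a log well before it hits the limit.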

patchwork
03-06-2002, 10:53 PM
Just thought I would mention this: I have used your recommended settings from the thread I mentioned in my previous post, and the server seems much more responsive now :-) I will just have to wait and see what happens over the next few days, but I am confident this will fix the problem.


*********************
MinSpareServers 75
MaxSpareServers 255
StartServers 75
MaxClients 1024
MaxRequestsPerChild 100
KeepAliveTimeout 10
*********************
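
As a rough sanity check on a MaxClients value, the number of children that fit in physical memory can be estimated. This is a sketch only: the 513748 KB total is what top reports on this server, while the reserved amount and the per-child size in the usage note are hypothetical numbers, not measurements. Also note that, as far as I know, a stock Apache 1.3 build caps MaxClients at 256 (HARD_SERVER_LIMIT) unless it was recompiled with a higher limit.

```python
def max_children_that_fit(total_kb, reserved_kb, per_child_kb):
    """Estimate how many httpd children fit in RAM without swapping.

    total_kb     -- physical memory in KB
    reserved_kb  -- memory set aside for everything else (assumption)
    per_child_kb -- unshared memory per httpd child (assumption)
    """
    if per_child_kb <= 0:
        raise ValueError("per_child_kb must be positive")
    return max(0, (total_kb - reserved_kb) // per_child_kb)
```

For example, max_children_that_fit(513748, 131072, 2048) gives 186, so with those made-up figures a MaxClients of 1024 could never be reached without swapping.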

Pete :-)

patchwork
03-06-2002, 10:56 PM
Originally posted by spender
also make sure none of your logs are 2 gigs in size...apache will coredump when they get that big.


I gzip the logs every night and download them, otherwise they would fill up the drive in no time at all.

Pete

bitserve
03-07-2002, 02:08 AM
One of our web hosting customers' combined apache log is getting to be over half a GB in a month. That's our busiest site, I guess. I'm jealous when customers get so much more web traffic than we do.

I zip the files for him at the end of the month, and they shrink down to 30 some MB. Compression is neat.

I'm glad that you've got a bead on your apache problem.

patchwork
03-07-2002, 03:45 AM
On the server I run two main sites and a couple of very small sites. The apache dom log file sizes for the two main sites are (175 MB raw --- 18 MB zipped --- 850,000 lines/hits) and (90 MB raw --- 6 MB zipped --- 400,000 lines/hits) per day. So it wouldn't take long to fill a partition, or get to the 2 GB max file size limit.

Pete

patchwork
03-07-2002, 10:08 PM
Originally posted by bitserve
I'm glad that you've got a bead on your apache problem.



Nope, it's still doing it.




1:31am up 8 days, 1:55, 1 user, load average: 5.44, 5.24, 4.37
229 processes: 226 sleeping, 2 running, 1 zombie, 0 stopped
CPU states: 11.0% user, 3.9% system, 0.0% nice, 85.0% idle
Mem: 513748K av, 367656K used, 146092K free, 0K shrd, 44328K buff
Swap: 522072K av, 10656K used, 511416K free 141184K cached





Every 1s: pstree Fri Mar 8 01:32:31 2002

init-+-antirelayd
     |-chkservd
     |-clustermgrd
     |-cpaneld
     |-cpanellogd
     |-cppop
     |-crond---crond---sendmail
     |-exim
     |-httpd---171*[httpd]
     |-keventd
     |-4*[kjournald]
     |-klogd
     |-6*[mingetty]
     |-named---named---3*[named]
     |-portsentry
     |-proftpd
     |-safe_mysqld---mysqld---mysqld---2*[mysqld]
     |-scsi_eh_0
     |-scsi_eh_1
     |-sshd---sshd---bash---su---bash---watch---pstree
     |-5*[stunnel]
     |-syslogd
     |-updated
     |-webmaild
     |-whostmgrd
     `-xinetd



Could it be a problem with mysql?


I did say in my first post that the load problem often happens during off-peak hours (e.g. 5pm GMT, 6am GMT). Well, the load is unusually high at the moment and it's peak time in America, so I guess that statement is not always correct.


So I'm back to square one. The CPU usage is low, I have memory free and I have plenty of free disk space, so why is the load so high?


Does anybody know how to get /server-status working? Maybe that can shed a bit of light on what's happening. I've set "ExtendedStatus On" in httpd.conf and stopped and restarted apache, but I just receive page not found errors. The "httpd -l" command says mod_status.c is compiled into apache. What else do I need to do to get it working?


Pete

bitserve
03-08-2002, 01:29 AM
Besides setting:

ExtendedStatus On

All you should need to do is add this:

<Location /server-status>
SetHandler server-status
</Location>

Then access your site at http://ipaddress/server-status
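
Once the page works, mod_status can also return a plain "Key: value" version when "?auto" is appended to the URL, which is easier to log over time. The following is a minimal Python sketch for reading it; the function names are made up, and the exact field names (for example BusyServers and IdleServers) differ between Apache versions, so treat any particular key as an assumption.

```python
import urllib.request

def parse_status_auto(text):
    """Parse 'Key: value' lines from server-status?auto output into a dict."""
    fields = {}
    for line in text.splitlines():
        key, sep, value = line.partition(": ")
        if sep:  # ignore lines without a 'Key: value' shape
            fields[key.strip()] = value.strip()
    return fields

def fetch_status(url, timeout=5):
    """Fetch a server-status?auto URL and return the parsed fields."""
    with urllib.request.urlopen(url, timeout=timeout) as response:
        return parse_status_auto(response.read().decode("ascii", "replace"))
```

Calling fetch_status("http://ipaddress/server-status?auto") every minute and saving the busy and idle counts next to the load average would show whether the two move together.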

In the load information that you posted before, did you notice that there are actually fewer httpd processes running using less shared memory but more cpu when you're having a load problem?

bad load 96 daemons = 94.3 mem = 6.5 cpu
normal 113 daemons = 118.2 mem = 6.4 cpu
below 108 daemons = 114 mem = 1.8 cpu

I guess it could be MySQL, did you notice the zombie MySQL process when you had a high load average? I guess you can try stopping MySQL next time the load is high. :)

patchwork
03-08-2002, 04:03 AM
Originally posted by bitserve
Besides setting:

ExtendedStatus On

All you should need to do is add this:

<Location /server-status>
SetHandler server-status
</Location>

Then access your site at http://ipaddress/server-status



Cheers, at least this works :-)


Originally posted by bitserve

In the load information that you posted before, did you notice that there are actually fewer httpd processes running using less shared memory but more cpu when you're having a load problem?

bad load 96 daemons = 94.3 mem = 6.5 cpu
normal 113 daemons = 118.2 mem = 6.4 cpu
below 108 daemons = 114 mem = 1.8 cpu

I guess it could be MySQL, did you notice the zombie MySQL process when you had a high load average? I guess you can try stopping MySQL next time the load is high. :)


The numbers have actually changed quite a lot since redoing the httpd.conf settings, and yes, the relationship between the number of httpd processes and the load average just doesn't make any sense at all.



Low Load

7:00am up 8 days, 7:24, 1 user, load average: 0.01, 0.08, 0.08
206 processes: 203 sleeping, 1 running, 1 zombie, 1 stopped
CPU states: 2.7% user, 1.7% system, 0.0% nice, 95.4% idle
Mem: 513748K av, 505380K used, 8368K free, 0K shrd, 92788K buff
Swap: 522072K av, 10272K used, 511800K free 250972K cached

pstree: httpd---146*[httpd]



Normal Load

2:52am up 8 days, 3:16, 1 user, load average: 0.21, 0.33, 0.75
295 processes: 293 sleeping, 1 running, 0 zombie, 1 stopped
CPU states: 9.4% user, 8.2% system, 0.0% nice, 82.2% idle
Mem: 513748K av, 470144K used, 43604K free, 0K shrd, 55788K buff
Swap: 522072K av, 8096K used, 513976K free 202040K cached


pstree: httpd-+-233*[httpd]



High Load

1:31am up 8 days, 1:55, 1 user, load average: 5.44, 5.24, 4.37
229 processes: 226 sleeping, 2 running, 1 zombie, 0 stopped
CPU states: 11.0% user, 3.9% system, 0.0% nice, 85.0% idle
Mem: 513748K av, 367656K used, 146092K free, 0K shrd, 44328K buff
Swap: 522072K av, 10656K used, 511416K free 141184K cached


pstree: httpd-+-171*[httpd]
(I have noticed as many as 244 httpd processes running during the high load condition)



One interesting thing about the numbers above: during the high load I have loads more free memory. I'm not sure what that's all about, but I'm sure it's a clue.



For the last week or so I have been grabbing information about the processes running on the system and storing it in a database. Earlier this evening I generated some graphs to show the load average and the number of httpd processes running on the system. http://64.21.181.70/tmp/httpprocess.php
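
For anyone wanting to collect the same kind of data, here is a minimal sketch of the idea in Python. It is not the script behind the graphs above: SQLite, the table layout and the function names are all assumptions chosen to keep the example self-contained.

```python
import sqlite3

def count_processes(ps_lines, name="httpd"):
    """Count command names equal to `name` in a list of process names."""
    return sum(1 for line in ps_lines if line.strip() == name)

def record_sample(conn, timestamp, load1, httpd_count):
    """Store one (time, 1-minute load, httpd process count) sample."""
    conn.execute(
        "CREATE TABLE IF NOT EXISTS samples (ts REAL, load1 REAL, httpd INTEGER)"
    )
    conn.execute(
        "INSERT INTO samples VALUES (?, ?, ?)", (timestamp, load1, httpd_count)
    )
    conn.commit()
```

On Linux the inputs could come from os.getloadavg()[0] and the lines printed by "ps -e -o comm=", sampled from cron once a minute.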


Pete

bitserve
03-08-2002, 08:02 PM
Neat graphs. :)

I'd try doing some tcpdumps when the load is high. Maybe you're getting some type of mild DOS attack. Or maybe you'll see some traffic that will explain it.

It almost seems like it could be a syn flood against port 80, but I would expect the load averages to go higher.
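
One cheap way to look for a SYN flood before reaching for tcpdump is to count connections to port 80 by TCP state; a flood shows up as an unusually large number in SYN_RECV. The Python sketch below assumes the column layout of Linux "netstat -ant" output (Proto, Recv-Q, Send-Q, Local Address, Foreign Address, State); the function name is hypothetical.

```python
from collections import Counter

def tcp_state_counts(netstat_lines, local_port=80):
    """Count TCP connections to local_port by state from `netstat -ant` lines."""
    counts = Counter()
    for line in netstat_lines:
        parts = line.split()
        # Skip headers and anything that is not a TCP connection line.
        if len(parts) < 6 or not parts[0].startswith("tcp"):
            continue
        # Local address looks like "a.b.c.d:port"; compare the port only.
        if parts[3].rsplit(":", 1)[-1] != str(local_port):
            continue
        counts[parts[5]] += 1
    return counts
```

What counts as "unusually large" depends on the site, so compare the SYN_RECV figure during a high-load period with the figure when the load is normal.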

Anything that slows down your IO will cause the load average to go up without the cpu usage going up. The reason why the load goes up in this circumstance is not because of shortage of CPU time, but shortage of IO availability. You have four processes waiting on IO, the load average goes up to 4.
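
The point about IO can be checked directly: on Linux the load average counts processes that are runnable (state R) as well as those in uninterruptible sleep, usually waiting on IO (state D). A minimal Python sketch, assuming the input is a list of state strings such as those printed by "ps -e -o stat="; the function name is made up.

```python
def runnable_and_blocked(stat_values):
    """Return (runnable, blocked) counts from ps STAT values.

    'R...' means running or runnable; 'D...' means uninterruptible sleep,
    which is typically a process waiting on disk or network IO.
    """
    running = sum(1 for s in stat_values if s.startswith("R"))
    blocked = sum(1 for s in stat_values if s.startswith("D"))
    return running, blocked
```

If the blocked count is around 4 or 5 while the load average sits near 5 and the CPU is mostly idle, the processes in state D are the ones to look at.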

This could also be a bad HD too, I guess. Any errors in the messages log?

patchwork
03-09-2002, 09:40 PM
Originally posted by bitserve
>>Neat graphs. :)<<

Cheers. :-)
I did some graphs last night to monitor individual processes. The page shows process count, physical memory, shared memory, %memory, %cpu, and cpu time for any process. It's a cool page.


>>I'd try doing some tcpdumps when the load is high. Maybe you're getting some type of mild DOS attack. Or maybe you'll see some traffic that will explain it.<<

What do I run to get a tcp dump? netstat?

>>It almost seems like it could be a syn flood against port 80, but I would expect the load averages to go higher.<<


What's a syn flood?


>>Anything that slows down your IO will cause the load average to go up without the cpu usage going up. The reason why the load goes up in this circumstance is not because of shortage of CPU time, but shortage of IO availability. You have four processes waiting on IO, the load average goes up to 4.<<

I've monitored the IO a few times with iostat and vmstat and it all looked normal.

>>This could also be a bad HD too, I guess. Any errors in the messages log?<<

Nope, nothing.



The more I look into the problem, the more I suspect it's a hardware issue. With my luck somebody's probably put the server next to a radiator or something, or maybe it has a loose board.

The network card driver is still a strong suspect. If tasks are having to wait to send out packets, then that would probably cause a process backlog.



It's not done it for a couple of days now, so I'm just hoping that it's sorted itself out. (maybe) (hopefully) (fingers crossed)


Pete