Web Hosting Talk

Server crash every few hours


WWWhost
12-06-2004, 12:19 PM
hi all,
I have a CentOS cPanel server that crashes every few hours for no apparent reason, as there is apparently no process that crashes it.
But it crashes...
Any advice on what could be crashing the server?
----------------
When it crashes, I can see the following in top:
----------------
17:00:05 up 3:03, 2 users, load average: 30.87, 14.92, 7.95
327 processes: 318 sleeping, 4 running, 5 zombie, 0 stopped
CPU states: cpu user nice system irq softirq iowait idle
total 13.2% 0.0% 3.0% 0.3% 0.1% 83.1% 0.0%
Mem: 965920k av, 955744k used, 10176k free, 0k shrd, 121320k buff
707720k actv, 172816k in_d, 4284k in_c
Swap: 1959920k av, 157008k used, 1802912k free 425068k cached

PID USER PRI NI SIZE RSS SHARE STAT %CPU %MEM TIME CPU COMMAND
27680 root 15 0 20352 18M 1964 S 1.5 2.0 0:08 0 /usr/local/apache/bin/httpd -DSSL
8168 root 15 0 3532 3532 2536 S 1.0 0.3 0:00 0 /usr/sbin/exim -Mc 1CbLGu-00026v-8b
8196 root 16 0 3592 3592 2596 S 1.0 0.3 0:00 0 /usr/sbin/exim -Mc 1CbLH6-000270-HA
8183 root 16 0 3520 3520 2520 S 0.8 0.3 0:00 0 /usr/sbin/exim -Mc 1CbLGu-00027A-51
8194 root 16 0 3588 3588 2596 S 0.8 0.3 0:00 0 /usr/sbin/exim -Mc 1CbLGz-00026r-B6
3797 mailnull 15 0 444 168 88 S 0.6 0.0 0:04 0 /usr/sbin/exim -bd -q120m
24504 root 15 0 1236 1236 680 R 0.6 0.1 0:05 0 top
8169 root 15 0 3496 3496 2504 S 0.6 0.3 0:00 0 /usr/sbin/exim -Mc 1CbLGt-00026q-MK
8192 cpanel 18 0 4056 4056 1792 D 0.6 0.4 0:00 0 /usr/local/cpanel/3rdparty/bin/php /usr/local/cpanel/base/
8165 root 15 0 3496 3496 2504 S 0.5 0.3 0:00 0 /usr/sbin/exim -Mc 1CbLGq-00026e-Ms
8175 root 15 0 3528 3528 2540 S 0.5 0.3 0:00 0 /usr/sbin/exim -Mc 1CbLGy-000278-2W
8176 root 18 0 3544 3544 2540 D 0.5 0.3 0:00 0 /usr/sbin/exim -Mc 1CbLGx-00026x-Pt
8195 root 16 0 3576 3576 2584 D 0.5 0.3 0:00 0 /usr/sbin/exim -Mc 1CbLGo-00025L-Ig
4 root 15 0 0 0 0 DW 0.3 0.0 0:01 0 kswapd
3891 mysql 15 0 50140 28M 1452 S 0.3 3.0 0:41 0 /usr/sbin/mysqld --basedir=/ --datadir=/var/lib/mysql --us
7160 someuser 15 0 3900 2876 852 R 0.3 0.2 0:00 0 cppop - serving 82.90.237.121 - TRANSACTION - someuser
8191 cpanel 15 0 4232 2584 944 S 0.3 0.2 0:00 0 cpaneld - serving 000.000.00.000
12 root 15 0 0 0 0 DW 0.1 0.0 0:14 0 kjournald
3715 named 25 0 17808 12M 892 S 0.1 1.3 0:31 0 /usr/sbin/named -u named
8134 root 15 0 3744 3744 2492 S 0.1 0.3 0:00 0 /usr/sbin/exim -Mc 1CbLGf-00025N-Eh
8210 djcasta 21 0 1788 1788 1276 D 0.1 0.1 0:00 0 /usr/local/cpanel/bin/autorespond info@domain.com /home/d
1 root 15 0 108 76 56 S 0.0 0.0 0:04 0 init
2 root 15 0 0 0 0 SW 0.0 0.0 0:00 0 keventd
3 root 34 19 0 0 0 SWN 0.0 0.0 0:00 0 ksoftirqd/0
6 root 25 0 0 0 0 SW 0.0 0.0 0:00 0 bdflush
5 root 15 0 0 0 0 SW 0.0 0.0 0:02 0 kscand
7 root 15 0 0 0 0 SW 0.0 0.0 0:00 0 kupdated
8 root 25 0 0 0 0 SW 0.0 0.0 0:00 0 mdrecoveryd
68 root 25 0 0 0 0 SW 0.0 0.0 0:00 0 khubd
2394 root 15 0 0 0 0 SW 0.0 0.0 0:00 0 kjournald
2395 root 25 0 0 0 0 SW 0.0 0.0 0:00 0 kjournald
3667 root 15 0 236 208 156 D 0.0 0.0 0:02 0 syslogd -m 0

AhmedFouad
12-06-2004, 01:24 PM
From top I can see that you have many instances of exim. Are you sure these emails are legitimate?

I suggest enabling exim's extended logging options to analyze it.
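To put a number on "many instances", a rough sketch like the one below tallies processes per command in a pasted top snapshot. It assumes the procps 2.x column layout shown above (COMMAND is the 13th column) and strips the directory from absolute paths, so every /usr/sbin/exim row counts as "exim". Against the first snapshot it would count eleven exim rows, one of them the `-bd` listening daemon. It only reflects what top happened to show at that moment; the exim logs remain the place to see who is actually sending.

```python
from collections import Counter


def count_commands(top_lines: list[str]) -> Counter:
    """Count processes per command in the process table of a pasted `top` snapshot.

    Assumes the procps 2.x layout used in this thread:
    PID USER PRI NI SIZE RSS SHARE STAT %CPU %MEM TIME CPU COMMAND
    """
    counts: Counter = Counter()
    for line in top_lines:
        fields = line.split()
        # Process rows start with a numeric PID and have at least 13 columns;
        # header and summary lines are skipped.
        if len(fields) < 13 or not fields[0].isdigit():
            continue
        command = fields[12]
        # Strip the directory from absolute paths ("/usr/sbin/exim" -> "exim"),
        # but leave kernel thread names such as "ksoftirqd/0" untouched.
        name = command.rsplit("/", 1)[-1] if command.startswith("/") else command
        counts[name] += 1
    return counts


snapshot = """\
8168 root 15 0 3532 3532 2536 S 1.0 0.3 0:00 0 /usr/sbin/exim -Mc 1CbLGu-00026v-8b
3797 mailnull 15 0 444 168 88 S 0.6 0.0 0:04 0 /usr/sbin/exim -bd -q120m
24504 root 15 0 1236 1236 680 R 0.6 0.1 0:05 0 top"""
print(count_commands(snapshot.splitlines()).most_common(1))  # [('exim', 2)]
```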

After fixing this first and getting the load back to normal, check the memory usage, as it seems you are currently using ALL the physical memory.

Captian_Spike
12-06-2004, 02:19 PM
I think your problem is RAM related. All your RAM is used up and swap is being used, which is causing IOWait to send your CPU load through the roof. You should first optimize httpd and MySQL memory usage (I could help you out there if you want to PM or email me). You should also start disabling unneeded processes and, as mentioned, look for spammers; I don't think there should usually be more than one exim process. Next, make sure you're using the latest kernel (2.4.21-20.0.1.EL); the new kernel fixes the iowait issues, allowing swap usage without the insane load average :D

Finally, I would advise adding more RAM. Although it's not absolutely necessary, you would probably want to do it before adding more clients to the server. Then again, I have run a server with 300MB of swap in use and it still had great performance; it's sort of situation dependent.
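On Linux the "used" figure in top includes buffers and the page cache, which the kernel reclaims when programs need memory, so a full "Mem:" line does not by itself mean applications have exhausted RAM. The sketch below subtracts those figures; it assumes the procps 2.x layout above, where the cached total is printed at the end of the "Swap:" line. For the first snapshot this works out to 955744k - 121320k - 425068k = 409356k, roughly 42% of the 965920k available, with 157008k in swap. Treat it as a rough indicator only: it says nothing about how actively that swap is being paged in and out.

```python
import re


def app_memory_kb(mem_line: str, swap_line: str) -> dict[str, int]:
    """Split the 'used' figure of a procps 2.x `top` header into app memory vs. cache.

    mem_line  e.g. "Mem: 965920k av, 955744k used, 10176k free, 0k shrd, 121320k buff"
    swap_line e.g. "Swap: 1959920k av, 157008k used, 1802912k free 425068k cached"
    """
    def grab(line: str, label: str) -> int:
        # Find the first "<number>k <label>" pair on the line.
        m = re.search(r"(\d+)k\s+" + label, line)
        if m is None:
            raise ValueError(f"no '{label}' field in: {line!r}")
        return int(m.group(1))

    used = grab(mem_line, "used")
    buffers = grab(mem_line, "buff")
    # In this top version the page-cache size is printed on the Swap line.
    cached = grab(swap_line, "cached")
    return {
        "total": grab(mem_line, "av"),
        "apps": used - buffers - cached,  # memory held by processes themselves
        "buffers": buffers,
        "cached": cached,
        "swap_used": grab(swap_line, "used"),
    }
```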

Steven
12-06-2004, 02:23 PM
I would almost bet he's got a spammer problem.

Pheaton
12-06-2004, 03:30 PM
Yup, definitely looks like a spammer problem.

goldenplanet
12-07-2004, 04:38 AM
While I agree with the theory about a spammer running loose on the server, there is one other option you should look into. If you look at the output from top, it's clear that the load isn't caused by a lack of RAM or CPU power but rather by the I/O system being run into the ground: the iowait is 83.1%, which is way beyond acceptable, since there is no significant swapping taking place.
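goldenplanet's reading can be checked mechanically: iowait is time the CPU sat idle while disk I/O requests were outstanding, so a large iowait with 0.0% idle points at the storage path rather than at CPU power. This small sketch pairs the two "CPU states" lines from the 2.4-kernel top shown above and picks the largest share; the column names come from that header, so other top versions would need adjusting. Pointed at the second snapshot posted later in the thread, the same helper reports system time (44.1%) ahead of iowait (26.1%).

```python
def parse_cpu_states(header: str, values: str) -> dict[str, float]:
    """Pair the two 'CPU states' lines printed by procps 2.x top on a 2.4 kernel.

    header: "CPU states: cpu user nice system irq softirq iowait idle"
    values: "total 13.2% 0.0% 3.0% 0.3% 0.1% 83.1% 0.0%"
    """
    names = header.split()[3:]    # drop "CPU", "states:" and the "cpu" label
    numbers = values.split()[1:]  # drop the "total" row label
    if len(names) != len(numbers):
        raise ValueError("header and value columns do not line up")
    return {name: float(num.rstrip("%")) for name, num in zip(names, numbers)}


def dominant_state(states: dict[str, float]) -> str:
    """Return the CPU state with the largest share of time."""
    return max(states, key=states.get)
```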

This may be caused by extreme amounts of disk access (e.g. writing logs) or by waiting on huge numbers of network connections being opened and closed again (e.g. connecting to remote mail servers). However, if the exim processes are in fact legitimate and there are no signs of large amounts of traffic, it may be caused by a defective NIC or network equipment (packet loss and retransmissions), or by a hard drive that is about to go bust (reallocating data to undamaged areas of the drive). This may also explain the crashes you experience, although the load alone might be enough to cause them... :)
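One more clue fits the disk theory. The Linux load average counts processes in uninterruptible sleep (STAT "D", normally blocked on I/O) as well as runnable ones, which is how a box can show a load of 30 while user CPU is only 13%. The sketch below lists those processes from a top snapshot, assuming the same column layout (STAT is the 8th column). In the first snapshot they include kswapd, kjournald, syslogd and two exim deliveries; if the same names keep turning up in D state, the disk and its kernel messages are worth checking.

```python
def uninterruptible(top_lines: list[str]) -> list[tuple[int, str]]:
    """List (PID, command) for processes whose STAT column starts with 'D'.

    Assumes the procps 2.x layout:
    PID USER PRI NI SIZE RSS SHARE STAT %CPU %MEM TIME CPU COMMAND
    """
    stuck = []
    for line in top_lines:
        fields = line.split()
        # Skip headers and summary lines; process rows begin with a numeric PID.
        if len(fields) < 13 or not fields[0].isdigit():
            continue
        # "D" is uninterruptible sleep; "DW" is the same for a swapped-out/kernel task.
        if fields[7].startswith("D"):
            stuck.append((int(fields[0]), fields[12]))
    return stuck
```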

Like I said, the spammer theory is the most likely, but if you fail to find him, these are the other options you might want to consider.

Captian_Spike
12-07-2004, 05:21 AM
The spammer theory is good. I mentioned RAM, though, because older CentOS kernels had an issue with swap space and IOWait. I experienced it first hand: when we ran out of physical RAM and started swapping (even if it was only 30MB of swap), it would cause server loads in the high 50s, and it even crashed the entire server once. After updating the kernel, it ran fine with swap in use.

goldenplanet
12-07-2004, 05:35 AM
Ah, OK, the memory/swap problem in CentOS was new to me. That could very well be the cause of the problem, then, if a spammer isn't found.

WWWhost
12-07-2004, 06:53 AM
Hello,
I currently have the following kernel installed:
---
Linux servername.domain.com 2.4.21-9.0.1.EL.c0 #1 Sat Mar 6 08:10:10 GMT 2004 i686 i686 i386 GNU/Linux
---

Not sure if this is the latest (or the kernel with the problems you mentioned).

However, the server goes down mostly at night, when the load is usually 0.05-0.50.
That is the problem... so there must really be something wrong, probably not related to serving accounts.
I contacted a support service I found here to look into the problem but have not received a reply yet.

McRox
12-07-2004, 12:19 PM
Originally posted by WWWhost
Hello,
I currently have the following kernel installed:
---
Linux servername.domain.com 2.4.21-9.0.1.EL.c0 #1 Sat Mar 6 08:10:10 GMT 2004 i686 i686 i386 GNU/Linux
---

Not sure if this is the latest (or the kernel with the problems you mentioned).

Seems you've never upgraded the kernel on that box, since it's a 9-month-old kernel. Upgrade to this kernel ASAP to avoid your system being compromised, if that hasn't happened already!!

http://mirror.aelix.com/pub/cAos/centos-3/3.3/updates/i386/RPMS/kernel-2.4.21-20.0.1.EL.i686.rpm
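Kernel release strings are easy to misjudge by eye, and a plain string comparison gets this pair backwards because "9" sorts after "2". The sketch below compares the numeric parts instead. It assumes the Red Hat/CentOS style release naming seen here and deliberately ignores suffixes such as .EL or .c0; the installed release is what `uname -r` prints, and the newer one is taken from the RPM file name in the link above.

```python
import re


def release_key(release: str) -> tuple[int, ...]:
    """Turn a release like '2.4.21-9.0.1.EL.c0' into a numeric tuple for comparison.

    Only the leading run of numbers separated by '.' or '-' is used; vendor
    suffixes such as '.EL' or '.c0' are ignored (an assumption that suits
    this naming scheme, not a general RPM version comparison).
    """
    m = re.match(r"(\d+(?:[.-]\d+)*)", release)
    if m is None:
        raise ValueError(f"unrecognised release string: {release!r}")
    return tuple(int(part) for part in re.split(r"[.-]", m.group(1)))


installed = "2.4.21-9.0.1.EL.c0"  # from the uname output posted above
available = "2.4.21-20.0.1.EL"    # from the RPM file name in the link
print(release_key(installed) < release_key(available))  # True: an upgrade is due
```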

Originally posted by WWWhost
However, the server goes down mostly at night, when the load is usually 0.05-0.50.
That is the problem... so there must really be something wrong, probably not related to serving accounts.
I contacted a support service I found here to look into the problem but have not received a reply yet.

Do you have a special (RAID) controller installed on the server? Also, what kind of motherboard do you have?

matt2kjones
12-07-2004, 01:19 PM
If it goes down during low loads, then it could be a hardware problem.

Ryan F
12-11-2004, 06:05 AM
I installed cPanel on a clean CentOS 3.3 box (Dell 1750 w/ PERC) and the thing has gone down twice in the last two days. The problem seems to be related to kernel panics.

There are no sites on the server.

WWWhost
12-11-2004, 12:05 PM
Yes, after upgrading to the latest kernel, this issue seems to be solved...

WWWhost
12-14-2004, 02:09 PM
Wrong... it still crashes... incredible...

assuredhost.com
12-14-2004, 04:24 PM
Run /scripts/exim4; it will install a new version of exim on your server. Maybe that will fix the problem.

WWWhost
12-14-2004, 06:41 PM
assuredhost.com: thanks, but I already installed it twice.

NetHosted-Andrew
12-14-2004, 07:04 PM
Originally posted by matt2kjones
If it goes down during low loads, then it could be a hardware problem.

It's starting to look that way to me as well now.

Andrew

WWWhost
12-15-2004, 05:29 AM
btw: I noticed that these crashes happen at nearly the same time every day (give or take 30-40 min). Looking at the cron, I see that the following cron job runs before the crash:
--------
/usr/local/cpanel/bin/dcpumon >/dev/null 2>&1
--------
What is this cron job and what does it do?

Thanks a lot

WWWhost
12-15-2004, 05:32 AM
Oops, there is also the following script running from cron:
---------
perl /root/rvadmin/rvmultiupdate.pl >/dev/null 2>&1
---------

This is probably from rvskin.
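To see whether either job really lines up with the crashes, compare its schedule with the crash times. The posts only show the command part of each crontab line, so the schedules used below are hypothetical; substitute the real minute and hour fields from `crontab -l`. This sketch handles only the minute and hour fields in the common forms (`*`, `*/n`, `a`, `a-b`, `a-b/n` and comma lists) and ignores the day and month fields, so it is a first filter, not a cron implementation.

```python
from datetime import datetime, timedelta


def field_matches(field: str, value: int) -> bool:
    """Match a crontab minute or hour field ('*', '*/5', '30', '1-4', '0-30/10', '0,30')."""
    for part in field.split(","):
        step = 1
        if "/" in part:
            part, step_text = part.split("/", 1)
            step = int(step_text)
        if part == "*":
            lo, hi = 0, 59  # both fields start at 0; 59 covers minutes and hours
        elif "-" in part:
            lo_text, hi_text = part.split("-", 1)
            lo, hi = int(lo_text), int(hi_text)
        else:
            lo = hi = int(part)
        if lo <= value <= hi and (value - lo) % step == 0:
            return True
    return False


def runs_before(minute_field: str, hour_field: str, crash: datetime,
                window: timedelta = timedelta(minutes=60)) -> list[datetime]:
    """Return the minutes in (crash - window, crash] at which the job would fire.

    Only minute and hour fields are considered; day/month/weekday are ignored.
    """
    start = crash - window
    t = crash.replace(second=0, microsecond=0)
    hits = []
    while t > start:
        if field_matches(minute_field, t.minute) and field_matches(hour_field, t.hour):
            hits.append(t)
        t -= timedelta(minutes=1)
    return sorted(hits)
```

For example, taking the Apache warnings at 13:06 on 16 Dec as the crash time, a job with the hypothetical schedule `*/5 *` would have fired at 13:00 and 13:05 within the preceding ten minutes.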

WWWhost
12-16-2004, 09:32 AM
I can add the following, logged during and after the crash:
-----
[Thu Dec 16 13:06:24 2004] [warn] child process 25580 still did not exit, sending a SIGTERM
[Thu Dec 16 13:06:28 2004] [warn] child process 25581 still did not exit, sending a SIGTERM
[Thu Dec 16 13:06:29 2004] [warn] child process 3103 still did not exit, sending a SIGTERM
[Thu Dec 16 13:06:29 2004] [warn] child process 25582 still did not exit, sending a SIGTERM
[Thu Dec 16 13:06:29 2004] [warn] child process 25583 still did not exit, sending a SIGTERM
[Thu Dec 16 13:06:29 2004] [warn] child process 25584 still did not exit, sending a SIGTERM
[Thu Dec 16 13:06:29 2004] [warn] child process 25585 still did not exit, sending a SIGTERM
[Thu Dec 16 13:06:29 2004] [warn] child process 3108 still did not exit, sending a SIGTERM
[Thu Dec 16 13:06:29 2004] [warn] child process 25586 still did not exit, sending a SIGTERM
[Thu Dec 16 13:06:29 2004] [warn] child process 25587 still did not exit, sending a SIGTERM
[Thu Dec 16 13:06:29 2004] [warn] child process 25588 still did not exit, sending a SIGTERM
[Thu Dec 16 13:06:29 2004] [warn] child process 25596 still did not exit, sending a SIGTERM
[Thu Dec 16 13:06:29 2004] [warn] child process 25606 still did not exit, sending a SIGTERM
and so on...
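As far as I know, Apache writes "child process N still did not exit, sending a SIGTERM" when the parent is stopping or restarting and a child fails to exit in time, so this burst suggests children were stuck (for example blocked on I/O) when something tried to stop or restart httpd. A sketch for counting those warnings per minute in error_log, assuming the "[Day Mon DD HH:MM:SS YYYY] [level] message" prefix shown above, helps show whether the bursts recur at the same time of day.

```python
import re
from collections import Counter
from datetime import datetime

# Prefix written by Apache's error_log, e.g. "[Thu Dec 16 13:06:24 2004] [warn] ..."
LINE_RE = re.compile(
    r"^\[(?P<ts>\w{3} \w{3} [ \d]\d \d\d:\d\d:\d\d \d{4})\] \[(?P<level>\w+)\] (?P<msg>.*)$"
)


def stuck_children_per_minute(lines: list[str]) -> Counter:
    """Count 'still did not exit' warnings per minute in Apache error_log lines."""
    counts: Counter = Counter()
    for line in lines:
        m = LINE_RE.match(line)
        if m is None or "still did not exit" not in m.group("msg"):
            continue
        # %d also accepts the space-padded days Apache writes (e.g. "Dec  6").
        ts = datetime.strptime(m.group("ts"), "%a %b %d %H:%M:%S %Y")
        counts[ts.replace(second=0)] += 1
    return counts
```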


After the reboot:

[Thu Dec 16 13:08:08 2004] [warn] VirtualHost 00.00.00.00:80 overlaps with VirtualHost 00.00.00.00:80, the first has precedenc$
[Thu Dec 16 13:08:08 2004] [warn] VirtualHost 00.00.00.00:80 overlaps with VirtualHost 00.00.00.00:80, the first has precedenc$
[Thu Dec 16 13:08:08 2004] [warn] VirtualHost 00.00.00.00:80 overlaps with VirtualHost 00.00.00.00:80, the first has precedenc$
[Thu Dec 16 13:08:09 2004] [error] PHP Warning: mysql_connect(): Can't connect to local MySQL server through socket '/var/t$
[Thu Dec 16 13:08:09 2004] [error] PHP Warning: mysql_select_db(): Can't connect to local MySQL server through socket '/var$
[Thu Dec 16 13:08:09 2004] [error] PHP Warning: mysql_select_db(): A link to the server could not be established in /home/a$
[Thu Dec 16 13:08:09 2004] [notice] Apache configured -- resuming normal operations
[Thu Dec 16 13:08:09 2004] [notice] suEXEC mechanism enabled (wrapper: /usr/local/apache/bin/suexec)
[Thu Dec 16 13:08:09 2004] [notice] Accept mutex: sysvsem (Default: sysvsem)
[Thu Dec 16 13:08:10 2004] [error] PHP Warning: mysql_connect(): Can't connect to local MySQL server through socket '/var/t$
[Thu Dec 16 13:08:10 2004] [error] PHP Warning: mysql_select_db(): Can't connect to local MySQL server through socket '/var$
[Thu Dec 16 13:08:10 2004] [error] PHP Warning: mysql_select_db(): A link to the server could not be established in /home/a$
[Thu Dec 16 13:08:15 2004] [error] PHP Warning: mysql_query(): Can't connect to local MySQL server through socket '/var/tmp$
[Thu Dec 16 13:08:15 2004] [error] PHP Warning: mysql_query(): A link to the server could not be established in /home/altre$
[Thu Dec 16 13:08:15 2004] [error] PHP Warning: mysql_connect(): Can't connect to local MySQL server through socket '/var/t$
[Thu Dec 16 13:08:15 2004] [error] [client 83.176.24.168] File does not exist: /home/someuser/public_html/favicon.ico
[Thu Dec 16 13:08:15 2004] [error] PHP Warning: mysql_query(): Can't connect to local MySQL server through socket '/var/tmp$
[Thu Dec 16 13:08:15 2004] [error] PHP Warning: mysql_query(): A link to the server could not be established in /home/altre$
[Thu Dec 16 13:08:15 2004] [error] PHP Warning: mysql_connect(): Can't connect to local MySQL server through socket '/var/t$
[Thu Dec 16 13:08:22 2004] [notice] caught SIGTERM, shutting down
[Thu Dec 16 13:10:13 2004] [warn] VirtualHost 00.00.00.00:80 overlaps with VirtualHost 00.00.00.00:80, the first has precedenc$
[Thu Dec 16 13:10:13 2004] [warn] VirtualHost 00.00.00.00:80 overlaps with VirtualHost 00.00.00.00:80, the first has precedenc$
[Thu Dec 16 13:10:13 2004] [warn] VirtualHost 00.00.00.00:80 overlaps with VirtualHost 00.00.00.00:80, the first has precedenc$
[Thu Dec 16 13:10:13 2004] [warn] VirtualHost 00.00.00.00:80 overlaps with VirtualHost 00.00.00.00:80, the first has precedenc$
[Thu Dec 16 13:10:13 2004] [warn] VirtualHost 00.00.00.00:80 overlaps with VirtualHost 00.00.00.00:80, the first has precedenc$
[Thu Dec 16 13:10:13 2004] [warn] VirtualHost 00.00.00.00:80 overlaps with VirtualHost 00.00.00.00:80, the first has precedenc$
[Thu Dec 16 13:10:13 2004] [crit] (98)Address already in use: make_sock: could not bind to port 443
[Thu Dec 16 13:10:14 2004] [notice] Apache configured -- resuming normal operations
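The PHP warnings right after the reboot say scripts could not reach MySQL's Unix socket. One possible reading is that Apache (and its PHP pages) came up before mysqld had created the socket. The socket path is cut off in the pasted log, so the sketch below takes it as a parameter; the real value is the `socket` setting in my.cnf. It simply tries to connect and reports whether anything is listening, with an optional polling helper (hypothetical, not part of any cPanel tool) for waiting until MySQL is up.

```python
import socket
import time


def unix_socket_accepting(path: str, timeout: float = 2.0) -> bool:
    """Return True if a server is accepting connections on the Unix socket at `path`."""
    s = socket.socket(socket.AF_UNIX, socket.SOCK_STREAM)
    s.settimeout(timeout)
    try:
        s.connect(path)
        return True
    except OSError:
        # Missing socket file, refused connection (stale file) or timeout.
        return False
    finally:
        s.close()


def wait_for_socket(path: str, attempts: int = 30, delay: float = 1.0) -> bool:
    """Poll `path` up to `attempts` times, sleeping `delay` seconds between tries."""
    for i in range(attempts):
        if unix_socket_accepting(path):
            return True
        if i < attempts - 1:
            time.sleep(delay)
    return False
```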

andreyka
12-16-2004, 09:49 AM
I know this one: it's an SSL bug.
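Whatever the root cause, "(98)Address already in use" means something still held port 443 when Apache tried to bind it (errno 98 is EADDRINUSE on Linux), quite possibly an httpd process from the previous instance that never exited. `netstat -tlnp` shows which process owns a listening port; the sketch below performs a similar bind test. It sets SO_REUSEADDR, which to my knowledge Apache's own listener also does, so lingering TIME_WAIT connections do not give a false positive, and it needs root to test ports below 1024.

```python
import errno
import socket


def port_in_use(port: int, host: str = "0.0.0.0") -> bool:
    """Return True if binding host:port fails with EADDRINUSE (errno 98 on Linux)."""
    s = socket.socket(socket.AF_INET, socket.SOCK_STREAM)
    # Allow binding over TIME_WAIT leftovers; an active listener still blocks it.
    s.setsockopt(socket.SOL_SOCKET, socket.SO_REUSEADDR, 1)
    try:
        s.bind((host, port))
    except OSError as exc:
        if exc.errno == errno.EADDRINUSE:
            return True
        raise  # e.g. EACCES when a non-root user tries a port below 1024
    finally:
        s.close()
    return False
```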

WWWhost
12-16-2004, 01:19 PM
Hi, can you tell me how to solve this problem?
When the server crashes, I still see this in top:
----------
18:00:30 up 51 min, 2 users, load average: 53.66, 21.19, 9.21
251 processes: 246 sleeping, 4 running, 1 zombie, 0 stopped
CPU states: cpu user nice system irq softirq iowait idle
total 27.7% 0.0% 44.1% 0.5% 1.3% 26.1% 0.0%
Mem: 962848k av, 924852k used, 37996k free, 0k shrd, 97340k buff
690336k actv, 136612k in_d, 14336k in_c
Swap: 1959920k av, 125344k used, 1834576k free 339988k cached

PID USER PRI NI SIZE RSS SHARE STAT %CPU %MEM TIME CPU COMMAND
2002 root 15 0 18396 176 104 S 3.9 0.0 1:59 0 /usr/local/apache/bin/httpd -DSSL
13948 username 19 0 5140 5140 1592 D 2.5 0.5 0:00 0 /usr/bin/perl -T /usr/local/cpanel/base/neomail/neomail.pl
13950 root 15 0 3692 3692 2696 S 0.7 0.3 0:00 0 /usr/sbin/exim -Mc 1CeyyA-0001Pi-8g
2045 mysql 15 0 28316 16M 1408 S 0.3 1.8 0:10 0 /usr/sbin/mysqld --basedir=/ --datadir=/var/lib/mysql --us
8338 root 15 0 1124 1124 580 R 0.3 0.1 0:03 0 top
5758 username 16 0 4296 2356 1032 S 0.3 0.2 0:00 0 cppop - serving 80.181.21.250 - AUTHORIZATION
9993 username 17 0 5140 5140 1592 D 0.3 0.5 0:00 0 /usr/bin/perl -T /usr/local/cpanel/base/neomail/neomail.pl
5699 nobody 15 0 0 0 0 Z 0.1 0.0 0:00 0 httpd <defunct>
1 root 15 0 128 80 56 S 0.0 0.0 0:04 0 init
2 root 15 0 0 0 0 SW 0.0 0.0 0:00 0 keventd
3 root 34 19 0 0 0 SWN 0.0 0.0 0:00 0 ksoftirqd/0
6 root 25 0 0 0 0 SW 0.0 0.0 0:00 0 bdflush
4 root 15 0 0 0 0 SW 0.0 0.0 0:01 0 kswapd
5 root 15 0 0 0 0 SW 0.0 0.0 0:01 0 kscand
7 root 15 0 0 0 0 RW 0.0 0.0 0:00 0 kupdated
8 root 25 0 0 0 0 SW 0.0 0.0 0:00 0 mdrecoveryd
12 root 15 0 0 0 0 DW 0.0 0.0 0:09 0 kjournald
104 root 25 0 0 0 0 SW 0.0 0.0 0:00 0 khubd
576 root 25 0 0 0 0 SW 0.0 0.0 0:00 0 kjournald
577 root 25 0 0 0 0 SW 0.0 0.0 0:00 0 kjournald
579 root 15 0 0 0 0 SW 0.0 0.0 0:00 0 loop0
580 root 15 0 0 0 0 SW 0.0 0.0 0:00 0 kjournald
1837 root 15 0 252 228 164 D 0.0 0.0 0:01 0 syslogd -m 0
1841 root 15 0 72 4 0 S 0.0 0.0 0:00 0 klogd -x
1886 named 25 0 11340 9.9M 744 S 0.0 1.0 0:14 0 /usr/sbin/named -u named
1918 root 19 0 1336 188 40 S 0.0 0.0 0:00 0 chkservd
2015 root 25 0 152 4 0 S 0.0 0.0 0:00 0 /bin/sh /usr/bin/mysqld_safe --datadir=/var/lib/mysql --pi
2507 root 15 0 3428 748 344 S 0.0 0.0 0:02 0 cppop -
---------

Thanks a lot