Web Hosting Talk







Figuring Out A Crash...


mainarea
10-21-2003, 05:42 PM
I tried to access a test server at ******* today, but it was down. I got it rebooted, but cannot figure out why the server crashed (it was pingable, but no services were available). All that's in /var/log/messages is the following:

Oct 20 13:41:02 wow sshd(pam_unix)[27988]: session closed for user root
Oct 21 14:16:44 wow syslogd 1.4.1: restart.

Nothing suspicious there. I can't find any info in other logs either - where else do you suggest I look?

- Matt
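
When the log simply goes quiet like this, the longest silence between entries gives a rough window for when the box stopped writing to disk. A minimal Python sketch of that check follows. It assumes the classic syslog "Mon DD HH:MM:SS" prefix seen in the two lines above; the function name `largest_log_gap` is made up for illustration, and the year has to be passed in because syslog does not record it.

```python
from datetime import datetime, timedelta

def largest_log_gap(lines, year):
    """Return (gap, before, after) for the longest silence between consecutive entries."""
    stamps = []
    for line in lines:
        try:
            # Classic syslog lines start with "Mon DD HH:MM:SS" (15 characters, no year).
            stamps.append(datetime.strptime(f"{year} {line[:15]}", "%Y %b %d %H:%M:%S"))
        except ValueError:
            continue  # skip lines that do not start with a timestamp
    best = (timedelta(0), None, None)
    for before, after in zip(stamps, stamps[1:]):
        if after - before > best[0]:
            best = (after - before, before, after)
    return best

# The two entries quoted in the post above.
sample = [
    "Oct 20 13:41:02 wow sshd(pam_unix)[27988]: session closed for user root",
    "Oct 21 14:16:44 wow syslogd 1.4.1: restart.",
]
gap, before, after = largest_log_gap(sample, 2003)
print(f"Longest silence: {gap} (from {before} to {after})")
```

On an idle box a long gap is not proof of a hang by itself, so this only narrows down where to look; comparing the same window across several logs (messages, secure, cron, the Apache logs) is more telling.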

mainarea
10-21-2003, 05:45 PM
By the way, just found this in the Apache error log:

[Tue Oct 21 00:11:07 2003] [warn] child process 18747 still did not exit, sending a SIGTERM
[Tue Oct 21 00:11:15 2003] [warn] child process 18748 still did not exit, sending a SIGTERM
[Tue Oct 21 00:11:22 2003] [warn] child process 18749 still did not exit, sending a SIGTERM
[Tue Oct 21 00:11:34 2003] [warn] child process 18750 still did not exit, sending a SIGTERM
[Tue Oct 21 00:11:43 2003] [warn] child process 18751 still did not exit, sending a SIGTERM
[Tue Oct 21 00:11:48 2003] [warn] child process 18752 still did not exit, sending a SIGTERM
[Tue Oct 21 00:11:53 2003] [warn] child process 18753 still did not exit, sending a SIGTERM
[Tue Oct 21 00:11:59 2003] [warn] child process 18754 still did not exit, sending a SIGTERM

Could this be the reason?

- Matt

BurtonHost
10-21-2003, 08:46 PM
Matt,

Is the test server running any load, or is it fairly idle except when you access it?

Was the server unusable between 13:41 and 14:16?

Joshua
10-21-2003, 09:38 PM
The test server isn't running any load: only DirectAdmin is loaded on the server, with one account set up and one static file in the account (and no traffic to that file). The server has only transferred 170MB of data since the 7th. The server was unusable during that period but still pingable; not one single service could be accessed during that time.

-Josh

mainarea
10-21-2003, 09:40 PM
Just to clarify - I'm not sure exactly when the server went down since I'm not monitoring it, but it was still working for at least an hour after 13:41 on the 20th (the last time I checked). I have a slight feeling that the SIGTERMs have something to do with the crash, but can't be sure - anywhere else to check, or any suggestions on preventing this in the future? It's a DirectAdmin box.

- Matt

stevenblazer
10-21-2003, 09:44 PM
Did you call *******? I don't see anything wrong from what you have posted.

Joshua
10-21-2003, 09:57 PM
Servers with nothing running on them don't usually shut off all services but still respond to pings for no reason. We haven't talked with ******* about the possibility of a hardware issue yet; we're investigating the software side first (this server is not managed by them).

-Josh

Joshua
10-21-2003, 09:59 PM
Note - This was also an issue with Bahawolf's server there, see http://www.webhostingtalk.com/showthread.php?s=&threadid=198884 .

piramida
10-22-2003, 05:05 AM
Originally posted by mainarea
Could this be the reason?

No, it's a consequence.

coight
10-22-2003, 06:35 AM
When you tried SSHing in, what happened?

mainarea
10-22-2003, 07:40 AM
SSH wouldn't connect, and it quit after 30 seconds of trying. Nothing else was available (ftp, DA, bind, apache) besides ping
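
Since ping keeps answering while everything else is dead, a ping-only monitor would never have caught this. A rough Python sketch of a TCP-level check that could be run from another machine is below. The port numbers are assumptions based on common defaults (2222 for DirectAdmin in particular), and `port_open` and `check_services` are illustrative names, not part of any existing tool.

```python
import socket

# Port numbers are assumptions (common defaults); adjust them to the actual setup.
SERVICES = {"ftp": 21, "ssh": 22, "dns": 53, "http": 80, "directadmin": 2222}

def port_open(host, port, timeout=5.0):
    """Return True if a TCP connection to host:port succeeds within timeout seconds."""
    try:
        with socket.create_connection((host, port), timeout=timeout):
            return True
    except OSError:
        # Covers refused connections, timeouts, and unreachable hosts.
        return False

def check_services(host, services=SERVICES, timeout=5.0):
    """Map each service name to True (accepting connections) or False."""
    return {name: port_open(host, port, timeout) for name, port in services.items()}
```

Calling `check_services("server.example.com")` (a placeholder hostname) from cron every few minutes and logging the result would at least timestamp the moment the services stop answering. One caveat: a successful connect only shows that the kernel completed the handshake, so a daemon that is hung but still listening can look healthy; reading the SSH or FTP banner would be a stricter test.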

inteltechs
10-22-2003, 06:26 PM
Originally posted by mainarea
SSH wouldn't connect, and it quit after 30 seconds of trying. Nothing else was available (ftp, DA, bind, apache) besides ping

Maybe a RAM swap is needed.

Kevin

MattF
10-22-2003, 07:10 PM
Sounds more like an I/O problem, where sockets and memory-resident programs stay running; however, since an SSH attempt logs to disk, it also jams up.

There is an interesting thread on this at RackShack; it appears not to be hardware dependent and could be any of numerous issues: incorrect quotas, DMA settings, bad software, a bad kernel, etc.
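
If the I/O theory is right, processes stuck waiting on disk should show up in state D (uninterruptible sleep) on Linux. A minimal Python sketch for listing them is below, assuming a Linux /proc filesystem; `parse_stat` and `blocked_processes` are illustrative names. It is only useful if it is already running (from cron, say) and writing somewhere that still works, since nobody can log in once the box jams.

```python
import os

def parse_stat(text):
    """Extract (pid, command, state) from the contents of /proc/<pid>/stat."""
    # The command name is wrapped in parentheses and may itself contain spaces
    # or parentheses, so split on the last ")" rather than on whitespace.
    left, right = text.rsplit(")", 1)
    pid, comm = left.split("(", 1)
    return int(pid), comm, right.split()[0]

def blocked_processes(proc_root="/proc"):
    """Return (pid, command) for every process in uninterruptible sleep (state D)."""
    blocked = []
    for entry in os.listdir(proc_root):
        if not entry.isdigit():
            continue  # only numeric directories are processes
        try:
            with open(os.path.join(proc_root, entry, "stat")) as handle:
                pid, comm, state = parse_stat(handle.read())
        except (OSError, ValueError):
            continue  # the process exited or the file was unreadable
        if state == "D":
            blocked.append((pid, comm))
    return blocked
```

A handful of processes briefly in state D is normal under disk activity; a growing list that never clears is the pattern that would support an I/O hang rather than bad RAM.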

Vladimir S.
10-22-2003, 07:33 PM
I suggest you test the RAM.