pmak0
02-23-2002, 05:25 AM
My server crashed/froze for no apparent reason.
How can I determine the reason for the crash? I looked through /var/log/messages after the server was rebooted but I just saw messages from while the machine was rebooting, and messages from well before the crash.
AlaskanWolf
02-23-2002, 05:30 AM
What are the specs on it? Motherboard, NIC, OS, etc.
pmak0
02-23-2002, 05:33 AM
> What are the specs on it? Motherboard, NIC, OS, etc.
It is an AMD Athlon 1GHz (64 KB cache) with 512 MB RAM and 28 GB disk running Red Hat Linux 7.2.
I'm not sure how to identify the motherboard and NIC. How would I find those out?
(I'm glad that it's Red Hat Linux 7.2 instead of 7.1, since that means I have the ext3 Journaling File System.)
Jedito
02-23-2002, 05:53 AM
Check /var/log/messages to see what the last messages before the crash were.
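As a rough sketch of that check, the Python below collects the few lines logged just before each syslogd restart line, on the assumption that a restart line marks the point where the machine came back up. The function name `lines_before_restarts`, the `context` parameter, and matching on the words "syslogd" and "restart" are all made up for this example; log wording can differ between syslog versions.

```python
from collections import deque


def lines_before_restarts(lines, context=5):
    """Return a list of (restart_line, preceding_lines) pairs.

    `lines` is any iterable of log lines, for example an open file.
    `preceding_lines` holds up to `context` lines logged just before
    the restart line, oldest first.
    """
    recent = deque(maxlen=context)  # sliding window of the latest lines
    results = []
    for line in lines:
        line = line.rstrip("\n")
        # Assumption: a line mentioning both words marks a syslogd restart.
        if "syslogd" in line and "restart" in line:
            results.append((line, list(recent)))
        recent.append(line)
    return results
```

It could be run as root with something like `lines_before_restarts(open("/var/log/messages", errors="replace"))`, then printing each pair. Whatever was logged last before the restart is the closest thing to a final message from the crashed system.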
pmak0
02-23-2002, 05:59 AM
I'm attaching the part of /var/log/messages from around the crash. I believe that this is the relevant part:
Feb 21 16:02:08 lina login(pam_unix)[685]: session closed for user cs161
Feb 21 08:50:40 lina syslogd 1.4.1: restart.
Feb 21 08:50:41 lina syslog: syslogd startup succeeded
The first line shows someone logging out. Then the second line is from the beginning of the reboot (which actually happened around Feb 21 18:50:00 according to "uptime" but my machine's clock messes itself up every time it reboots).
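A clock that resets at boot shows up in the log as timestamps that go backwards, as between the first two lines above. Here is a minimal sketch for spotting such points. It assumes the usual `Mon DD HH:MM:SS` prefix, English month names, and a log that stays within one calendar year. The function name and the `year` parameter are made up for this example.

```python
from datetime import datetime


def find_backward_jumps(lines, year=2002):
    """Return (previous_line, line) pairs where the timestamp goes backwards."""
    jumps = []
    prev_time = None
    prev_line = None
    for line in lines:
        line = line.rstrip("\n")
        try:
            # The first 15 characters hold a stamp such as "Feb 21 16:02:08".
            # Syslog stamps carry no year, so one is supplied by the caller.
            stamp = datetime.strptime(f"{year} {line[:15]}", "%Y %b %d %H:%M:%S")
        except ValueError:
            continue  # not a standard syslog line; skip it
        if prev_time is not None and stamp < prev_time:
            jumps.append((prev_line, line))
        prev_time, prev_line = stamp, line
    return jumps
```

Each pair it returns is a candidate reboot boundary: the first element is the last line written before the clock changed. A log that runs from December into January would also be flagged, since the year is not recorded.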
zupanm
02-23-2002, 10:16 AM
Random reboots could be due to a lot of things. The big one I've seen is bad RAM. Another one, which happened to me, was a bad Ethernet card. I don't know why that caused a reboot, but it did.
Walter
02-23-2002, 11:39 AM
IMHO the top reasons for random reboots are bad RAM and overheating.
pmak0
02-23-2002, 11:40 AM
> IMHO the top reasons for random reboots are bad RAM and
> overheating.
How do I check the temperature of the machine on Red Hat Linux 7.2 (or is it even possible)?
I know that on a SunOS box I can type "lom" as root and get access to the internal server sensors, but I'm not sure about Linux.
zupanm
02-23-2002, 03:48 PM
There is a kernel module for checking CPU temperature, although the last I knew it was only for 2.2 kernels. I'm not sure if there are any for 2.4 yet. There might be; it's been a long time since I checked.
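For what it's worth, on kernels with sysfs (2.6 and later, so not the 2.2 or 2.4 kernels above), sensor drivers that are loaded publish readings as files under /sys/class/hwmon, with temperatures in thousandths of a degree Celsius. The sketch below assumes such a system with a suitable driver loaded. The function name and the `root` parameter are made up for this example.

```python
from pathlib import Path


def read_temperatures(root="/sys/class/hwmon"):
    """Return {sensor_file_path: degrees_celsius} for every readable sensor."""
    readings = {}
    for sensor in sorted(Path(root).glob("hwmon*/temp*_input")):
        try:
            # Each temp*_input file holds an integer in millidegrees Celsius.
            millidegrees = int(sensor.read_text().strip())
        except (OSError, ValueError):
            continue  # sensor unreadable or empty; skip it
        readings[str(sensor)] = millidegrees / 1000.0
    return readings
```

An empty result would mean no sensor driver is loaded or the interface is not present on that kernel, not that the machine is running cool.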
bitserve
02-23-2002, 08:15 PM
I guess you could search for core files that might have been created.
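Something like the following could do that search. It is only a sketch, assuming core files are named `core` or `core.<pid>`, which is the usual Linux default. The function name is made up for this example. Keep in mind that a core file comes from a crashed process, so this only helps if a program fault was involved.

```python
import os


def find_core_files(top="/"):
    """Return (mtime, path) pairs for files named core or core.<pid>, oldest first."""
    matches = []
    for dirpath, dirnames, filenames in os.walk(top):
        if dirpath == top:
            # Skip virtual filesystems at the top level; editing dirnames
            # in place tells os.walk not to descend into them.
            dirnames[:] = [d for d in dirnames if d not in ("proc", "sys", "dev")]
        for name in filenames:
            if name == "core" or (name.startswith("core.") and name[5:].isdigit()):
                path = os.path.join(dirpath, name)
                try:
                    matches.append((os.path.getmtime(path), path))
                except OSError:
                    continue  # file vanished or is unreadable
    return sorted(matches)
```

The modification times are the useful part: a core file written shortly before the reboot would be worth a closer look.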
Plus, any errors, even ones nowhere near the time it crashed, might at least indicate a problem.
Good luck.