
server crashing
AlaskanWolf 01-19-2002, 01:43 PM I got a server that has a fresh install of RH 7.2 and cPanel. Before it left me to go to the NOC it ran great; I ran a lot of tests, load tests, etc.
Got it to the NOC, and now the server fails about every hour. It just stops, with nothing in /var/log/messages at the time it fails.
Had the NOC put a monitor on it, and he said "I see 'ip changes already exist' and the server seems to be under very heavy load".
There are no customers or data on this machine to cause heavy load.
Suggestions? Does it sound like the kernel? I switched back and forth between 2.4.7 and 2.4.9 (the current kernels on this machine) and neither makes much difference.
You can try upgrading.
Also, a hard drive is a common problem. If you have an IDE hard drive, it sometimes causes high load even with no customers on it.
Synergy 01-19-2002, 04:36 PM Sometimes, because of a faulty HD, the system caches too much and the load becomes extremely high.
AlaskanWolf 01-19-2002, 10:12 PM FYI: I will be setting up a cron job that captures the output of top every few minutes, so that I can see what's going on right before it crashes.
I have almost the same problem on an existing machine, where it just freezes. That machine has the latest kernel, and it crashes less often (about once every other day). Again, according to top, there's no load at all on that machine either.
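The cron idea above could look roughly like the sketch below. It is only an illustration, not what was actually run on this server: the function name `append_snapshot`, the log path, and the script path in the comments are all made up for the example. It runs `top` once in batch mode and appends the timestamped output to a file, forcing the data to disk so the last snapshot has a better chance of surviving a hard freeze.

```python
import os
import subprocess
import time


def append_snapshot(log_path, command=("top", "-b", "-n", "1")):
    """Run `command` once and append its timestamped output to log_path.

    The default command is top in batch mode for a single iteration.
    """
    result = subprocess.run(
        list(command), capture_output=True, text=True, check=False
    )
    stamp = time.strftime("%Y-%m-%d %H:%M:%S")
    with open(log_path, "a") as log:
        log.write(f"===== {stamp} =====\n")
        log.write(result.stdout)
        # Push the data to disk now; the box may freeze before a normal flush.
        log.flush()
        os.fsync(log.fileno())
    return stamp


# To use it as a standalone script, end the file with a call such as
#   append_snapshot("/var/log/top-snapshots.log")    (hypothetical path)
# and schedule it with a crontab entry along the lines of
#   */5 * * * * python3 /root/top_snapshot.py        (hypothetical path)
```

After a freeze, the last block in the log file shows what top reported in the minutes before the machine stopped responding.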
AlaskanWolf 01-20-2002, 01:05 AM I was just about to download the latest and greatest when the server stopped again, and I found this in the logs:
Jan 19 20:41:54 greyfox sshd(pam_unix)[5110]: session opened for user root by (uid=0)
Jan 19 20:42:57 greyfox proftpd[5092]: greyfox.thehideout.net (localhost[127.0.0.1]) - FTP login timed out, disconnected.
Jan 19 20:51:17 greyfox proftpd[5178]: greyfox.thehideout.net (localhost[127.0.0.1]) - FTP login timed out, disconnected.
Jan 19 20:53:28 greyfox kernel: eepro100: wait_for_cmd_done timeout!
Jan 19 20:53:28 greyfox kernel: eepro100: wait_for_cmd_done timeout!
Jan 19 20:53:29 greyfox rhnsd[1428]: Exiting
Jan 19 20:53:29 greyfox rhnsd: rhnsd shutdown succeeded
AlaskanWolf 01-21-2002, 12:39 AM It looks like the eepro100 driver, which runs the onboard NIC, is to blame. If you search Google for "eepro100: wait_for_cmd_done timeout", you will find pages and pages describing the exact problem we have: sporadic outages that can be anywhere from 1 minute to 24 hours apart.
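To check how often the timeout shows up and whether it lines up with the freezes, the log can be scanned for that kernel message. The sketch below is a rough illustration only; the function name `find_driver_timeouts` is invented here, and it assumes the usual syslog layout seen in the excerpt above (month, day, time, host, then the message).

```python
def find_driver_timeouts(lines, needle="eepro100: wait_for_cmd_done timeout"):
    """Return (timestamp, host) for every log line that contains `needle`."""
    hits = []
    for line in lines:
        if needle not in line:
            continue
        # Syslog layout: "Jan 19 20:53:28 greyfox kernel: <message>"
        parts = line.split(None, 4)
        if len(parts) < 5:
            continue  # not a normal syslog line, skip it
        timestamp = " ".join(parts[:3])
        host = parts[3]
        hits.append((timestamp, host))
    return hits


# Example against a real file (path as mentioned earlier in the thread):
#   with open("/var/log/messages") as f:
#       for timestamp, host in find_driver_timeouts(f):
#           print(timestamp, host)
```

If the timestamps of the hits fall right before each outage, that points at the NIC driver rather than at load or the disk.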
Robot Two 01-21-2002, 12:57 AM If it was working when it went out the door, most likely your ethernet card got knocked a little loose in transit, or was physically damaged. Try having them re-seat the card, and if that doesn't work, have them replace it. Those cards shouldn't cost more than $20-$30.
-Dan
AlaskanWolf 01-21-2002, 01:01 AM Hi Dan
nevermind
AlaskanWolf 01-21-2002, 01:02 AM Originally posted by Robot Two
If it was working when it went out the door, most likely your ethernet card got knocked a little loose in transit, or was physically damaged. Try having them re-seat the card, and if that doesn't work, have them replace it. Those cards shouldn't cost more than $20-$30.
-Dan
It's an onboard NIC :(
It's under warranty, and my other server, running a non-Intel mobo, works just fine (basically the same configuration, just different mobos).
I already contacted the company I got the server from and instructed them to replace the mobo with one identical to the one in the other server we got from them.
Funny how it was online today for 3 hours, and as soon as I put traffic onto it (i.e. went to the WHM area) it froze... so I hereby give up :)
Gernot 01-21-2002, 02:25 PM That's really a common error. Replace the eepro100 driver with e100 (which comes from Intel themselves). This driver is almost perfect and should solve all your problems. You can get it at support.intel.com (search for 'e100').
Thanks,
Gernot
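For anyone wondering what the driver swap might involve on the config side: on a Red Hat system of this era the module for each interface is typically chosen by an `alias` line in /etc/modules.conf. The sketch below is an assumption-heavy illustration, not Intel's install procedure. It assumes the e100 module has already been built and installed, that the file contains a line like `alias eth0 eepro100`, and the function name `switch_nic_driver` is made up. It only rewrites text and does not touch any file by itself.

```python
def switch_nic_driver(conf_text, old="eepro100", new="e100"):
    """Return conf_text with 'alias ethN <old>' lines pointed at <new>."""
    out = []
    for line in conf_text.splitlines():
        fields = line.split()
        is_nic_alias = (
            len(fields) == 3
            and fields[0] == "alias"
            and fields[1].startswith("eth")
            and fields[2] == old
        )
        if is_nic_alias:
            # Keep the interface name, swap only the module name.
            line = f"alias {fields[1]} {new}"
        out.append(line)
    return "\n".join(out) + "\n"


# Example (assumed file contents):
#   switch_nic_driver("alias eth0 eepro100\n")  gives  "alias eth0 e100\n"
```

Keep a backup of the original file and make sure console access is available before changing the driver on a remote machine, since a wrong module name leaves the box without networking.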