Web Hosting Talk

server crashing

AlaskanWolf
01-19-2002, 01:43 PM
I have a server with a fresh install of RH 7.2 and cPanel. Before it left me to go to the NOC, it ran great; I ran a lot of tests, including load tests.

Now that it's at the NOC, the server fails about every hour. It just stops, and nothing shows up in the /var/log/messages file at the time it fails.

I had the NOC put a monitor on it, and the tech said: "I see 'ip changes already exist' and the server seems to be under very heavy load."

There are no customers or data on this machine to cause heavy load.

Suggestions? Does it sound like the kernel? I switched back and forth between 2.4.7 and 2.4.9 (the kernels currently on this machine), and neither made much difference.

Palm
01-19-2002, 03:52 PM
You can try upgrading.

A hard drive is also a common culprit. If you have an IDE hard drive, it sometimes causes high loads even with no customers on it.

Synergy
01-19-2002, 04:36 PM
Sometimes a faulty HD causes it to cache too much, and the load becomes extremely high.

AlaskanWolf
01-19-2002, 10:12 PM
FYI: I will be setting up a cron job that captures top every few minutes, so that I can see what's going on right before it crashes.

I have almost the same problem on an existing machine, where it just freezes. That machine has the latest kernel, but the crashes seem less frequent (once every other day). Again, top shows no load at all on that machine either.
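A minimal sketch of the cron snapshot idea, written for a modern Python 3 (an assumption; RH 7.2 shipped an older Python). Each run appends the load averages from /proc/loadavg plus one batch-mode run of top (`top -b -n 1`) to a log file. The script name, log path and schedule in the usage note are hypothetical:

```python
import os
import subprocess
import sys
import time


def format_snapshot(timestamp: str, loadavg: str, top_output: str) -> str:
    """Build one log entry: a header with the time and load averages, then top's output."""
    # /proc/loadavg looks like "0.12 0.34 0.56 1/123 4567"; the first three
    # fields are the 1-, 5- and 15-minute load averages.
    one, five, fifteen = loadavg.split()[:3]
    header = f"=== {timestamp} load: {one} {five} {fifteen} ==="
    return header + "\n" + top_output.rstrip("\n") + "\n"


def take_snapshot() -> str:
    """Read the current load averages and one batch-mode iteration of top."""
    with open("/proc/loadavg") as f:
        loadavg = f.read()
    # -b: batch mode (plain text, no terminal control codes); -n 1: a single iteration.
    result = subprocess.run(["top", "-b", "-n", "1"], capture_output=True, text=True)
    timestamp = time.strftime("%Y-%m-%d %H:%M:%S")
    return format_snapshot(timestamp, loadavg, result.stdout)


def main(log_path: str) -> None:
    entry = take_snapshot()
    # Append, then flush and fsync so the entry is pushed to disk before a possible freeze.
    with open(log_path, "a") as log:
        log.write(entry)
        log.flush()
        os.fsync(log.fileno())


if __name__ == "__main__" and len(sys.argv) > 1:
    main(sys.argv[1])
```

A matching crontab entry might be `*/5 * * * * /usr/bin/python3 /root/top_snapshot.py /var/log/top-snapshots.log` (hypothetical paths). The fsync is there because a hard lockup can lose writes still sitting in buffers; even so, the last snapshot before a freeze may not make it to disk.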

AlaskanWolf
01-20-2002, 01:05 AM
I was just about to download the latest and greatest when the server stopped again, and I found this in the logs:

Jan 19 20:41:54 greyfox sshd(pam_unix)[5110]: session opened for user root by (uid=0)
Jan 19 20:42:57 greyfox proftpd[5092]: greyfox.thehideout.net (localhost[127.0.0.1]) - FTP login timed out, disconnected.
Jan 19 20:51:17 greyfox proftpd[5178]: greyfox.thehideout.net (localhost[127.0.0.1]) - FTP login timed out, disconnected.
Jan 19 20:53:28 greyfox kernel: eepro100: wait_for_cmd_done timeout!
Jan 19 20:53:28 greyfox kernel: eepro100: wait_for_cmd_done timeout!
Jan 19 20:53:29 greyfox rhnsd[1428]: Exiting
Jan 19 20:53:29 greyfox rhnsd: rhnsd shutdown succeeded
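A small sketch for pulling the NIC timeouts out of a syslog file such as /var/log/messages, so they can be lined up against the freeze times. It assumes the classic syslog prefix (`Mmm dd hh:mm:ss host ...`) seen in the log lines above; the function name is hypothetical:

```python
import sys

PATTERN = "eepro100: wait_for_cmd_done timeout"


def find_nic_timeouts(lines):
    """Return (timestamp, host) for every kernel line reporting the eepro100 timeout."""
    hits = []
    for line in lines:
        if PATTERN in line and " kernel: " in line:
            # Classic syslog prefix: "Mmm dd hh:mm:ss host ...", so the timestamp
            # is the first 15 characters and the host name follows the next space.
            timestamp = line[:15]
            host = line[16:].split(" ", 1)[0]
            hits.append((timestamp, host))
    return hits


if __name__ == "__main__" and len(sys.argv) > 1:
    with open(sys.argv[1], errors="replace") as f:
        for timestamp, host in find_nic_timeouts(f):
            print(f"{timestamp} {host}: {PATTERN}")
```

Run as `python3 find_timeouts.py /var/log/messages` (hypothetical script name). Matching on the literal message keeps it simple; the FTP "timed out" lines are ignored because they do not contain the eepro100 string.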

AlaskanWolf
01-21-2002, 12:39 AM
It looks like the eepro100 driver, which runs the onboard NIC, is to blame. If you search Google for "eepro100: wait_for_cmd_done timeout", you will find pages and pages describing exactly the problem we have: sporadic outages anywhere from 1 minute to 24 hours apart.
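To put numbers on how far apart the outages are, one rough approach (an assumption: treating each distinct eepro100 timeout timestamp as the start of an outage) is to parse the syslog timestamps and take the differences. Syslog omits the year, so the sketch assumes one through a hypothetical `year` parameter and does not handle logs that cross New Year:

```python
from datetime import datetime


def outage_gaps(timestamps, year=2002):
    """Minutes between consecutive distinct syslog timestamps like 'Jan 19 20:53:28'."""
    # Syslog lines carry no year, so one is prepended (assumed) before parsing.
    # The set drops duplicates, e.g. two timeout lines logged in the same second.
    times = sorted({datetime.strptime(f"{year} {ts}", "%Y %b %d %H:%M:%S") for ts in timestamps})
    return [(later - earlier).total_seconds() / 60 for earlier, later in zip(times, times[1:])]
```

Feeding it the timestamps of the timeout lines from /var/log/messages gives a list of gaps in minutes, which makes it easy to see whether the failures really range from minutes to a day apart.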

Robot Two
01-21-2002, 12:57 AM
If it was working when it went out the door, most likely your Ethernet card got knocked a little loose in transit, or was physically damaged. Try having them re-seat the card, and if that doesn't work, have them replace it. Those cards shouldn't cost more than $20-$30.

-Dan

AlaskanWolf
01-21-2002, 01:01 AM
Hi Dan

Never mind.

AlaskanWolf
01-21-2002, 01:02 AM
It's an onboard NIC :(

It's under warranty, and my other server, running a non-Intel mobo, works just fine (basically the same configuration, just a different mobo).

I already contacted the company I got the server from and instructed them to replace the mobo with one identical to the other server we got from them.

Funny how it was online today for 3 hours, and as soon as I put traffic on it (i.e. went into the WHM area) it froze... so I hereby give up :)

Gernot
01-21-2002, 02:25 PM
That's really a common error. Replace the eepro100 driver with e100 (which is from Intel themselves). This driver is almost perfect and should solve all problems. You can get it at support.intel.com (search for 'e100').

Thanks,
Gernot
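On Red Hat 7.x the driver bound to an interface is normally chosen by a line like `alias eth0 eepro100` in /etc/modules.conf. Assuming the e100 module from Intel has already been built and installed, a sketch of the config change looks like this (the function name is hypothetical, and `ethN` interface names are assumed). It prints the rewritten file rather than editing it in place:

```python
import re
import sys

# Matches e.g. "alias eth0 eepro100" (Red Hat 7.x-style /etc/modules.conf).
ALIAS_RE = re.compile(r"^(\s*alias\s+eth\d+\s+)eepro100\s*$")


def switch_to_e100(conf_text: str) -> str:
    """Point every ethN alias that uses eepro100 at Intel's e100 module instead."""
    out = []
    for line in conf_text.splitlines(keepends=True):
        ending = "\n" if line.endswith("\n") else ""
        match = ALIAS_RE.match(line.rstrip("\n"))
        # Rewrite matching alias lines; leave every other line untouched.
        out.append(match.group(1) + "e100" + ending if match else line)
    return "".join(out)


if __name__ == "__main__" and len(sys.argv) > 1:
    # Print the result instead of editing in place, so it can be reviewed first.
    with open(sys.argv[1]) as f:
        sys.stdout.write(switch_to_e100(f.read()))
```

After reviewing the output and saving it over /etc/modules.conf, reboot (or reload the network driver) so the interface comes up on e100. Any `options eepro100 ...` lines would need to be checked by hand, since this sketch only touches alias lines.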