Weird problem... NIC stops responding
chuckt101 04-06-2003, 12:47 AM Hi,
I think my NIC stops responding every now and then. I can't be sure, but it's the only conclusion I can come to without physical access to the server.
Every few days, I won't be able to ping/tracert my server. It stops on the last hop. I have to reboot it via a remote APC port.
After it comes up, I check the logs and there is nothing weird at all. The last thing in /var/log/messages is a server update script run as part of an HSphere cron job. I'm sure this is not the problem, since the script runs every few minutes and the great majority of the time it runs fine.
I ran chkrootkit and nothing came back.
I don't believe it's an issue of server load.
Is the problem the NIC card or something else? Where can I look?
Running Redhat 7.2
Thanks
inteltechs 04-06-2003, 12:59 AM did you install any firewall scripts or snort?
Spingen 04-06-2003, 01:55 AM What kind of nic is it? There are some crappy ones that do just crap out.
mpope 04-06-2003, 06:38 AM And it could also be that your server crashed at that point. (Normally you'd see weird happenings in the logs, but I've seen it happen where no significant log entries occurred....)
Anyway... I'd start with the NIC and get it replaced.... make sure they put in a 3com card this time. If the problem keeps occurring... then you know it wasn't the NIC :D
clockwork 04-06-2003, 06:42 AM I'd have someone look at console when this happens... maybe they could run 'dmesg' and see what is going on (if they are able to enter commands at all...)
THW-Dave 04-06-2003, 07:38 AM If it's a Linksys LNE100TX or something, and you use the "tulip" module,
google for "tulip_old". I had a similar problem, and it turned out my tulip driver was bad.
admin0 04-06-2003, 12:26 PM Hi,
I have faced this problem before: the NIC all of a sudden stops working while the rest of the system is working fine. No errors, nothing in the logs, etc.
I believe that an rc.d/network restart might solve this problem, and you can try running that every x minutes until you change your NIC to a known good one... maybe 3com?
:homer:
chuckt101 04-06-2003, 12:35 PM How do I find out what brand the NIC is via the shell? :blush:
chuckt101 04-06-2003, 12:41 PM Blah, it happened again... I'm just going to get the dv2 guys to replace my NIC and hope that does it. :bawling:
chuckt101 04-06-2003, 12:49 PM Originally posted by inteltechs
did you install any firewall scripts or snort?
nope.
The problem just started out of the blue... That's why I thought it was a rootkit/hacker, but I checked those out and scanners said it was nothing.
I checked netstat and there are no weird connections.
admin0 04-06-2003, 12:49 PM Hi,
setup a cron job for a network restart say every 30 minutes ?
If you are on redhat/mandrake
look in the file:
/etc/sysconfig/hwconf
class: NETWORK
will show you exactly what your network card is ;)
Hope this helps
:homer:
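A minimal sketch of pulling just the network card entries out of that file, assuming records are separated by lines containing only "-" (as in the excerpts posted in this thread). The function name network_records is made up for illustration.

```shell
#!/bin/sh
# Print every "class: NETWORK" record from a kudzu-style hwconf file.
# Assumption: records are separated by lines that contain only "-".
network_records() {
    awk '
        $0 == "-" { if (keep) printf "%s", rec; rec = ""; keep = 0; next }
        { rec = rec $0 "\n" }
        $0 == "class: NETWORK" { keep = 1 }
        END { if (keep) printf "%s", rec }
    ' "${1:-/etc/sysconfig/hwconf}"
}

# Only read the real file if it exists (it is specific to Red Hat style systems).
if [ -r /etc/sysconfig/hwconf ]; then
    network_records /etc/sysconfig/hwconf
fi
```

The "driver:" and "desc:" lines of the printed record are the ones that identify the card.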
chuckt101 04-06-2003, 12:52 PM Originally posted by admin0
Hi,
setup a cron job for a network restart say every 30 minutes ?
If you are on redhat/mandrake
look in the file:
/etc/sysconfig/hwconf
class: NETWORK
will show you exactly what your network card is ;)
Hope this helps
:homer:
Thanks for the info
If I do that, it messes up HSphere because I have 15 other IPs on the NIC and HSphere has its own script to add IPs.... maybe I should figure out how to use it :D
I did network restart before and I had to reboot it to get HSphere to add all the IPs back on because I didn't have time to figure it out back then...
chuckt101 04-06-2003, 12:53 PM class: NETWORK
bus: PCI
detached: 0
device: eth
driver: 8139too
desc: "Realtek|RTL-8139"
vendorId: 10ec
deviceId: 8139
subVendorId: a0a0
subDeviceId: 0027
pciType: 1
-
Anybody know anything about that NIC? Like does it suck :D
I've used it for over a year with no problems.
admin0 04-06-2003, 01:04 PM -
class: NETWORK
bus: PCI
detached: 0
device: eth
driver: 8139too
desc: "Realtek|RTL-8139/8139C"
vendorId: 10ec
deviceId: 8139
subVendorId: 10ec
subDeviceId: 8139
pciType: 1
-
I have the same one [as shown above], but I have no problems with this server or its network card.
From past experience [we studied this for over 2 months], I have no explanation for why it fails on one system and works perfectly on another with the same configuration and OS.
:homer:
Spingen 04-06-2003, 01:12 PM That's the problem :) A Realtek NIC in a server.
Everything I use has an Intel EtherExpress Pro (fxp) NIC in it, and I have never experienced a problem with one of them.
Here's a quote from the FreeBSD Realtek driver:
The RealTek 8139 PCI NIC redefines the meaning of 'low end.' This is probably the worst PCI ethernet controller ever made
chuckt101 04-06-2003, 01:53 PM In case anyone else with HSphere needs to know, this is how you restart the NIC and add all the IPs back:
[root@cp root]# service network restart;/hsphere/shared/scripts/setup-ips.pl
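To run the same two commands unattended from cron, as suggested earlier in the thread, a wrapper along these lines could work. This is only a sketch: the script name, log path and the "run" argument are assumptions, and only the two commands themselves come from the post above.

```shell
#!/bin/sh
# Sketch of a cron wrapper: restart networking, re-add the HSphere IPs, log the result.
# The two default commands are from the post above; everything else is an assumption.
RESTART_CMD=${RESTART_CMD:-"service network restart"}
SETUP_IPS_CMD=${SETUP_IPS_CMD:-"/hsphere/shared/scripts/setup-ips.pl"}

restart_network() {
    echo "============================"
    date
    # The variables are left unquoted on purpose so "command arg arg" is split into words.
    if $RESTART_CMD >/dev/null 2>&1 && $SETUP_IPS_CMD >/dev/null 2>&1; then
        echo "Restart Done."
    else
        echo "Restart FAILED."
        return 1
    fi
}

# Only act when called as "net-restart.sh run", so sourcing the file is harmless.
if [ "${1:-}" = "run" ]; then
    restart_network
fi
```

A crontab entry for a 30 minute interval might then look like this (hypothetical paths):
*/30 * * * * /root/net-restart.sh run >> /var/log/net-restart.log 2>&1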
WII-Aaron 04-06-2003, 02:01 PM What type of motherboard do you have? I had this exact same problem with a server a while ago and it turned out that in periods of no/super low traffic the BIOS was shutting down the NIC and it wouldn't wake up on a signal.
Aaron
chuckt101 04-06-2003, 02:11 PM Originally posted by WII-Aaron
What type of motherboard do you have? I had this exact same problem with a server a while ago and it turned out that in periods of no/super low traffic the BIOS was shutting down the NIC and it wouldn't wake up on a signal.
Aaron
how do I tell via shell?
When you say super low, how low do you mean?
Apache averages 3 requests per second.
Current Time: Sunday, 06-Apr-2003 13:17:26 EDT
Restart Time: Sunday, 06-Apr-2003 11:29:46 EDT
Parent Server Generation: 1
Server uptime: 1 hour 47 minutes 40 seconds
Total accesses: 22309 - Total Traffic: 394.3 MB
CPU Usage: u112.86 s21.68 cu33.17 cs9.27 - 2.74% CPU load
3.45 requests/sec - 62.5 kB/second - 18.1 kB/request
13 requests currently being processed, 16 idle servers
I don't think that's the issue, since it goes down at all times of day and I know for a fact that 3 requests/sec is pretty constant on the server.
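As a sanity check, the rate line in the status output can be recomputed from the raw totals. The sketch below assumes 1 MB = 1024 kB, which is what reproduces the posted figures; the function name rates is made up for illustration.

```shell
#!/bin/sh
# Recompute the server-status rate line from its raw totals.
# Arguments: total accesses, total traffic in MB, uptime in seconds.
rates() {
    LC_ALL=C awk -v acc="$1" -v mb="$2" -v secs="$3" 'BEGIN {
        kb = mb * 1024   # assumption: 1 MB = 1024 kB
        printf "%.2f requests/sec - %.1f kB/second - %.1f kB/request\n",
               acc / secs, kb / secs, kb / acc
    }'
}

# 1 hour 47 minutes 40 seconds = 6460 seconds
rates 22309 394.3 $((1 * 3600 + 47 * 60 + 40))
# prints: 3.45 requests/sec - 62.5 kB/second - 18.1 kB/request
```

So the averages are consistent with the totals, although an average over nearly two hours cannot rule out short quiet periods.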
Spingen 04-06-2003, 02:13 PM At one time I owned a couple of these 8139s, and yes, some of them do just stop responding. In a real production environment, why would you be using $5 NICs anyway?
chuckt101 04-06-2003, 02:16 PM Originally posted by Spingen
At one time I owned a couple of these 8139s, and yes, some of them do just stop responding. In a real production environment, why would you be using $5 NICs anyway?
it came with the dedicated server. :stickout:
Spingen 04-06-2003, 02:43 PM Only top quality I see :laugh:
email them and say you would like a better one :)
chuckt101 04-06-2003, 05:49 PM I was doing more investigating and I found that cron logs stopped at the same time network connections stopped...
If the problem was the NIC, why would cron jobs stop? One would think they'd just keep running, unless the problem is not the NIC but something else...
I have a cron job going every minute and it stops the same time the last log stops in /var/log/messages.
p.s. I got network restart going every 15 minutes, and so far so good....
:homer:
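The cron observation suggests a way to separate the two cases without console access: write a heartbeat line to local disk every minute. If the heartbeat stops at the same minute the network does, the whole box hung; if it keeps going while the interface byte counters stop moving, the NIC or driver is the likelier culprit. A minimal sketch, assuming the interface is eth0 and using hypothetical script and log paths:

```shell
#!/bin/sh
# Build one heartbeat line: timestamp, 1-minute load average, RX/TX byte counters.
# Arguments: interface name, then optional paths to the proc files (defaults are the real ones).
heartbeat_line() {
    iface=$1
    netdev=${2:-/proc/net/dev}
    loadavg=${3:-/proc/loadavg}
    load=$(cut -d' ' -f1 "$loadavg")
    # In /proc/net/dev, after "iface:" come 8 receive fields, then 8 transmit fields,
    # so field 1 is received bytes and field 9 is transmitted bytes.
    counters=$(awk -v ifc="$iface" '
        { sub(/^ +/, ""); n = index($0, ":"); if (n == 0) next
          if (substr($0, 1, n - 1) != ifc) next
          split(substr($0, n + 1), f, " ")
          print "rx_bytes=" f[1], "tx_bytes=" f[9] }' "$netdev")
    echo "$(date '+%Y-%m-%d %H:%M:%S') load=$load ${counters:-iface_missing}"
}

# Only log when called as "heartbeat.sh run" (hypothetical name and log path).
if [ "${1:-}" = "run" ]; then
    heartbeat_line eth0 >> /var/log/heartbeat.log
fi
```

A crontab entry such as "* * * * * /root/heartbeat.sh run" would add one line per minute; after the next forced reboot, the last few lines show when the machine stopped and whether traffic had already stalled.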
chuckt101 04-06-2003, 09:06 PM nooooooooooooooooooooooo... network restart didn't do it:
....
============================
Sun Apr 6 17:15:00 EDT 2003
Restart Done.
============================
Sun Apr 6 17:30:00 EDT 2003
Restart Done.
============================
Sun Apr 6 17:45:01 EDT 2003
Restart Done.
============================
Sun Apr 6 18:00:01 EDT 2003
Restart Done.
============================
Sun Apr 6 18:15:00 EDT 2003
Restart Done.
============================
Sun Apr 6 18:30:00 EDT 2003
Restart Done.
============================
Sun Apr 6 18:45:01 EDT 2003
Restart Done.
(it's now 19:40)
went down for an hour until i could get back to reboot it.
why do my processes just stop :bawling:
Spingen 04-06-2003, 10:24 PM Have you asked the provider to check the console for you? There is definitely something wrong, and if you are renting a dedicated server then they should look into this for you. Maybe there is something really wrong with the hardware.
Have them check console that way you can figure out if it is the whole box crapping out.
I've seen this before with certain NIC drivers, namely the eepro ones. With some of the newer kernels, usually SMP ones, that NIC will halt when the server starts doing some heavy processing.
My advice: compile a custom kernel or use a UP (uniprocessor) kernel for now.
clockwork 04-07-2003, 03:15 AM Wow, why not just run 'lspci'
Put down the pipe... slowly...
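For reference, a small sketch of the lspci route. It assumes the card is listed under the "Ethernet controller" class, which is how lspci normally labels wired NICs; the function name ethernet_devices is made up for illustration.

```shell
#!/bin/sh
# Keep only the Ethernet controller lines from lspci output read on stdin.
ethernet_devices() {
    grep -i 'ethernet controller'
}

# Run against the real hardware only if lspci is installed.
if command -v lspci >/dev/null 2>&1; then
    lspci 2>/dev/null | ethernet_devices || echo "no Ethernet controller listed by lspci"
fi
```

Unlike /etc/sysconfig/hwconf, this does not depend on a distribution-specific file, but it shows the chip rather than the driver in use.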
Slidey 04-07-2003, 07:54 AM anything in dmesg ?
eg problems when the machine boots up?
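A rough sketch of narrowing dmesg down to NIC-related lines. The interface and driver defaults (eth0, 8139too) are taken from this thread; the function name nic_messages and the exact pattern list are assumptions, so other drivers may log different wording.

```shell
#!/bin/sh
# Filter kernel messages on stdin for lines about the NIC or its driver.
# Defaults match this thread (eth0, 8139too); pass others as arguments.
nic_messages() {
    iface=${1:-eth0}
    driver=${2:-8139too}
    grep -i -E "$iface|$driver|NETDEV WATCHDOG|transmit timed out"
}

# Run against the live kernel log only if dmesg is available.
if command -v dmesg >/dev/null 2>&1; then
    dmesg 2>/dev/null | nic_messages || echo "no NIC-related lines in dmesg"
fi
```

Since dmesg only covers the current boot, anything logged before the hang would have to come from the syslog file instead, for example nic_messages < /var/log/messages, provided kernel messages are being written there.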
chuckt101 04-07-2003, 08:17 AM nothing weird that I can see..
The only "negative" message I get is:
mtrr: base(0xe0000000) is not aligned on a size(0x180000) boundary
whatever that means..
Mike Harris from RedHat says that's nothing though
https://listman.redhat.com/pipermail/roswell-list/2001-September/001813.html
oh well i'm waiting for a response from dv2 now...
i'll change the NIC and go from there.