  1. #1
    Join Date
    Nov 2005
    Location
    Palma de Mallorca, Spain
    Posts
    259

    Help diagnosing a server 'freeze'

    Hello,

    One of the Linux servers I run is just *not* stable: it freezes roughly every 48 hours, at random times, and stops responding to pings and SSH connections. The server comes back after a hard reboot.

    Since other boxes I manage, which are a lot busier and use the same kernel and OS, have been up and running for months, I'm wondering if I should switch hardware / server to test.

    OS is CentOS 4.4, official kernel 2.6.9-42

    I see *no* kernel panics in /var/log/messages.
    I see nothing weird in the other logs (apache, etc.), so I'm kinda lost as I don't know where to look for clues.

    Thanks for any help.

    Juan
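
    For anyone landing here with the same symptom: a hard freeze usually leaves no error in the log, but it does leave a hole in it. A rough sketch, assuming the classic syslog timestamp format at the start of each line of /var/log/messages, that lists the quiet periods so you can see when each freeze began. The function name, the 30-minute threshold, the year, and the sample hostname `web1` are all made up for illustration.

```python
from datetime import datetime, timedelta


def find_log_gaps(lines, year, min_gap=timedelta(minutes=30)):
    """Return (last_seen, next_seen) pairs where the log went quiet for min_gap or longer."""
    gaps = []
    previous = None
    for line in lines:
        # Classic syslog lines start with a 15-character stamp such as "Nov  2 03:14:07".
        # The stamp carries no year, so the caller has to supply one (an assumption).
        try:
            stamp = datetime.strptime(f"{year} {line[:15]}", "%Y %b %d %H:%M:%S")
        except ValueError:
            continue  # skip lines that do not begin with a timestamp
        if previous is not None and stamp - previous >= min_gap:
            gaps.append((previous, stamp))
        previous = stamp
    return gaps


# Hypothetical sample lines; on a real box you would pass the open log file instead.
sample = [
    "Nov 12 03:14:07 web1 syslogd: still alive",
    "Nov 12 03:20:00 web1 crond: job started",
    "Nov 12 09:45:10 web1 syslogd: restart",
]
for last_seen, next_seen in find_log_gaps(sample, year=2006):
    print(f"silent from {last_seen} until {next_seen}")
```

    The last timestamp before each gap is a starting point for matching the freeze against cron jobs, backups, or traffic peaks. It will not say why the box froze, and a log that spans New Year needs more care than this sketch takes.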

  2. #2
    We had this problem once. We did a memory swap and it started to work fine.

  3. #3
    Join Date
    Nov 2005
    Location
    Palma de Mallorca, Spain
    Posts
    259
    Thanks for your input, gmilazzo. I'll ask the datacenter to take a look if I can't come up with anything else.

  4. #4
    Or you can try running a RAM test:
    http://www.memtest86.com/

  5. #5
    Join Date
    Nov 2005
    Location
    Palma de Mallorca, Spain
    Posts
    259
    The DC says they performed hardware tests, a RAM test, etc. and told me that "everything is OK".

    Well, I got the server back online and hope some "shaking" did it some good...

  6. #6
    We are facing a very similar problem. Everything looks fine in /var/log/messages, but the server still keeps hanging.

  7. #7
    Join Date
    May 2006
    Location
    Teh Interweb
    Posts
    314
    Next time it goes down, ask the datacenter techs to hook up a monitor and see what it says on the console. About 95% of the time I see this happen it is due to faulty hardware. Will they consider a chassis swap for you? This way you can eliminate all hardware issues except the drive.

    [[email protected]] ~ $ cat .signature
    cat: .signature: No such file or directory

  8. #8
    Join Date
    Nov 2006
    Location
    San Francisco
    Posts
    28
    I had similar issues and the cause was my BIOS, which was out of date. That same machine also passed memtest86 but failed after 2 hours with mprime. I now test new machines FOR AT LEAST 24 hours with mprime to ensure stability. Memtest only tests the memory. Also check system temps: high temps can freeze things up, and my Asus board (M2NPV-VM) had a heatsink hot enough to burn me. I added a fan and some ducting to keep that thing cool, and with the BIOS update and good hardware it's been stable for months.
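
    On the temperature point, a minimal sketch of a check you could run from cron. It assumes a kernel that exposes /sys/class/thermal/thermal_zone*/temp in millidegrees Celsius, which recent kernels do; an old kernel such as the 2.6.9 mentioned earlier in the thread may not have that path at all, in which case the function just returns an empty result. The function names and the 70 °C limit are placeholders, not recommended values.

```python
from pathlib import Path


def read_zone_temps(base="/sys/class/thermal"):
    """Return {zone_name: degrees_celsius} for every readable thermal zone under base."""
    temps = {}
    for temp_file in sorted(Path(base).glob("thermal_zone*/temp")):
        try:
            millidegrees = int(temp_file.read_text().strip())
        except (OSError, ValueError):
            continue  # unreadable or malformed sensor file, skip it
        temps[temp_file.parent.name] = millidegrees / 1000.0
    return temps


def zones_over(temps, limit_c=70.0):
    """Return the sorted names of zones at or above limit_c (an arbitrary example limit)."""
    return sorted(name for name, value in temps.items() if value >= limit_c)


if __name__ == "__main__":
    current = read_zone_temps()
    for zone in zones_over(current):
        print(f"{zone} is at {current[zone]:.1f} C")
```

    Logging these readings every few minutes gives you the temperature just before a freeze, which is more useful than checking by hand after the reboot when everything has cooled down.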
