  1. #1
    Join Date
    Oct 2000
    Location
    Canada
    Posts
    115

    What causes my entire server to go down?

    Up to the last month or so, my server with Servermatrix has been rock solid, with no downtime. For some reason, though, I've had every service on the server fail three times in the last month, and the only way to bring everything back online is to reboot the server.

    There don't seem to be any warning signs before this happens: CPU usage doesn't spike, and hard drive space usage is just fine.

    Can anyone give me some pointers on where I should be looking to figure out what's causing this server-wide crash? It's running Red Hat Enterprise with cPanel and WHM.

    Thanks!

    Neil

  2. #2
    Join Date
    Dec 2002
    Location
    UK
    Posts
    838
    I had a similar problem: my server had been working fine, then over a couple of weeks it started going down for no apparent reason, nearly every day if not every day.

    We tried every trick in the book, but it kept going down and, like yours, needed a reboot to get it back up. It would come back up and then die again 24 hours or so later.

    The possible causes we found were hardware; the power supply was the main one.

    You may want to get your DC to check the power supply, or maybe even run a memtest.

    That's what was done in our case.

  3. #3
    Is memory usage a concern?

  4. #4
    Join Date
    Oct 2000
    Location
    Canada
    Posts
    115
    I was wondering if it could be a hardware issue. Memory, I believe, is okay.

    I was a big idiot, though, and accidentally set my IPAlert monitoring address to a mailbox hosted on the server itself. So none of the server warnings about the problem were reaching me.

    Whoops.

  5. #5
    Join Date
    Jan 2003
    Posts
    1,715
    You can usually see memory usage problems coming. If it happens repeatedly and without warning, power supply and cooling are good first candidates.
    Game Servers are the next hot market!
    Slim margins, heavy support, fickle customers, and moronic suppliers!
    Start your own today!

  6. #6
    Join Date
    Mar 2003
    Location
    Edmonton, AB Canada
    Posts
    884
    You can check your logs to see if there are any signs of trouble.
    Also, if you think it's the memory, run memory tests and see what happens.
    Ben S.
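
One rough way to act on the "check your logs" advice is to look for the longest silence between log entries, since a hard hang usually leaves a gap that ends at the reboot. This is only a sketch: it assumes classic syslog-style timestamps ("Jan 12 03:12:44 ..."), and the function name `largest_gap` and the `year` default are made up for illustration (syslog lines carry no year, so one has to be supplied).

```python
from datetime import datetime, timedelta

def largest_gap(lines, year=2004):
    """Return (gap, before, after) for the longest silence between log entries.

    `year` is an assumption: classic syslog timestamps do not include one.
    """
    stamps = []
    for line in lines:
        try:
            # Classic syslog prefix is 15 characters: "Jan 12 03:12:44"
            stamp = datetime.strptime(f"{year} {line[:15]}", "%Y %b %d %H:%M:%S")
        except ValueError:
            continue  # skip lines without a parseable timestamp
        stamps.append(stamp)

    best = (timedelta(0), None, None)
    for before, after in zip(stamps, stamps[1:]):
        if after - before > best[0]:
            best = (after - before, before, after)
    return best
```

The entries just before the `before` timestamp are the ones worth reading first. On a Red Hat box the lines would typically come from something like `/var/log/messages`, though the exact file depends on the syslog configuration.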

  7. #7
    Join Date
    May 2004
    Location
    Lithuania
    Posts
    1,039
    Or try setting up something like Nagios (nagios.org). You will be one step ahead.

  8. #8
    Join Date
    Dec 2004
    Location
    Ontario, Canada
    Posts
    8
    Log checking would be a good idea

  9. #9
    Join Date
    Dec 2002
    Location
    UK
    Posts
    838
    Just curious, which DC is your server located in?

    (By the way, which log do you check for signs of trouble?)

  10. #10
    Join Date
    Dec 2001
    Location
    Colombia
    Posts
    27

    I'm having the same problem

    I'm having the same problem, but yesterday I noticed that the server ran out of memory.


    16:56:51 up 2 days, 48 min, 2 users, load average: 62.04, 43.08, 21.47
    135 processes: 74 sleeping, 58 running, 2 zombie, 1 stopped
    CPU states: cpu user nice system irq softirq iowait idle
    total 0.0% 0.0% 99.8% 0.0% 0.0% 0.0% 0.0%
    Mem: 1022480k av, 1014520k used, 7960k free, 0k shrd, 10828k buff
    979544k active, 3568k inactive
    Swap: 2097136k av, 2097136k used, 0k free 9944k cached
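
To put numbers on the output above, here is a small sketch that pulls the `Mem:` and `Swap:` lines out of that older `top` summary format and works out the percentage used. The function name `used_percent` is invented for this example, and the regular expression only targets the "NNNk av, NNNk used" layout shown in this post; newer versions of `top` print a different summary.

```python
import re

def used_percent(summary, label):
    """Percent used for a 'Mem:' or 'Swap:' line in old-style top output."""
    match = re.search(rf"{label}:\s*(\d+)k av,\s*(\d+)k used", summary)
    if match is None:
        raise ValueError(f"no {label} line found")
    available, used = int(match.group(1)), int(match.group(2))
    return 100.0 * used / available if available else 0.0

# Summary lines copied from the post above.
SAMPLE = """\
Mem: 1022480k av, 1014520k used, 7960k free, 0k shrd, 10828k buff
Swap: 2097136k av, 2097136k used, 0k free 9944k cached
"""

if __name__ == "__main__":
    print(f"mem used:  {used_percent(SAMPLE, 'Mem'):.1f}%")
    print(f"swap used: {used_percent(SAMPLE, 'Swap'):.1f}%")
```

On Linux the "used" memory figure includes buffers and cache, so a nearly full `Mem:` line is not alarming by itself. Swap with 0k free, together with a load average above 60, is the part that points at memory exhaustion.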

  11. #11
    Join Date
    Feb 2004
    Posts
    219
    1. Run a memory test immediately.
    2. Check the hard disk.
    3. A hanging kernel can cause these symptoms. Update your software to recent stable versions.
    4. There may be a faulty switch that draws full power within a few seconds and drains the server's power supply.
    5. You have used more than 80% of your disk space.

    These are the possible reasons.

    [ Thanks to BurstSalman , BurstMike , BurstJenny ]
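
The disk-space item in the list above is easy to check from a script. A minimal sketch using Python's standard `shutil.disk_usage`; the helper names and the 80% default simply mirror the figure mentioned in the post and are not an established rule.

```python
import shutil

def percent_used(total, used):
    """Percentage of `total` bytes that are in use."""
    return 100.0 * used / total if total else 0.0

def over_threshold(path="/", threshold=80.0):
    """True if the filesystem holding `path` is fuller than `threshold` percent."""
    usage = shutil.disk_usage(path)
    return percent_used(usage.total, usage.used) > threshold

if __name__ == "__main__":
    for mount in ("/",):  # add other mount points as needed
        print(mount, "over 80% full:", over_threshold(mount))
```

Run from cron, a check like this could send a warning well before a full partition starts taking services down.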

  12. #12
    Join Date
    Feb 2003
    Location
    San Jose, California
    Posts
    411
    We've found that when servers go down for no explainable reason, defective hardware is to blame most of the time. In many cases it's bad memory or the power supply. Sometimes loose cables or DIMMs, or inadequate cooling, will cause these problems.

    With issues such as this, we usually try removing the hard drive(s) from the server and putting them into a new box. About 80% of the time this fixes it.

  13. #13
    Join Date
    Dec 2002
    Location
    UK
    Posts
    838
    I got this in my logwatch email

    anyone know what it means?

    WARNING: Kernel Errors Present
    hda: drive_cmd: error=0x04 { DriveStat...: 1Time(s)
    hda: drive_cmd: status=0x51 { DriveReady SeekComplete Error }...: 1Time(s)


    As my server keeps crashing for some reason, do you think the above has something to do with it?
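
To see whether messages like these are a one-off or a trend, one option is to count them per drive across the kernel log. The sketch below is an illustration only: the pattern is modelled on the two lines quoted in this post (device name, a command tag such as `drive_cmd`, then `status=` or `error=` with a hex code), and `count_drive_errors` is a made-up helper name.

```python
import re
from collections import Counter

# Matches e.g. "hda: drive_cmd: status=0x51 { ... }" or "hda: drive_cmd: error=0x04 { ... }"
DRIVE_ERROR = re.compile(
    r"\b(hd[a-z]|sd[a-z]+): \w+: (?:status|error)=0x[0-9a-fA-F]+"
)

def count_drive_errors(lines):
    """Count kernel drive status/error lines per device name."""
    counts = Counter()
    for line in lines:
        match = DRIVE_ERROR.search(line)
        if match:
            counts[match.group(1)] += 1
    return counts
```

A count that keeps climbing from one logwatch report to the next is a better reason to ask the DC for a disk check than a single pair of lines.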

  14. #14
    Join Date
    Jan 2003
    Posts
    1,715
    Unless you see recovery messages (which you clearly don't), it means your hard drive is about to puke.

  15. #15
    Join Date
    Dec 2002
    Location
    UK
    Posts
    838
    Originally posted by hiryuu
    Unless you see recovery messages (which you clearly don't), it means your hard drive is about to puke.
    Oh dear, hope not.

    Well, I emailed my DS provider's support desk, and apparently it's nothing I should worry about. They actually referred me to this thread:

    http://www.webhostingtalk.com/archiv.../268788-1.html

  16. #16
    The power supply is possibly the issue.
    It would be best to contact your DC and have them run hardware tests locally to find out what is causing the problems.
