Web Hosting Talk

View Full Version : Server Crashing...


Haze
04-18-2002, 06:41 PM
We have a recurring issue on one of our servers: it crashes every 2 or 3 days. Other than that, it runs fine. Would any of you have any idea what the attached error message means?

DanielP
04-18-2002, 07:09 PM
My best guess would be bad RAM or a bad motherboard, or something in that area.

Haze
04-18-2002, 07:13 PM
That's what I'm thinking :( We have already made one move because of hardware issues. I hope this isn't a continuous thing.

The Prohacker
04-18-2002, 07:36 PM
Originally posted by Haze
That's what I'm thinking :( We have already made one move because of hardware issues. I hope this isn't a continuous thing.

Change the RAM and possibly the CPU; if that doesn't help, change the mobo and power supply.... It could really be several issues, but it does sound like hardware, and it could be any of the 4....

You'd have to run certain testing software and benchmarking software to find out exactly where the problem lies...

Haze
04-19-2002, 08:52 PM
They checked the CPU and RAM and say it's fine. I had to open another ticket to get them to look further.. hopefully they will :/

Does anyone have any suggestions, if by chance they say it's not hardware related?

billyjoe
04-19-2002, 08:56 PM
One thing I'd look at is the swap partition. Make sure it's large enough, as well as not containing bad blocks. How much RAM, and how much SWAP do you have allocated?
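To answer the "how much RAM, how much swap" question quickly, a rough sketch like the following could be used. It assumes a Linux system that exposes `MemTotal` and `SwapTotal` lines in `/proc/meminfo`; the function names are made up for illustration and it does not check the swap partition for bad blocks.

```python
def parse_meminfo(text):
    """Return a dict of 'Name: <number> kB' fields from /proc/meminfo text."""
    fields = {}
    for line in text.splitlines():
        name, sep, rest = line.partition(":")
        if not sep:
            continue  # not a 'Name: value' line
        parts = rest.split()
        if parts and parts[0].isdigit():
            fields[name.strip()] = int(parts[0])
    return fields


def ram_and_swap_mb(text):
    """Return (total RAM, total swap) in MB; 0 if a field is missing."""
    info = parse_meminfo(text)
    return info.get("MemTotal", 0) // 1024, info.get("SwapTotal", 0) // 1024
```

On the server itself this would be called as `ram_and_swap_mb(open("/proc/meminfo").read())`.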

Haze
04-19-2002, 09:07 PM
That's one thing I keep meaning to get to. There is 1 GB of RAM and 1 GB of swap. I keep meaning to boost that up to 2 GB. The RAM was upgraded, and I just haven't gotten around to upping the swap yet.. maybe that's the problem. hmm.

billyjoe
04-19-2002, 09:16 PM
The one thing that makes me suspect the swap isn't large enough is this line from your error log.

<1>Unable to handle kernel paging request at virtual address fffffffc

That's right at the end of the swap memory range.
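For reference, the arithmetic on that address can be worked out as below. This is only a sketch and assumes a 32-bit address space, which the thread does not state (the 8-digit address suggests it). The helper names are hypothetical, and the numbers alone neither confirm nor rule out the swap theory.

```python
ADDRESS = 0xFFFFFFFC          # address from the quoted log line
ADDRESS_SPACE = 1 << 32       # assumption: 32-bit virtual addresses


def as_signed32(value):
    """Interpret an unsigned 32-bit value as a signed 32-bit integer."""
    return value - ADDRESS_SPACE if value >= (1 << 31) else value


def bytes_below_top(value):
    """Distance in bytes from value to the top of the 32-bit address space."""
    return ADDRESS_SPACE - value


print(ADDRESS, as_signed32(ADDRESS), bytes_below_top(ADDRESS))
```

In other words, the faulting address is 4 bytes below the very top of a 32-bit address space, which is the same bit pattern as -4.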

palmtree
04-19-2002, 09:32 PM
Originally posted by Haze
They checked the CPU and RAM and say it's fine. I had to open another ticket to get them to look further.. hopefully they will :/

Does anyone have any suggestions, if by chance they say it's not hardware related?

How did they check the cpu and ram? Did they put it in another machine or ?

laterz,
palmtree

Haze
04-19-2002, 09:45 PM
They just did some software based tests I think. I have asked that they move it to a new machine and see how it goes.. Hopefully they will.

palmtree
04-20-2002, 12:40 AM
I wouldn't really trust diags from software.. I've just seen too many hardware problems where diags said everything was fine and it wasn't..
Hopefully when they move it to a new machine it'll shed some light on your problem..

Good luck,
palmtree

Haze
04-20-2002, 09:24 PM
Well, I finally got a response, and it's not the response I was after. What do you guys think of this?

your issues are caused primarily by this:

top: user - nobody


PID USER PRI NI SIZE RSS SHARE STAT %CPU %MEM TIME COMMAND
3769 nobody 9 0 8688 8688 7408 S 5.9 0.8 0:00 httpd
3781 nobody 11 0 8604 8584 7408 S 2.3 0.8 0:00 httpd
3764 nobody 10 0 10240 9M 7456 S 1.9 1.0 0:00 httpd
3572 nobody 9 0 8704 8704 7408 S 1.3 0.8 0:00 httpd
3637 nobody 9 0 8924 8924 7460 S 1.3 0.8 0:00 httpd
3561 nobody 9 0 9060 9060 7468 S 1.1 0.8 0:00 httpd
3710 nobody 11 0 8328 8328 7384 S 1.1 0.8 0:00 httpd
3634 nobody 9 0 9812 9812 7452 S 0.7 0.9 0:01 httpd
3580 nobody 9 0 8684 8684 7400 S 0.5 0.8 0:00 httpd
3618 nobody 9 0 10416 10M 7456 S 0.3 1.0 0:00 httpd
3712 nobody 9 0 8460 8460 7372 S 0.3 0.8 0:00 httpd
3718 nobody 9 0 8860 8860 7440 S 0.3 0.8 0:00 httpd
3563 nobody 9 0 8968 8968 7416 S 0.1 0.8 0:00 httpd
3615 nobody 9 0 8636 8616 7412 S 0.1 0.8 0:00 httpd
1674 nobody 9 0 2416 2408 2068 S 0.0 0.2 0:00 proftpd
27387 nobody 9 0 2032 2032 1344 S 0.0 0.2 0:00 entropychat
27392 nobody 9 0 620 620 384 S 0.0 0.0 0:00 melange
32550 nobody 9 0 10972 10M 5860 S 0.0 1.0 0:01 httpd
3559 nobody 9 0 0 0 0 Z 0.0 0.0 0:00 httpd <defunct>
3560 nobody 9 0 8988 8988 7476 S 0.0 0.8 0:01 httpd
3562 nobody 9 0 7356 7356 7096 S 0.0 0.7 0:00 httpd
3564 nobody 9 0 8352 8352 7404 S 0.0 0.8 0:00 httpd
3565 nobody 9 0 10524 10M 7508 S 0.0 1.0 0:01 httpd
3566 nobody 9 0 8844 8844 7420 S 0.0 0.8 0:00 httpd
3567 nobody 9 0 10476 10M 7492 S 0.0 1.0 0:01 httpd
3575 nobody 9 0 7356 7356 7096 S 0.0 0.7 0:00 httpd
3577 nobody 9 0 9116 9116 7464 S 0.0 0.9 0:00 httpd
3578 nobody 9 0 10320 10M 7420 S 0.0 1.0 0:01 httpd
3579 nobody 9 0 8300 8300 7396 S 0.0 0.8 0:00 httpd
3585 nobody 9 0 8772 8772 7416 S 0.0 0.8 0:00 httpd
3613 nobody 9 0 7512 7512 7176 S 0.0 0.7 0:00 httpd
3622 nobody 9 0 8352 8352 7388 S 0.0 0.8 0:00 httpd
3627 nobody 9 0 8640 8640 7408 S 0.0 0.8 0:00 httpd
3628 nobody 9 0 8864 8864 7444 S 0.0 0.8 0:00 httpd
3636 nobody 9 0 8488 8488 7384 S 0.0 0.8 0:00 httpd
3640 nobody 9 0 10436 10M 7456 S 0.0 1.0 0:00 httpd
3644 nobody 9 0 8296 8296 7380 S 0.0 0.8 0:00 httpd
3645 nobody 9 0 8672 8672 7444 S 0.0 0.8 0:00 httpd
3661 nobody 9 0 8392 8392 7392 S 0.0 0.8 0:00 httpd
3663 nobody 9 0 7304 7304 7124 S 0.0 0.7 0:00 httpd
3664 nobody 9 0 7356 7356 7096 S 0.0 0.7 0:00 httpd
3675 nobody 9 0 8612 8592 7404 S 0.0 0.8 0:00 httpd
3711 nobody 9 0 8064 8064 7288 S 0.0 0.7 0:00 httpd
3715 nobody 9 0 8448 8448 7380 S 0.0 0.8 0:00 httpd
3716 nobody 9 0 7356 7356 7096 S 0.0 0.7 0:00 httpd
3717 nobody 9 0 7524 7524 7104 S 0.0 0.7 0:00 httpd
3768 nobody 9 0 7356 7356 7084 S 0.0 0.7 0:00 httpd
3770 nobody 9 0 7296 7296 7116 S 0.0 0.7 0:00 httpd
3771 nobody 9 0 7352 7352 6972 S 0.0 0.7 0:00 httpd
3772 nobody 9 0 7296 7296 7116 S 0.0 0.7 0:00 httpd
3773 nobody 9 0 7296 7296 7116 S 0.0 0.7 0:00 httpd
3776 nobody 9 0 8292 8292 7392 S 0.0 0.8 0:00 httpd
3777 nobody 9 0 8236 8236 7364 S 0.0 0.8 0:00 httpd

your httpd group has been changed to nobody rather than apache, of course
this is bonking all sorts of dependencies.

It is also wholly unsupported, even if there is a hardware problem we
cannot assess it until the system is restored to an original state. To
request a restore please submit a separate ticket. Until a restore is done
all of your future requests of this nature will be addressed as
unsupported as well and closed without resolution or investigation.
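A listing like the one above is easier to judge when it is totalled per command. The following is a minimal sketch that assumes the same 12-column layout as the pasted output (PID, USER, PRI, NI, SIZE, RSS, SHARE, STAT, %CPU, %MEM, TIME, COMMAND); `to_kb` and `summarize` are illustrative names, not part of top.

```python
from collections import defaultdict


def to_kb(field):
    """Convert a size field such as '8688' or '10M' to kilobytes."""
    if field.endswith("M"):
        return int(float(field[:-1]) * 1024)
    return int(field)


def summarize(top_text):
    """Return {command: (process_count, total_rss_kb, total_share_kb)}."""
    totals = defaultdict(lambda: [0, 0, 0])
    for line in top_text.splitlines():
        cols = line.split()
        if len(cols) < 12 or not cols[0].isdigit():
            continue  # header, blank, or continuation line
        entry = totals[cols[11]]
        entry[0] += 1
        entry[1] += to_kb(cols[5])   # RSS column
        entry[2] += to_kb(cols[6])   # SHARE column
    return {cmd: tuple(values) for cmd, values in totals.items()}
```

Keep in mind that most of each httpd process's RSS in this listing is in the SHARE column, so adding up RSS across processes overstates the real memory use.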

bacid
04-20-2002, 10:15 PM
roflmao..

I'm no hardware genius.. but I don't think changing apache to run as 'nobody' is causing your problems :)

Haze
04-20-2002, 10:20 PM
How do I get through to these people? It's like talking to a wall!

palmtree
04-21-2002, 01:10 AM
Changing permissions is page faulting the server??
Um, don't think so! Ask them for a real answer.. :confused:

Laterz,
palmtree

kunal
04-21-2002, 02:55 AM
hey.. this might sound lame, but it works for me, ALWAYS!

Just cut all power to the system.. and leave it off for a couple of hours or so... turn it back on, boot into single-user mode, reboot, and all should be fine...

Haze
04-21-2002, 06:31 AM
I don't think the 200 sites on the server would appreciate that much :P

kunal
04-21-2002, 06:40 AM
Originally posted by Haze
I don't think the 200 sites on the server would appreciate that much :P


Ooooooooooops.... I thought it was a local machine :o

ToastyX
04-21-2002, 12:09 PM
Originally posted by Haze
How do I get through to these people? It's like talking to a wall!

They obviously don't have a clue what they're talking about, so I'd advise you to look for another provider that actually knows what it's doing and is willing to listen. You seem to have plenty of memory, so the problem looks like either something keeps going bonkers and eating up all of the memory, causing the system to grind to a halt, or something's wrong with the memory, processor, or motherboard.

erapid
04-21-2002, 12:24 PM
Hi,

http://www.waider.ie/hacks/diary/2002/february.html

search string [] error_code [kernel] 0x38

Maybe ... maybe, because you wrote about a 1-2 day period. Too long for it to be heat, but ...

I'm sure it's not a memory or CPU problem. RAM? That would happen more continuously and more often. Theoretically it could be, if your chip has an error at some address and from time to time your software tries to access it.

However, you need to think about "what happens periodically on my server" and look at the other log files.

Regards
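As a starting point for going through the log files, a small sketch like this could pull out the kernel fault lines so their timestamps can be compared with whatever else runs on the server every few days. The patterns are the common kernel oops messages; the function name is hypothetical, and which log file to feed it (for example `/var/log/messages`) is an assumption that depends on the syslog setup.

```python
import re

# Common kernel fault messages; the first one matches the line quoted earlier
# in the thread.
PATTERN = re.compile(
    r"Unable to handle kernel (paging request|NULL pointer dereference)|Oops:"
)


def find_oops_lines(lines):
    """Return (line_number, text) for every line that looks like a kernel fault."""
    return [
        (number, line.rstrip("\n"))
        for number, line in enumerate(lines, start=1)
        if PATTERN.search(line)
    ]
```

Usage would be along the lines of `find_oops_lines(open(path))`, with `path` pointing at the log to check.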