Server Crashing...

Haze
04-18-2002, 06:41 PM
We have a recurring issue on one of our servers, where it crashes every 2 or 3 days. Other than that, it runs fine. Would any of you have any idea what the attached error message means?

DanielP
04-18-2002, 07:09 PM
My best guess would be bad RAM or a bad motherboard, or something around those areas.

Haze
04-18-2002, 07:13 PM
That's what I'm thinking :( We have already made one move because of hardware issues. I hope this isn't a continuous thing.

The Prohacker
04-18-2002, 07:36 PM
Originally posted by Haze:
"That's what I'm thinking :( We have already made one move because of hardware issues. I hope this isn't a continuous thing."
Change the RAM and possibly the CPU; if that doesn't help, change the mobo and power supply.... It can really be several issues, but it does sound like hardware, and could be any of the 4.... You'd have to run certain testing and benchmarking software to find out exactly where the problem lies...

Haze
04-19-2002, 08:52 PM
They checked the CPU and RAM and say it's fine. I had to open another ticket to get them to look further.. hopefully they will :/ Does anyone have any suggestions, if by chance they say it's not hardware related?

billyjoe
04-19-2002, 08:56 PM
One thing I'd look at is the swap partition. Make sure it's large enough and doesn't contain bad blocks. How much RAM and how much swap do you have allocated?

Haze
04-19-2002, 09:07 PM
That's one thing I keep meaning to get to. There is 1 GB of RAM and 1 GB of swap. I keep meaning to boost that up to 2 GB. The RAM was upgraded, and I just haven't gotten around to upping the swap yet.. maybe that's the problem. hmm.

billyjoe
04-19-2002, 09:16 PM
The one thing that makes me suspect the swap isn't large enough is this line from your error log.
<1>Unable to handle kernel paging request at virtual address fffffffc
That's right at the end of the swap memory range.
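For anyone trying to read the address in that log line, here is a minimal Python sketch of one way to look at it. It assumes a 32-bit kernel (the address is printed as 8 hex digits), and the helper name `as_signed_32` is made up for illustration. The address in an oops like this is a virtual address, not an offset into the swap partition, so the value on its own says nothing about swap size. Read as a signed 32-bit number it is simply 4 bytes below zero, which is at least consistent with the kernel following a bad pointer, though the log line alone does not show what caused it.

```python
def as_signed_32(address: int) -> int:
    """Reinterpret an unsigned 32-bit value as a signed 32-bit integer."""
    address &= 0xFFFFFFFF  # keep only the low 32 bits
    # Values with the top bit set are negative in two's complement.
    return address - (1 << 32) if address & 0x80000000 else address


# The address string copied from the error log quoted in the thread.
fault_address = int("fffffffc", 16)

print(hex(fault_address), as_signed_32(fault_address))  # 0xfffffffc -4
```

This only re-reads the number; finding the faulting code would need the rest of the oops output (the register dump and call trace), which is not shown in the thread.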

palmtree
04-19-2002, 09:32 PM
Originally posted by Haze:
"They checked the CPU and RAM and say it's fine. I had to open another ticket to get them to look further.. hopefully they will :/ Does anyone have any suggestions, if by chance they say it's not hardware related?"
How did they check the CPU and RAM? Did they put it in another machine or ?
laterz,
palmtree

Haze
04-19-2002, 09:45 PM
They just did some software-based tests, I think. I have asked that they move it to a new machine and see how it goes.. Hopefully they will.

palmtree
04-20-2002, 12:40 AM
I wouldn't really trust diags from software.. I've just seen too many hardware problems where the diags said everything was fine and it wasn't.. Hopefully when they move it to a new machine it'll shed some light on your problem..
Good luck,
palmtree

Haze
04-20-2002, 09:24 PM
Well, I finally got a response, and it's not the response I was after. What do you guys think of this?

your issues are caused primarily by this:
top: user - nobody
PID USER PRI NI SIZE RSS SHARE STAT %CPU %MEM TIME COMMAND
3769 nobody 9 0 8688 8688 7408 S 5.9 0.8 0:00 httpd
3781 nobody 11 0 8604 8584 7408 S 2.3 0.8 0:00 httpd
3764 nobody 10 0 10240 9M 7456 S 1.9 1.0 0:00 httpd
3572 nobody 9 0 8704 8704 7408 S 1.3 0.8 0:00 httpd
3637 nobody 9 0 8924 8924 7460 S 1.3 0.8 0:00 httpd
3561 nobody 9 0 9060 9060 7468 S 1.1 0.8 0:00 httpd
3710 nobody 11 0 8328 8328 7384 S 1.1 0.8 0:00 httpd
3634 nobody 9 0 9812 9812 7452 S 0.7 0.9 0:01 httpd
3580 nobody 9 0 8684 8684 7400 S 0.5 0.8 0:00 httpd
3618 nobody 9 0 10416 10M 7456 S 0.3 1.0 0:00 httpd
3712 nobody 9 0 8460 8460 7372 S 0.3 0.8 0:00 httpd
3718 nobody 9 0 8860 8860 7440 S 0.3 0.8 0:00 httpd
3563 nobody 9 0 8968 8968 7416 S 0.1 0.8 0:00 httpd
3615 nobody 9 0 8636 8616 7412 S 0.1 0.8 0:00 httpd
1674 nobody 9 0 2416 2408 2068 S 0.0 0.2 0:00 proftpd
27387 nobody 9 0 2032 2032 1344 S 0.0 0.2 0:00 entropychat
27392 nobody 9 0 620 620 384 S 0.0 0.0 0:00 melange
32550 nobody 9 0 10972 10M 5860 S 0.0 1.0 0:01 httpd
3559 nobody 9 0 0 0 0 Z 0.0 0.0 0:00 httpd <defunct>
3560 nobody 9 0 8988 8988 7476 S 0.0 0.8 0:01 httpd
3562 nobody 9 0 7356 7356 7096 S 0.0 0.7 0:00 httpd
3564 nobody 9 0 8352 8352 7404 S 0.0 0.8 0:00 httpd
3565 nobody 9 0 10524 10M 7508 S 0.0 1.0 0:01 httpd
3566 nobody 9 0 8844 8844 7420 S 0.0 0.8 0:00 httpd
3567 nobody 9 0 10476 10M 7492 S 0.0 1.0 0:01 httpd
3575 nobody 9 0 7356 7356 7096 S 0.0 0.7 0:00 httpd
3577 nobody 9 0 9116 9116 7464 S 0.0 0.9 0:00 httpd
3578 nobody 9 0 10320 10M 7420 S 0.0 1.0 0:01 httpd
3579 nobody 9 0 8300 8300 7396 S 0.0 0.8 0:00 httpd
3585 nobody 9 0 8772 8772 7416 S 0.0 0.8 0:00 httpd
3613 nobody 9 0 7512 7512 7176 S 0.0 0.7 0:00 httpd
3622 nobody 9 0 8352 8352 7388 S 0.0 0.8 0:00 httpd
3627 nobody 9 0 8640 8640 7408 S 0.0 0.8 0:00 httpd
3628 nobody 9 0 8864 8864 7444 S 0.0 0.8 0:00 httpd
3636 nobody 9 0 8488 8488 7384 S 0.0 0.8 0:00 httpd
3640 nobody 9 0 10436 10M 7456 S 0.0 1.0 0:00 httpd
3644 nobody 9 0 8296 8296 7380 S 0.0 0.8 0:00 httpd
3645 nobody 9 0 8672 8672 7444 S 0.0 0.8 0:00 httpd
3661 nobody 9 0 8392 8392 7392 S 0.0 0.8 0:00 httpd
3663 nobody 9 0 7304 7304 7124 S 0.0 0.7 0:00 httpd
3664 nobody 9 0 7356 7356 7096 S 0.0 0.7 0:00 httpd
3675 nobody 9 0 8612 8592 7404 S 0.0 0.8 0:00 httpd
3711 nobody 9 0 8064 8064 7288 S 0.0 0.7 0:00 httpd
3715 nobody 9 0 8448 8448 7380 S 0.0 0.8 0:00 httpd
3716 nobody 9 0 7356 7356 7096 S 0.0 0.7 0:00 httpd
3717 nobody 9 0 7524 7524 7104 S 0.0 0.7 0:00 httpd
3768 nobody 9 0 7356 7356 7084 S 0.0 0.7 0:00 httpd
3770 nobody 9 0 7296 7296 7116 S 0.0 0.7 0:00 httpd
3771 nobody 9 0 7352 7352 6972 S 0.0 0.7 0:00 httpd
3772 nobody 9 0 7296 7296 7116 S 0.0 0.7 0:00 httpd
3773 nobody 9 0 7296 7296 7116 S 0.0 0.7 0:00 httpd
3776 nobody 9 0 8292 8292 7392 S 0.0 0.8 0:00 httpd
3777 nobody 9 0 8236 8236 7364 S 0.0 0.8 0:00 httpd

your httpd group has been changed to nobody rather than apache, of course this is bonking all sorts of dependencies.
It is also wholly unsupported; even if there is a hardware problem, we cannot assess it until the system is restored to an original state. To request a restore, please submit a separate ticket. Until a restore is done, all of your future requests of this nature will be addressed as unsupported as well and closed without resolution or investigation.

bacid
04-20-2002, 10:15 PM
roflmao.. I'm no hardware genius.. but I don't think changing Apache to run as 'nobody' is causing your problems :)

Haze
04-20-2002, 10:20 PM
How do I get through to these people? It's like talking to a wall!

palmtree
04-21-2002, 01:10 AM
Changing permissions is page faulting the server?? Um, don't think so! Ask them for a real answer.. :confused:
Laterz,
palmtree

kunal
04-21-2002, 02:55 AM
hey.. this might sound lame, but it works for me, ALWAYS! Just cut all power to the system.. and leave it off for a couple of hours or so... turn it back on, boot into single-user mode, reboot, and all should be fine...

Haze
04-21-2002, 06:31 AM
I don't think the 200 sites on the server would appreciate that much :P

kunal
04-21-2002, 06:40 AM
Originally posted by Haze:
"I don't think the 200 sites on the server would appreciate that much :P"
Ooooooooooops.... I thought it was a local machine :o

ToastyX
04-21-2002, 12:09 PM
Originally posted by Haze:
"How do I get through to these people? It's like talking to a wall!"
They obviously don't have a clue what they're talking about, so I'd advise you to look for another provider that actually knows what it's doing and is willing to listen. You seem to have plenty of memory, so the problem looks like either something keeps going bonkers and eating up all of the memory, causing the system to grind to a halt, or something's wrong with the memory, processor, or motherboard.

erapid
04-21-2002, 12:24 PM
Hi,
http://www.waider.ie/hacks/diary/2002/february.html
search string [] error_code [kernel] 0x38
may be ... may be, because you wrote about a 1-2 day period. Too long to heat, but ...
I'm sure it's not a memory or CPU problem. RAM? Then it would happen more continuously and more often. Theoretically, it could be, if your chip has an error at some address and from time to time your software tries to access it. However, you need to think about "what happens periodically on my server" and look at the other log files.
Regards