Thread: Server keeps going down/crashing
-
11-05-2006, 09:03 AM #1 Newbie
- Join Date
- Dec 2005
- Posts
- 26
Server keeps going down/crashing
I have a server with SoftLayer. Every couple of days, at different times, the server goes down. I'm unable to access DA, SSH or log in via the IPMI text console. The server logs don't show anything unusual, and we have ruled out the possibility of hardware faults.
I'm 99% sure the server hasn't actually "crashed", since I can sometimes get the SSH login prompt to come up before the connection closes, and it still responds to ping. SL can't find any problems in the logs, and they are also unable to log in while the problem is occurring.
I have been told to monitor the server and report anything strange. Obviously, this isn't possible 24/7 so I'm looking into my options. Is there any software which will run every minute or couple minutes and dump everything running on the server at that time, so the next time it does crash, we could reboot and look at what happened just before?
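One way to get that "what was running just before" record is a small script run from cron every minute that appends a timestamped snapshot to a log file. The sketch below is hypothetical, not a specific existing tool: it assumes Python 3 is installed, reads only standard Linux /proc files (/proc/loadavg, /proc/meminfo and each /proc/<pid>/stat), and calls fsync after each write so the latest snapshot has the best chance of reaching the disk before a lockup. The log path is whatever you pass on the command line.

```python
#!/usr/bin/env python3
"""Append a timestamped snapshot of system state to a log file.

Hypothetical helper, meant to be run from cron once a minute with the
log file path as its only argument. State is read from standard /proc files.
"""
import os
import sys
import time


def read_first_line(path):
    """Return the first line of a file, or 'unavailable' if it can't be read."""
    try:
        with open(path) as f:
            return f.readline().strip()
    except OSError:
        return "unavailable"


def read_meminfo(keys=("MemFree", "SwapFree"), path="/proc/meminfo"):
    """Return selected /proc/meminfo fields as a dict of strings."""
    values = {}
    try:
        with open(path) as f:
            for line in f:
                key, _, rest = line.partition(":")
                if key in keys:
                    values[key] = rest.strip()
    except OSError:
        pass
    return values


def list_processes(proc_root="/proc"):
    """Return sorted (pid, command) pairs taken from /proc/<pid>/stat."""
    procs = []
    try:
        entries = os.listdir(proc_root)
    except OSError:
        return procs
    for entry in entries:
        if not entry.isdigit():
            continue
        stat = read_first_line(os.path.join(proc_root, entry, "stat"))
        # The command name is the text between the first '(' and the last ')'.
        start, end = stat.find("("), stat.rfind(")")
        name = stat[start + 1:end] if 0 <= start < end else stat
        procs.append((int(entry), name))
    return sorted(procs)


def build_snapshot():
    """Assemble one human-readable snapshot block ending in a newline."""
    lines = ["=== snapshot " + time.strftime("%Y-%m-%d %H:%M:%S") + " ==="]
    lines.append("loadavg: " + read_first_line("/proc/loadavg"))
    mem = read_meminfo()
    lines.append("meminfo: " + ", ".join(f"{k}={v}" for k, v in mem.items()))
    for pid, name in list_processes():
        lines.append(f"{pid:>7} {name}")
    return "\n".join(lines) + "\n"


def append_snapshot(path):
    """Append a snapshot and fsync so it has the best chance of hitting disk."""
    with open(path, "a") as f:
        f.write(build_snapshot())
        f.flush()
        os.fsync(f.fileno())


if __name__ == "__main__":
    if len(sys.argv) == 2:
        append_snapshot(sys.argv[1])
    else:
        print("usage: snapshot.py LOGFILE", file=sys.stderr)
```

A crontab line along the lines of `* * * * * /usr/bin/python3 /root/snapshot.py /var/log/snapshot.log` (both paths are assumptions) would run it every minute. After a reboot, the last block in the log shows the processes and load at roughly the moment the box stopped responding. The log grows quickly, so rotate or trim it.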
Appreciate all suggestions.
Scott
-
11-05-2006, 11:25 AM #2 Aspiring Evangelist
- Join Date
- Jan 2004
- Location
- York, UK
- Posts
- 371
Do you have anything running that automatically manages firewall rulesets? Like lfd, which I use to ban IPs that make brute-force password-guessing attempts, or dfd, which does a similar job.
If so, try turning that off for a short while (don't turn the firewall off completely, of course, just the tools that change it without your intervention) and see if that helps. It could be that such a tool is misconfigured or has a bug and is mucking up the firewall rules completely when the bug is triggered.
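To check whether an automated tool really is mangling the rules, you could save `iptables-save` output while the server is healthy and again after an incident, then compare the two. A minimal sketch, assuming both dumps are plain `iptables-save` text files: it drops comment lines and the `[packets:bytes]` counters (which change constantly) and reports rule and chain-policy lines that appear in only one dump. It compares sets, so it ignores rule order and collapses duplicate rules, both of which matter to iptables; treat it as a first-pass check only.

```python
"""Compare two saved `iptables-save` dumps and report what changed."""
import re
import sys

# Packet/byte counters such as "[123:4567]" change constantly; strip them.
COUNTER = re.compile(r"\[\d+:\d+\]")


def normalise(dump):
    """Return the set of non-comment lines with counters removed."""
    rules = set()
    for raw in dump.splitlines():
        line = raw.strip()
        if not line or line.startswith("#"):
            continue
        rules.add(COUNTER.sub("", line).strip())
    return rules


def diff_rules(before, after):
    """Return (added, removed): lines only in `after`, lines only in `before`."""
    old, new = normalise(before), normalise(after)
    return sorted(new - old), sorted(old - new)


if __name__ == "__main__" and len(sys.argv) == 3:
    with open(sys.argv[1]) as f:
        before_text = f.read()
    with open(sys.argv[2]) as f:
        after_text = f.read()
    added, removed = diff_rules(before_text, after_text)
    for line in added:
        print("+", line)
    for line in removed:
        print("-", line)
```

Run as `python3 fwdiff.py before.rules after.rules` (the script name is hypothetical). Lines marked `+` exist only in the second dump and lines marked `-` only in the first; a flood of new DROP rules or a changed chain policy would point at the tool.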
-
11-05-2006, 11:27 AM #3 Newbie
- Join Date
- Dec 2005
- Posts
- 26
I don't think it's the firewall. We have APF+BFD; we can ping it but can't log in or access any services (DA, FTP, httpd). The datacenter can't log in via the console either (it times out or gives a blank screen). But thanks for your suggestion.
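When ping answers but every service seems dead, it helps to pin down exactly what still responds, from another machine. A completed TCP handshake only proves the kernel still has a listening socket (the kernel finishes the handshake before the daemon calls accept), so a successful connect does not prove the daemon is alive; reading the greeting that SSH and FTP servers send first is a stronger test. The sketch below is illustrative: the host comes from the command line, and the port list (including 2222, assumed to be DirectAdmin's default) should be adjusted to your setup.

```python
"""Check from another machine whether services are really answering."""
import socket
import sys

# name -> (port, whether the server speaks first). Adjust to your setup;
# 2222 is assumed to be DirectAdmin's default port.
SERVICES = {
    "ftp": (21, True),
    "ssh": (22, True),
    "http": (80, False),
    "directadmin": (2222, False),
}


def port_open(host, port, timeout=5.0):
    """True if a TCP handshake completes (only proves a listening socket)."""
    try:
        with socket.create_connection((host, port), timeout=timeout):
            return True
    except OSError:
        return False


def read_banner(host, port, timeout=5.0, size=64):
    """Return the first text the server sends, or None if nothing arrives."""
    try:
        with socket.create_connection((host, port), timeout=timeout) as sock:
            data = sock.recv(size)
    except OSError:
        return None
    return data.decode("ascii", "replace") if data else None


def check(host, timeout=5.0):
    """Return a status string per service."""
    results = {}
    for name, (port, speaks_first) in SERVICES.items():
        if speaks_first:
            banner = read_banner(host, port, timeout)
            results[name] = banner.strip() if banner else "no banner"
        else:
            results[name] = "open" if port_open(host, port, timeout) else "closed"
    return results


if __name__ == "__main__" and len(sys.argv) == 2:
    for service, status in check(sys.argv[1]).items():
        print(f"{service:12} {status}")
```

Running it every few minutes from a second box during an outage would show whether sshd still sends its banner before the connection drops, which matches the behaviour described in the first post.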
-
11-05-2006, 04:36 PM #4 roflcopter
- Join Date
- Feb 2004
- Location
- here and there
- Posts
- 767
Sounds like it's crashing to me - something is causing a lock, but the stack is still able to reply to ICMP ping packets. I've seen this on some machines before.
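If the kernel manages to log anything before it locks up, it will usually be among the kernel messages syslog writes out (on RHEL, /var/log/messages). A minimal sketch that scans a saved copy of that log for a few common trouble strings. The patterns are examples only, since exact wording differs between kernel versions, and a hard lockup may stop anything reaching the disk at all.

```python
"""Scan a saved kernel/syslog file for common crash and lockup messages."""
import re
import sys

# Example patterns only: exact wording differs between kernel versions.
SIGNATURES = {
    "soft lockup": re.compile(r"soft lockup", re.IGNORECASE),
    "oops": re.compile(r"\bOops\b"),
    "panic": re.compile(r"Kernel panic"),
    "call trace": re.compile(r"Call Trace:"),
    "oom": re.compile(r"Out of memory"),
}


def scan(lines):
    """Return (line_number, label, text) for every line matching a pattern."""
    hits = []
    for number, line in enumerate(lines, start=1):
        for label, pattern in SIGNATURES.items():
            if pattern.search(line):
                hits.append((number, label, line.rstrip("\n")))
    return hits


if __name__ == "__main__" and len(sys.argv) == 2:
    with open(sys.argv[1], errors="replace") as f:
        for number, label, text in scan(f):
            print(f"{number:>8} [{label}] {text}")
```

The hits nearest the time of each outage are the ones worth posting; nothing at all just before an outage is itself a hint that the lockup happened below the logging layer.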
What OS? Have you compiled your own kernel? Any weird software you're running?
Dedicated Servers, Virtual Machines, Colocation, BGP & IPs
objx.net - AS33333 - Salt Lake, Utah
awknet.com - AS17048 - Los Angeles, California
-
11-05-2006, 04:41 PM #5 Newbie
- Join Date
- Dec 2005
- Posts
- 26
It's running RHEL 4 (64-bit), 2.6.18 kernel w/ grsecurity, but it was running fine for 20+ days, then started crashing every day (or every other day). Nothing unusual, Apache 2.2, PHP 5.1.6, MySQL 5, DA, ... etc.
I'll try an older grsec kernel for a few days though (or maybe compile a new one), since the 2.6.18 patch was from ~spender, so it may not be stable or well tested.
Another note: I tried running a PHP script from the command line and it's been segfaulting halfway through.
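Random segfaults are a useful clue. As a rule of thumb, one program crashing at the same point again and again suggests a software bug, while many unrelated programs segfaulting at random points often suggests hardware (memory, CPU, board or power). The kernel can log a line for each such crash; the sketch below tallies those lines by program name. It assumes the `name[pid]: segfault at ...` layout that 64-bit 2.6 kernels typically write, so check it against your own log first.

```python
"""Count kernel 'segfault at' messages per program."""
import re
import sys
from collections import Counter

# Matches e.g. "php[4321]: segfault at ..." anywhere in a syslog line.
SEGFAULT = re.compile(r"(\S+)\[(\d+)\]: segfault at")


def tally_segfaults(lines):
    """Return a Counter of program name -> number of logged segfaults."""
    counts = Counter()
    for line in lines:
        match = SEGFAULT.search(line)
        if match:
            counts[match.group(1)] += 1
    return counts


if __name__ == "__main__" and len(sys.argv) == 2:
    with open(sys.argv[1], errors="replace") as f:
        counts = tally_segfaults(f)
    for name, count in counts.most_common():
        print(f"{count:>6} {name}")
    print(f"{len(counts)} distinct programs")
```

If php, httpd, mysqld and others all show up, that leans toward the hardware explanation; if only the one PHP script does, the kernel or PHP build is the more likely suspect.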
-
11-05-2006, 04:57 PM #6 WHT Addict
- Join Date
- Nov 2005
- Location
- Great Falls, VA
- Posts
- 160
Same thing happened to mine starting about a week ago. Nothing very unusual showed up in logs, etc. After crashing about twice a day for 4 days and ruling out every other possibility, we did a chassis swap and it has been perfectly fine since then.
-
11-05-2006, 05:00 PM #7 Newbie
- Join Date
- Dec 2005
- Posts
- 26
besposito, thanks. I had the RAM changed but that didn't make a difference. I'll try the latest kernel (2.6.18.2 w/ grsec, hopefully the 2.6.18.1 patch will work) and, failing that, I'll ask for the chassis to be swapped.
-
11-05-2006, 05:05 PM #8 roflcopter
- Join Date
- Feb 2004
- Location
- here and there
- Posts
- 767
I'd go directly to the chassis swap. If it ran fine for 20+ days without a hitch, it's likely power-related, or you've got a short somehow...
-
11-05-2006, 05:11 PM #9 WHT Addict
- Join Date
- Nov 2005
- Location
- Great Falls, VA
- Posts
- 160
Originally Posted by bloghost
█ JetNet, LLC
█ Fast, Reliable, Affordable
█ Shared • Reseller • Dedicated
█ http://www.jetnethost.com
-
11-05-2006, 06:00 PM #10 Newbie
- Join Date
- Dec 2005
- Posts
- 26
I'm on 2.6.18.2 now. I'll hold out and keep my phone close so if it goes down, I can have them swap the chassis completely. Thanks!
-
11-06-2006, 01:51 PM #11 Newbie
- Join Date
- Dec 2005
- Posts
- 26
Server went down again a couple hours ago. SoftLayer have scheduled a full chassis swap for later today.
-
11-06-2006, 04:41 PM #12 WHT Addict
- Join Date
- Nov 2005
- Location
- Great Falls, VA
- Posts
- 160
Good, that should correct the issue.
-
11-07-2006, 10:30 PM #13 WHT Addict
- Join Date
- Jun 2004
- Posts
- 173
Let us know how it goes.
BTW, regarding the monitoring software: you should have at least sar installed. It's the absolute minimum. There are also tools for on-site and off-site monitoring: monit, Nagios, OpenNMS.
If you only have 1 server to monitor, check monit first. I do suggest you start using monitoring software even if this issue (hopefully) is resolved by the full chassis swap. Good luck.
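Once sar is collecting, its own history can help pin down when a lockup started: sysstat's collector normally runs from cron at a fixed interval (10 minutes is a common default on Red Hat systems, but check your cron entry), so a hole in the day's samples brackets the time the machine stopped working. A sketch, assuming you have saved the text of a `sar` report for one day and that its timestamps are in 24-hour `HH:MM:SS` form; it does not handle a report that spans midnight or 12-hour AM/PM output.

```python
"""Find holes in a day's sar samples to bracket when a box stopped."""
import re
import sys
from datetime import datetime, timedelta

# A sample line starts with a 24-hour timestamp followed by whitespace.
STAMP = re.compile(r"^(\d{2}:\d{2}:\d{2})\s")


def extract_stamps(report):
    """Return the leading HH:MM:SS of every timestamped line in a report."""
    stamps = []
    for line in report.splitlines():
        match = STAMP.match(line)
        if match:
            stamps.append(match.group(1))
    return stamps


def find_gaps(stamps, expected=timedelta(minutes=10), slack=timedelta(minutes=2)):
    """Return (last_before, first_after) pairs where samples are too far apart."""
    times = [datetime.strptime(s, "%H:%M:%S") for s in stamps]
    gaps = []
    for prev, cur in zip(times, times[1:]):
        if cur - prev > expected + slack:
            gaps.append((prev.strftime("%H:%M:%S"), cur.strftime("%H:%M:%S")))
    return gaps


if __name__ == "__main__" and len(sys.argv) == 2:
    with open(sys.argv[1]) as f:
        for before, after in find_gaps(extract_stamps(f.read())):
            print(f"no samples between {before} and {after}")
```

Column-header lines in a sar report also start with a timestamp; they sit next to real samples, so they don't create false gaps.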
-
11-08-2006, 09:53 AM #14 Retired Moderator
- Join Date
- Nov 2002
- Location
- WebHostingTalk
- Posts
- 8,901
* Moved to Technical and Security Issues...
Sirius
I support the Human Rights Campaign!
Moving to the Tampa, Florida area? Check out life in the suburbs in Trinity, Florida.
-
11-08-2006, 10:31 AM #15 Newbie
- Join Date
- Dec 2005
- Posts
- 26
Originally Posted by Webcart
02:30:29 up 1 day, 13:34, 1 user, load average: 0.27, 0.20, 0.21
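The line above is standard `uptime` output. If you record it periodically, a drop in the uptime figure marks a reboot, and the load averages show how busy the box was beforehand. A minimal parser, assuming the common layouts (`up N min,`, `up HH:MM,`, optionally preceded by `N day(s),`, followed later by `load average: a, b, c`); anything else raises ValueError.

```python
"""Parse `uptime` output into total minutes up and the three load averages."""
import re

UPTIME = re.compile(
    r"up\s+(?:(?P<days>\d+)\s+days?,\s*)?"
    r"(?:(?P<hours>\d+):(?P<minutes>\d+)|(?P<mins>\d+)\s+min),"
    r".*load average:\s*(?P<l1>[\d.]+),\s*(?P<l5>[\d.]+),\s*(?P<l15>[\d.]+)"
)


def parse_uptime(line):
    """Return (minutes_up, (load1, load5, load15)) for one uptime line."""
    match = UPTIME.search(line)
    if match is None:
        raise ValueError(f"unrecognised uptime line: {line!r}")
    days = int(match.group("days") or 0)
    if match.group("mins") is not None:
        minutes = int(match.group("mins"))
    else:
        minutes = int(match.group("hours")) * 60 + int(match.group("minutes"))
    loads = tuple(float(match.group(k)) for k in ("l1", "l5", "l15"))
    return days * 24 * 60 + minutes, loads
```

For the line quoted above, this gives 2254 minutes (1 day, 13 hours, 34 minutes) and loads of (0.27, 0.20, 0.21).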
-
11-12-2006, 11:19 AM #16 Newbie
- Join Date
- Dec 2005
- Posts
- 26
It crashed again this morning unfortunately.