I'm running a ProLiant DL580 in-house, and just today it quit responding to all requests: SSH, FTP, Apache, Samba, etc.
This server has been running great non-stop for months. It is located behind my router with no traffic being routed to it and is assigned a private IP (192.168.x.x), so no, it wasn't rooted, unless I unknowingly rooted it by working on my machine...
I'm running BlueQuartz (CentOS) on it, and while trying to figure out what happened, I couldn't get the machine to boot at all. After playing around and reseating all components, I was able to get it to the kernel selection screen. With the default kernel (2.6.9-34.0.2smp, I believe), it tells me all 4 CPUs cannot be used, and hangs...
After rebooting (and lord, this takes a while), I was able to boot to a prompt with the 2.6.9-34smp kernel and 1 CPU. Now, where would I go to find out what happened?
Yes, I'm a Linux "n00b". What kind of logs should I look for?
Did you physically check all the CPUs in the system? The last time we had an issue like that, it turned out one of the wires inside had gotten stuck in the fan and the machine overheated. It would boot up only a few times, and only for short spans of time.
Good luck with the investigation.
I checked /var/log/messages last night after searching, and saw a point where the machine's clock was changed to Jan 1, but beyond that, nothing that looked to me like it would hang the machine and make it unresponsive.
I'm going to check the others when I get home from work and maybe they'll provide some info...
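For anyone else digging through /var/log/messages after a hang like this, here is a rough Python sketch of how the file could be filtered for lines worth a closer look. The keyword list is purely my own assumption (strings that often appear in kernel messages around hangs and hardware faults), it is not exhaustive, and the function names are made up for illustration.

```python
import re

# ASSUMPTION: a hand-picked, non-exhaustive set of strings that often appear
# in kernel messages around hangs and hardware faults. Adjust to taste.
SUSPECT = re.compile(
    r"\b(oops|panic|machine check|mce|nmi|out of memory|i/o error|soft lockup|segfault)\b",
    re.IGNORECASE,
)

def suspicious_lines(lines):
    """Return (line_number, text) pairs for lines that match SUSPECT."""
    return [
        (number, line.rstrip("\n"))
        for number, line in enumerate(lines, 1)
        if SUSPECT.search(line)
    ]

def scan_log(path="/var/log/messages"):
    """Scan a syslog-style file; reading it usually requires root."""
    with open(path, errors="replace") as handle:
        return suspicious_lines(handle)
```

Calling `scan_log()` as root and printing each returned pair gives a short list to read through instead of the whole file. An empty result does not prove the hardware is healthy, since a hard hang may never get anything written to disk.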
Yes, the CPUs were firmly seated, and the fans for the CPUs were running fine. When there is a fan failure, it flashes a light on the case, and if any "CPU fans" fail (6 total), the machine will power down after 30 seconds.
I was able to get the machine running fine with the new kernel and still haven't found the original problem. Maybe it was just a Gremlin. :?