I have a RaQ4 running a 550os that has been running great for over a year now.
I have recently run into a problem whose source I cannot identify, and I hope someone can help.
For a couple of months now, my main web server (which also handles email, FTP and 2nd DNS) has been locking up on me at random. By locking up I mean that I cannot ping it, ssh to it, browse to it or even use the LCD screen. A hard restart using the power button seems to fix everything until the next occurrence. I have checked all the logs in /var/log and nothing weird shows up. I can usually pinpoint the moment it locked up from the last log entry recorded by one of the services. All temperatures in the server room are fine. Once I reboot, everything checks out OK. I even run swatch and every status is fine.
Start logging the CPU temperature, fan status and other internal temperatures, even if the room temp is OK. You can find this info in /proc/cobalt or somewhere similar. If you poke around in the Active Monitor implementation in Sausalito you'll find them. A wild guess is that your problem is a bad power supply. You should also check the "SMART" status of your drives to see if there have been errors or overtemp conditions. Failing this, you have an intermittent hardware fault, which is the hardest thing in the world to diagnose.
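To make the logging suggestion concrete, here is a minimal sketch in Python. It assumes the sensor readings are plain text files under a directory such as /proc/cobalt (the exact location and file names vary and are a guess here), and the log path and function names are made up for illustration. Each write is flushed to disk so the last reading before a lockup has a chance of surviving the hard restart.

```python
import os
import time


def read_sensors(sensor_dir):
    """Return {relative_path: contents} for every readable file under sensor_dir."""
    readings = {}
    for root, _dirs, files in os.walk(sensor_dir):
        for name in sorted(files):
            path = os.path.join(root, name)
            try:
                with open(path, "r", errors="replace") as fh:
                    text = fh.read().strip()
            except OSError:
                continue  # skip entries that cannot be read
            readings[os.path.relpath(path, sensor_dir)] = text
    return readings


def append_snapshot(sensor_dir, log_path):
    """Append one timestamped line per sensor file to log_path; return the file count."""
    stamp = time.strftime("%Y-%m-%d %H:%M:%S")
    readings = read_sensors(sensor_dir)
    with open(log_path, "a") as fh:
        for name, text in sorted(readings.items()):
            # Collapse multi-line readings so each sensor stays on one log line.
            fh.write("%s %s: %s\n" % (stamp, name, " | ".join(text.splitlines())))
        fh.flush()
        os.fsync(fh.fileno())  # force the data to disk before the next interval
    return len(readings)


def log_forever(sensor_dir="/proc/cobalt", log_path="/var/log/sensors.log", interval=60):
    """Take a snapshot every `interval` seconds. Both default paths are assumptions."""
    while True:
        append_snapshot(sensor_dir, log_path)
        time.sleep(interval)
```

Calling log_forever() from a background job and then reading the tail of the log after the next lockup should show whether temperatures or fan readings were drifting beforehand.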
First, the fans. You want to make sure they are both running, and I don't mean just turning: they need to have a little torque to them. If you put your finger in them they should not stop easily. After that, make sure the front vents are dust free.
you can check your temp with...
While you're in there you will notice the cpu MHz line, which will tell you whether you have a 300 or 450 MHz processor. (You could have a 500 too, and if so that just might be the overheating problem.) With that in mind, we can say there is not a lot of CPU here!
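The command itself is not shown above. Assuming the "cpu MHz" line is the one Linux reports in /proc/cpuinfo, a small Python sketch for pulling out the clock speed could look like this (the function names are made up for illustration):

```python
import re


def cpu_mhz(cpuinfo_text):
    """Return the clock speed in MHz of every CPU listed in cpuinfo-style text."""
    matches = re.findall(r"^cpu MHz\s*:\s*([0-9.]+)", cpuinfo_text, re.MULTILINE)
    return [float(value) for value in matches]


def read_cpu_mhz(path="/proc/cpuinfo"):
    """Read the given file (assumed to be /proc/cpuinfo) and return its MHz values."""
    with open(path) as fh:
        return cpu_mhz(fh.read())
```

On a single-processor box, read_cpu_mhz() should return a one-element list, which makes it easy to tell the 300, 450 and 500 models apart.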
I have seen all sorts of things cause this, including...
MySQL databases crashing and using up all the connections, Sendmail/MailScanner, being hacked, heavy board usage, access report generation, bad cron jobs, virus updates failing (not ClamAV), backup programs... geez, the list goes on of things that happen to all servers!
So you might just run top and see what's using all the CPU, and/or post your ps -axfw output so I can see what it's doing.
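If watching top by hand is not practical, a rough Python sketch like the one below can rank processes by CPU use. Note that it parses `ps aux` rather than `ps -axfw`, because the `aux` format includes a %CPU column; it assumes the usual eleven-column layout (USER, PID, %CPU, ..., COMMAND), and the function names are made up for illustration.

```python
import subprocess


def top_cpu(ps_output, count=5):
    """Return the `count` busiest processes as (cpu_percent, pid, command) tuples."""
    rows = []
    for line in ps_output.splitlines()[1:]:  # skip the header row
        fields = line.split(None, 10)  # COMMAND is the 11th column and may contain spaces
        if len(fields) < 11:
            continue
        try:
            rows.append((float(fields[2]), int(fields[1]), fields[10]))
        except ValueError:
            continue  # ignore lines that do not match the expected layout
    rows.sort(key=lambda row: row[0], reverse=True)
    return rows[:count]


def current_top_cpu(count=5):
    """Run `ps aux` on this machine and return its busiest processes."""
    result = subprocess.run(["ps", "aux"], capture_output=True, text=True, check=True)
    return top_cpu(result.stdout, count)
```

Logging the result of current_top_cpu() every minute or so would show what was busiest just before the next lockup.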