
Raq4 with latest patches hanging
coolioj 02-27-2003, 02:40 PM Hello,
Has anyone run a Raq4, applied the latest security patches from Sun, and had it hang intermittently? A reboot has to be performed to clear the hang. We are running 128 MB of RAM in it, and wondering if that would be an issue.
I checked the free memory while it's running, and it shows 131 MB available, physical and swap combined.
Thanks,
Coolioj
cgisupp 02-27-2003, 05:41 PM I have 3 RAQ4s and the only one I had problems with (similar to yours) was a RAQ4r running the kernel 2.2.16C32_III.
It would hang regularly. After changing the kernel I never had a problem.
coolioj 02-27-2003, 07:35 PM I'm actually running 2.2.16C33_III. What version did you upgrade to?
How do you upgrade the kernel? That sounds difficult. Sorry, still new to Linux and Cobalts.
Thanks,
Coolioj
cgisupp 02-27-2003, 10:03 PM I haven't had a problem with the 2.2.16C33_III version.
raq4less 02-27-2003, 10:11 PM We haven't had any "hanging" problems with any of our RAQ servers. We do have 512 MB of RAM in every server, so that may be part of it. We issue new servers FULLY patched to the latest official Sun patch, with no complaints from the 30 new servers issued in the past couple of weeks.
I firmly believe that EVERY Cobalt RaQ server deserves 512 MB of RAM. IMHO
ellebi 02-28-2003, 04:59 AM I had the same problem. One of my servers started hanging after the kernel upgrade.
I installed an older version, 2.2.16C28_III-4, and the problem disappeared.
From what I have understood, the server was losing network connectivity. The server is fully patched and has 512 MB of RAM.
BruceT 02-28-2003, 06:35 AM My "production" RaQ 4r has been running on 2.2.16C33_III since release with no issues. (I did add the RAID rebuild speedup tweak in case the power goes off at the colo, but no other "problems" have appeared).
When you say it "hangs" - what does that mean? Do all services stop and become unreachable, requiring a hard power off/on? Is it only the web server that stops working? What do the logs show (/var/log/messages, /var/log/dmesg, etc)?
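For anyone new to checking those logs, here is a minimal sketch of one way to look at the most recent entries after a reboot. The function name `show_recent_logs` and the 50-line default are illustrative assumptions, not anything specific to the RaQ; only the two log paths come from the post above.

```shell
#!/bin/sh
# Sketch: print the last N lines of each log file that exists and is readable.
# show_recent_logs is a hypothetical helper name; adjust the line count to taste.
show_recent_logs() {
    lines="${1:-50}"
    # Drop the line-count argument so "$@" holds only the log paths.
    [ "$#" -gt 0 ] && shift
    for log in "$@"; do
        if [ -r "$log" ]; then
            echo "== $log =="
            tail -n "$lines" "$log"
        fi
    done
}

# Log paths taken from the post; files that are missing are silently skipped.
show_recent_logs 50 /var/log/messages /var/log/dmesg
```

The entries written just before the hang (and the first ones after the reboot) are usually the interesting ones, so it may be worth raising the line count if the server was down for a while.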
coolioj 03-06-2003, 08:08 PM Originally posted by ellebi
I had the same problem. One of my servers started hanging after the kernel upgrade.
I installed an older version, 2.2.16C28_III-4, and the problem disappeared.
From what I have understood, the server was losing network connectivity. The server is fully patched and has 512 MB of RAM.
Hello,
Thanks for all of your suggestions. How does one downgrade the kernel?
Coolioj
BruceT 03-06-2003, 08:14 PM You'll have to break out the RPMS from the older kernel PKG using
tar zxvf old-kernel-package.pkg
Go to the RPMS directory and manually install each RPM:
rpm -ivh --force each.kernel.rpm
Then reboot. Use at your own risk, though -- I'm not sure what other dependencies there are on the kernel versions (e.g., something else may be depending on that newer kernel version -- I don't think so, but...)
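The unpack step above could be sketched roughly as follows. This assumes the .pkg is a gzip-compressed tar archive containing an RPMS directory, as the post describes; the function name `unpack_kernel_pkg` and the /tmp/old-kernel scratch directory are made up for illustration. The sketch only unpacks and lists the RPMs, so the list can be reviewed before anything is installed.

```shell
#!/bin/sh
# Sketch: unpack an older kernel .pkg into a scratch directory and list its RPMs.
# unpack_kernel_pkg is a hypothetical helper; the RPMS/ layout is assumed from the post.
unpack_kernel_pkg() {
    pkg="$1"        # path to the older kernel .pkg (gzip-compressed tar)
    workdir="$2"    # scratch directory to unpack into
    mkdir -p "$workdir" || return 1
    tar zxf "$pkg" -C "$workdir" || return 1
    # Print every RPM found, one per line, in a stable order.
    find "$workdir" -name '*.rpm' | sort
}

# Example use, as root on the RaQ (left commented out on purpose):
# for rpm in $(unpack_kernel_pkg old-kernel-package.pkg /tmp/old-kernel); do
#     rpm -ivh --force "$rpm"
# done
```

Keeping the install loop separate from the unpack step leaves a chance to confirm that the RPMs really are the older kernel version before forcing them in. The same "use at your own risk" caveat applies.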
CCNST 03-06-2003, 08:23 PM Bruce, what do you mean by "RAID rebuild speedup tweak"?
BruceT 03-06-2003, 08:42 PM The latest released kernel ended up with a "bug" in the RAID rebuild code, where it doesn't give proper priority to the process. RAID rebuilds now take about 2 days to complete (they do complete and everything is fine, it just runs a LONG time!)
A new kernel is in QA at Sun; until it's released, one of the engineers passed me a "hack" to fix the problem:
Shell into the server as root and do
echo 5000 > /proc/sys/dev/md/speed-limit
You may want to add a similar line to the end of /etc/rc.d/rc.local so each time the server reboots this command will be executed (remember to remove it when the new fixed kernel is released and installed).
The side effect is a slight increase in server load while the RAID is rebuilding, but I think it's a small price to pay for completion in a "normal" amount of time (2 hrs for my 2x40GB 4r).
I guess by putting an even bigger number in, it would speed up even more (to a point), and the server load would go up even more...
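As a rough sketch of what the rc.local addition might look like: the path and the value 5000 are the ones given above, while the function name `set_md_speed_limit` and the writability check are assumptions added so the line does nothing harmful on a kernel that lacks this /proc entry.

```shell
#!/bin/sh
# Sketch: apply the RAID rebuild speed limit only if the /proc entry is writable.
# set_md_speed_limit is a hypothetical helper; path and value come from the post.
set_md_speed_limit() {
    value="${1:-5000}"
    target="${2:-/proc/sys/dev/md/speed-limit}"
    if [ -w "$target" ]; then
        echo "$value" > "$target"
    else
        echo "skipping: $target is not writable" >&2
        return 1
    fi
}

# At the end of /etc/rc.d/rc.local; "|| true" keeps boot going if the entry is missing.
set_md_speed_limit 5000 || true
```

Because of the check, leaving this in place after the fixed kernel is installed should fail quietly if the entry goes away, though removing it at that point is still the cleaner option.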
This info is also in the cobalt-users list archives -- if you're not on the list, I highly recommend it. There is a lot of good info that gets handed out there by a lot of very knowledgeable people. Go to http://list.cobalt.com/mailman/listinfo/cobalt-users to sign up, or to search the archives.
cwatkins 03-16-2003, 07:22 PM Had the same problem, had to downgrade the kernel.
Sun QA problem?