Thread: LSI MegaRAID and RAID-10 problem
-
10-27-2005, 01:44 PM #1 Web Hosting Guru
- Join Date
- Oct 2004
- Posts
- 305
LSI MegaRAID and RAID-10 problem
One of my colo boxes has a severe I/O latency problem (high iowait on write operations). The specs are: Tyan Transport GX28, dual Opteron 246, 4 GB RAM, 4 x 73 GB 10k rpm Seagate SCSI HDDs in RAID-10 on an LSI MegaRAID controller, running CentOS 4.0 + cPanel with a 2.6.13.2 kernel. We ran a full fsck and it completed with no errors. We even tried the stock 2.6 kernel from CentOS, but the iowait problem persists.
The LSI MegaRAID Manager (MEGAMGR) shows "other errors" on all drives. Does anyone know what these errors mean and how to fix them?
The DC is now planning to swap out the RAID controller with a new one. Is it safe to swap out the RAID controller just like that? Will it not corrupt the data on the drives?
I'm so frustrated with this issue and would really appreciate any input on this. Thank you!
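To put a number on the iowait described above, here is a minimal Python sketch that compares two snapshots of the aggregate `cpu` line from `/proc/stat`. It assumes the usual Linux 2.6 field order (user, nice, system, idle, iowait, ...). The function names `parse_cpu_line`, `iowait_percent` and `sample_iowait` are made up for illustration and are not part of any tool mentioned in this thread.

```python
import time


def parse_cpu_line(line):
    """Parse the aggregate 'cpu' line of /proc/stat into a list of tick counts."""
    fields = line.split()
    if not fields or fields[0] != "cpu":
        raise ValueError("expected the aggregate 'cpu' line")
    ticks = [int(x) for x in fields[1:]]
    if len(ticks) < 5:
        # Assumption: a kernel that reports iowait as the fifth counter.
        raise ValueError("no iowait field in this cpu line")
    return ticks


def iowait_percent(before, after):
    """Share of CPU time spent in iowait between two 'cpu' line snapshots."""
    a = parse_cpu_line(before)
    b = parse_cpu_line(after)
    n = min(len(a), len(b))
    deltas = [b[i] - a[i] for i in range(n)]
    total = sum(deltas)
    if total <= 0:
        return 0.0
    # Assumed field order: user nice system idle iowait ...
    return 100.0 * deltas[4] / total


def sample_iowait(interval=5.0, path="/proc/stat"):
    """Read the first line of /proc/stat twice, 'interval' seconds apart."""
    with open(path) as f:
        before = f.readline()
    time.sleep(interval)
    with open(path) as f:
        after = f.readline()
    return iowait_percent(before, after)
```

Calling `sample_iowait()` while cpbackup or a benchmark is running and again while the box is idle gives two figures to compare before and after any controller or firmware change.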
-
10-27-2005, 02:17 PM #2 Web Hosting Master
- Join Date
- Jun 2004
- Posts
- 972
1. What firmware are you running?
2. We had the same problems with the PERC4/DC (same as the LSI 320X).
3. The solution was to flash back to the previous f/w.
Also check the termination and the drive cage; 'other errors' probably means disconnects etc. You should also check the firmware revision on the drives. If you are running Cheetah 10K.6 drives with firmware 0006, you will have major problems; you must upgrade the firmware to 0007, which will likely lead to full data loss on the drives, so have backups ready. I personally learned this the hard way on my desktop machine with ST3146807LC drives (10K.6).
I would be happy to call you and give you a hand with this. I remember pulling my hair out 3 months ago with the same problem on a Dell 2850.
EuroVPS VPS Hosting - Virtual Private Servers | Web Hosting | Dedicated Servers
Providing Reliable Plesk and cPanel Servers since 2004, now offering low priced Xen & VMware VPS in Amsterdam
UK +44.203.355.6681 / Amsterdam +31.208.202.120
-
10-27-2005, 02:36 PM #3 Web Hosting Guru
- Join Date
- Oct 2004
- Posts
- 305
Thanks for the reply!
How do I check the firmware? From MegaMGR, it shows Seagate ST373207LC, Revision: D701. Is this firmware 0006 or 0007?
-
10-27-2005, 06:05 PM #4 Disabled
- Join Date
- Aug 2004
- Location
- Zurich, Switzerland
- Posts
- 774
Right, conflicts between certain hard drive firmwares and SCSI BIOS versions come up now and then. Unless the problem is brand new, the info is usually posted on the manufacturers' websites, along with a way to fix it.
As for the slow write performance, you might want to check the drive settings; sometimes it's something as simple as the writeback cache being switched off.
-
10-28-2005, 01:49 AM #5 Web Hosting Guru
- Join Date
- Oct 2004
- Posts
- 305
Thanks RambOrc, I think I will try to turn off the writeback cache.
I couldn't find any info on the LSI and Seagate sites about the problem I'm having. That's why I posted this thread on WHT, hoping to get some input.
-
10-28-2005, 03:30 AM #6 Web Hosting Master
- Join Date
- Jun 2004
- Posts
- 972
NO!!! You read RambOrc's message wrong! He is saying NOT to turn the writeback cache off on the drives.
On the drives, write-back should always be set ON, but that is not your problem here. Also, the drives you're using are Cheetah 10K.7 drives and are not affected by the firmware problem, so don't worry about that.
Is this a production server?
-
10-28-2005, 04:16 AM #7 Web Hosting Guru
- Join Date
- Oct 2004
- Posts
- 305
Ah okay, I won't turn it off then. I was just confused.
Yes, this is a production server; it has been online for about 3 months. At first I didn't notice the problem, but as we put more customers on the server, we noticed cpbackup running very slowly at night (causing high iowait), and we tested with bonnie to confirm it. Server load jumped to over 100 and the server crashed while running bonnie.
Do you have any idea what's the problem? Do you think it's caused by that "other errors" output from the LSI MegaRAID?
-
10-28-2005, 07:53 AM #8 Disabled
- Join Date
- Aug 2004
- Location
- Zurich, Switzerland
- Posts
- 774
Sounds like a software and not a hardware problem. BTW, what I meant is that maybe the writeback cache is turned off by default on your drives and nobody has noticed it yet, or something similar (such things happen often).
-
10-28-2005, 11:55 AM #9 Web Hosting Guru
- Join Date
- Oct 2004
- Posts
- 305
Writeback cache is indeed enabled for all drives, according to what I see in MegaMGR.
-
10-28-2005, 04:09 PM #10 Web Hosting Master
- Join Date
- May 2004
- Location
- Atlanta, GA
- Posts
- 3,872
Originally Posted by RambOrc
-
10-28-2005, 04:17 PM #11 Web Hosting Master
- Join Date
- May 2004
- Location
- Atlanta, GA
- Posts
- 3,872
blueface,
1. Try adding the kernel parameter "acpi=off" to the kernel line in the /boot/grub/grub.conf file.
2. Sometimes you need to go back to an older firmware on the RAID card to get rid of this sort of issue. It has happened with Adaptec RAID cards we've dealt with, and re-flashing the RAID firmware with an older version resolved the issue.
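For point 1, here is a minimal Python sketch of what that edit amounts to. It assumes a GRUB legacy style grub.conf in which each boot entry has a line whose first word is `kernel`. The helper name `add_kernel_param` is made up for illustration. It only transforms text that you pass in, so back up the real file before writing anything back to it.

```python
def add_kernel_param(grub_conf_text, param="acpi=off"):
    """Append 'param' to every 'kernel' line that does not already carry it."""
    out = []
    for line in grub_conf_text.splitlines():
        words = line.split()
        # Only touch lines whose first word is exactly 'kernel';
        # commented-out entries such as '#kernel ...' are left alone.
        if words and words[0] == "kernel" and param not in words:
            line = line.rstrip() + " " + param
        out.append(line)
    return "\n".join(out) + "\n"
```

Running it twice on the same text gives the same result, so the parameter is never added more than once per kernel line. The change only takes effect on the next reboot.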
-
10-28-2005, 04:22 PM #12 Web Hosting Guru
- Join Date
- Oct 2004
- Posts
- 305
Thank you CW!
1. what's that "acpi=off" option for?
2. how do I go back to older firmware on the RAID card? is it safe and won't corrupt the data on the drives?
Do you think replacing the LSI MegaRAID card with a new one will fix the issue?
-
10-28-2005, 05:26 PM #13 Web Hosting Master
- Join Date
- May 2004
- Location
- Atlanta, GA
- Posts
- 3,872
Originally Posted by blueface
Glad you asked about flashing firmware! Yes, sometimes the array configuration can be destroyed by flashing older firmware (or newer F/W, for that matter), so backing up the data on the array is a must before you change F/W.
We haven't used enough LSI RAID cards to tell you one way or the other. I'm speaking from experience with Adaptec, who often release a new driver for new F/W. So if your RAID card is running the latest F/W but the Linux kernel driver is the 'old' one written for the old F/W, then you are in trouble! That's why it's worth a try to go back to the old F/W, which should be downloadable from the LSI site. However, if you have a Dell/HP/IBM/Sun, the F/W files for the OEM LSI RAID must be obtained from those manufacturers.
-
10-28-2005, 05:41 PM #14 Web Hosting Master
- Join Date
- May 2004
- Location
- Atlanta, GA
- Posts
- 3,872
Also, the GX28 w/ 4x SCSI HDDs (B2881G28U4H) comes with an Adaptec AIC-7902 on the S2881 board. How did you end up using an LSI card? Supposedly, only the Adaptec 2010S (0-channel RAID card) can work with the embedded AIC-7902... Did you disable the on-board Adaptec and use a standalone LSI 320-1 card?
-
10-29-2005, 01:08 AM #15 Web Hosting Guru
- Join Date
- Oct 2004
- Posts
- 305
Originally Posted by cwl@apaqdigital
I believe they're using an LSI MegaRAID ZCR card like this one. Is it okay?
http://lsilogic.com/products/megarai...id_320_0x.html
If it's not compatible with the onboard Adaptec AIC-7902, can I safely replace it with an LSI MegaRAID 320-1 card (and disable the on-board Adaptec) without destroying the array?
Thanks again for all the input, guys! It's much appreciated.
-
10-29-2005, 11:34 AM #16 Web Hosting Master
- Join Date
- May 2004
- Location
- Atlanta, GA
- Posts
- 3,872
Originally Posted by blueface
Get to the bottom of this, and make sure that the DC either (1) uses an Adaptec 2010 (PCI-X)/2005 (SODIMM type) ZCR, which is designed to work with the embedded AIC-7902, or (2) disables the on-board AIC-7902 and uses an LSI U320-1 (single-channel RAID).
Swapping the LSI 320-0 for an Adaptec 2010/2005 will certainly erase the existing array setup!
If the F/W version is the same between the U320-0 and the U320-1, the RAID configuration should remain safe after the swap, but you just never know! Back up your array before doing anything!
-
10-29-2005, 12:18 PM #17 Disabled
- Join Date
- Aug 2004
- Location
- Zurich, Switzerland
- Posts
- 774
Adaptec officially states that all of its recent, current and future RAID adapters are 100% compatible with each other, meaning that if you have a RAID array and change to a different controller, your data is guaranteed to be fine. Firmware version shouldn't matter. I guess the same applies to LSI controllers, but I can't say for sure, as I haven't spoken to one of their representatives and haven't used LSI-based controllers for about two years now.
-
10-29-2005, 01:47 PM #18 Backup Guru
- Join Date
- Feb 2002
- Location
- New York, NY
- Posts
- 4,618
Originally Posted by RambOrc
Scott Burns, President
BQ Internet Corporation
Remote Rsync and FTP backup solutions
*** http://www.bqbackup.com/ ***
-
10-29-2005, 02:15 PM #19 Web Hosting Master
- Join Date
- May 2004
- Location
- Atlanta, GA
- Posts
- 3,872
Originally Posted by RambOrc
The RAID driver version usually must match the F/W version. The issue on the Linux platform is that the kernel driver is often "older" than the F/W version on new RAID cards. RAID card makers usually are not very 'diligent' about providing updated Linux drivers for updated F/W, and non-kernel Linux drivers are usually not as easy to install as Windows drivers.
The 3ware 9500/9550 series is ingenious in that regard! When you load a newer driver for the first time, it automatically updates the RAID card with the matching new firmware at the same time.
-
10-29-2005, 02:43 PM #20 Disabled
- Join Date
- Aug 2004
- Location
- Zurich, Switzerland
- Posts
- 774
I guess I need to clarify my statement. Once again, I was talking about Adaptec's SCSI, SAS and FC controllers, where this statement holds. I have no official statement on the SATA controllers.
-
10-29-2005, 03:34 PM #21 Web Hosting Master
- Join Date
- May 2004
- Location
- Atlanta, GA
- Posts
- 3,872
Perhaps true for the new generation of SAS/FC RAID cards.
The Adaptec 2120S/2200S U320 RAID (the same Intel 8030x RAID processor as the 2x10SA SATA RAID card) fares no better. Going from f/w v7244 to v7349/v8205 (or the reverse) will make the existing array disappear. The driver for 7244 won't work with f/w 7349/8205 either.
-
10-30-2005, 10:23 AM #22 Web Hosting Guru
- Join Date
- Oct 2004
- Posts
- 305
I got word from the DC that the server is using an LSI U320-1, not the ZCR version. They also said that even with the on-board 7902 enabled, it would not cause any problems.
They'll try to replace the RAID card this week.
-
10-31-2005, 08:50 AM #23 Web Hosting Master
- Join Date
- May 2004
- Location
- Atlanta, GA
- Posts
- 3,872
Originally Posted by blueface
It's true that the U320-1 can co-exist with the on-board AIC-7902, but why would you want that, since the 7902 is not in use?