  1. #1
    Join Date
    Oct 2004
    Posts
    305

    LSI MegaRAID and RAID-10 problem

    One of my colo boxes has a severe I/O latency problem (high iowait on write operations). The specs are a Tyan Transport GX28, dual Opteron 246, 4 GB RAM, and 4 x 73 GB 10k rpm Seagate SCSI HDDs in RAID-10 on an LSI MegaRAID controller, running CentOS 4.0 + cPanel with a 2.6.13.2 kernel. We ran a full fsck and it finished with no errors. We even tried the stock 2.6 kernel from CentOS, but the iowait problem persists.
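
For anyone wanting to put a number on "high iowait", here is a minimal sketch that samples the aggregate `cpu` line of `/proc/stat` twice and reports the share of time spent in iowait. It assumes a 2.6-style `/proc/stat` layout (user, nice, system, idle, iowait, ...); the function names are made up for illustration and are not from any tool mentioned in this thread.

```python
import time

def parse_cpu_line(stat_text):
    """Return the numeric fields of the aggregate 'cpu' line of /proc/stat."""
    for line in stat_text.splitlines():
        fields = line.split()
        if fields and fields[0] == "cpu":
            return [int(v) for v in fields[1:]]
    raise ValueError("no aggregate 'cpu' line found")

def iowait_percent(before, after):
    """Share of elapsed CPU time spent in iowait between two samples.

    Assumes the 2.6 field order: user, nice, system, idle, iowait, ...
    so iowait is the fifth value (index 4).
    """
    deltas = [b - a for a, b in zip(before, after)]
    total = sum(deltas)
    return 100.0 * deltas[4] / total if total else 0.0

def sample_iowait(interval=5.0, path="/proc/stat"):
    """Read /proc/stat twice, `interval` seconds apart, and return iowait %."""
    with open(path) as f:
        before = parse_cpu_line(f.read())
    time.sleep(interval)
    with open(path) as f:
        after = parse_cpu_line(f.read())
    return iowait_percent(before, after)
```

Calling `sample_iowait()` during the nightly backup window and again during a quiet period should show whether the stall really lines up with write-heavy load.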

    The LSI MegaRAID Manager (MEGAMGR) reports "other errors" on all drives. Does anyone know what these errors mean and how to fix them?

    The DC is now planning to swap out the RAID controller with a new one. Is it safe to swap out the RAID controller just like that? Will it not corrupt the data on the drives?

    I'm so frustrated with this issue and would really appreciate any input on this. Thank you!

  2. #2
    1. What firmware are you running?
    2. We had the same problem with a PERC4/DC (same as LSI 320X)
    3. The solution was to flash back to the previous f/w

    Also check termination and the drive cage; 'other errors' probably means disconnects and the like. You should also check the firmware revision on the drives. If you are running Cheetah 10K.6 drives with firmware 0006 you will have major problems, and you must upgrade to firmware 0007. That upgrade will likely wipe all data on the drives, so have backups ready. I learned this the hard way on my desktop machine with ST3146807LC drives (10K.6).

    I would be happy to call you and give you a hand with this. I remember pulling my hair out 3 months ago with the same problem on a Dell 2850.
    EuroVPS VPS Hosting - Virtual Private Servers | Web Hosting | Dedicated Servers
    Providing Reliable Plesk and cPanel Servers since 2004, now offering low priced Xen & VMware VPS in Amsterdam
    UK +44.203.355.6681 / Amsterdam +31.208.202.120

  3. #3
    Join Date
    Oct 2004
    Posts
    305
    Thanks for the reply!
    How do I check the firmware? From MegaMGR, it shows Seagate ST373207LC, Revision: D701. Is this firmware 0006 or 0007?
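
One way to list model and revision strings from the OS side is to parse `/proc/scsi/scsi`. The sketch below is only an assumption about what is useful here: behind a hardware RAID controller the kernel typically sees the logical drive rather than the physical disks, so the per-disk revision may only be visible in the controller utility (MegaMGR, as above). The sample text is illustrative, built from the model and revision quoted in this post, and is not real output from this server.

```python
import re

# Matches the "Vendor: ... Model: ... Rev: ..." line of each device entry.
_DEVICE_RE = re.compile(r"Vendor:\s*(.*?)\s+Model:\s*(.*?)\s+Rev:\s*(\S*)")

def parse_scsi_devices(text):
    """Return (vendor, model, revision) tuples from /proc/scsi/scsi-style text."""
    return [m.groups() for m in _DEVICE_RE.finditer(text)]

# Illustrative sample only, not captured from the server in this thread.
SAMPLE_SCSI = """\
Attached devices:
Host: scsi0 Channel: 00 Id: 00 Lun: 00
  Vendor: SEAGATE  Model: ST373207LC       Rev: D701
  Type:   Direct-Access                    ANSI SCSI revision: 03
"""

for vendor, model, rev in parse_scsi_devices(SAMPLE_SCSI):
    print("%s %s rev %s" % (vendor, model, rev))
```

On a live box the same function can be fed `open("/proc/scsi/scsi").read()`.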

  4. #4
    Join Date
    Aug 2004
    Location
    Zurich, Switzerland
    Posts
    774
    Right, conflicts between certain hard drive firmware and SCSI BIOS versions come up now and then. Unless the problem is brand new, the info is in most cases posted on the manufacturers' websites, along with a way to fix it.

    As for the slow write performance, you might want to check the drive settings; sometimes it's something as simple as a writeback cache that is switched off.

  5. #5
    Join Date
    Oct 2004
    Posts
    305
    Thanks RambOrc, I think I will try to turn off the writeback cache.

    I couldn't find any info on the LSI and Seagate sites about the problem I'm having. That's why I posted this thread on WHT, hoping to get some input.

  6. #6
    NO!!! You read RambOrc's message wrong! He is saying NOT to turn writeback cache off on the drives.

    On the drives write-back should always be set ON, but that is not your problem here. Also the drives you're using are Cheetah 10K.7 drives and are not affected by the firmware problem, so don't worry about that.

    Is this a production server?

  7. #7
    Join Date
    Oct 2004
    Posts
    305
    Ah okay, I won't turn it off then. I was just confused.

    Yes, this is a production server; it has been online for about 3 months. At first I didn't notice the problem, but as we put more customers on the server we noticed that cpbackup runs very slowly at night (causing high iowait), and we tested with bonnie to confirm it. Server load jumped to over 100 and the server crashed while running bonnie.

    Do you have any idea what the problem is? Do you think it's caused by that "other errors" output from the LSI MegaRAID?
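
Since bonnie took the box down, a much gentler probe may be safer on a production machine. The sketch below is not a replacement for bonnie, just a small timed write with an `fsync` at the end; the function name and the default sizes are arbitrary choices for illustration.

```python
import os
import tempfile
import time

def timed_sync_write(directory, total_mb=64, block_kb=64):
    """Write total_mb of zeros to a temp file in `directory`, fsync, return MB/s."""
    block = b"\0" * (block_kb * 1024)
    blocks = (total_mb * 1024) // block_kb
    fd, path = tempfile.mkstemp(dir=directory)
    try:
        start = time.time()
        with os.fdopen(fd, "wb") as f:
            for _ in range(blocks):
                f.write(block)
            f.flush()
            os.fsync(f.fileno())  # force the data out of the page cache
        elapsed = time.time() - start
    finally:
        os.remove(path)  # always clean up the scratch file
    return total_mb / elapsed if elapsed > 0 else float("inf")
```

Running it off-peak against a directory on the array, with a small `total_mb` first, gives a rough write figure to compare before and after any controller or firmware change.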

  8. #8
    Join Date
    Aug 2004
    Location
    Zurich, Switzerland
    Posts
    774
    Sounds like a software and not a hardware problem. BTW, what I meant is that maybe the writeback cache is turned off by default on your drives and nobody has noticed it yet, or something similar (such things happen often).

  9. #9
    Join Date
    Oct 2004
    Posts
    305
    Writeback cache is indeed enabled for all drives, according to what I see in MegaMGR.

  10. #10
    Join Date
    May 2004
    Location
    Atlanta, GA
    Posts
    3,872
    Quote Originally Posted by RambOrc
    ......BTW what I meant is that maybe writeback cache is turned off by default on your drives and nobody noticed it yet or similar (such things happen often).
    There is a reason why the write cache on some RAID cards is disabled by default. Without a BBU (battery backup unit), an enabled write cache is dangerous: in the event of a power failure you can get data corruption, because the bytes buffered in cache and not yet written to the array will be permanently lost.
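
To make the failure mode concrete, here is a toy model of a write-back cache. It is purely illustrative and does not describe how any real controller firmware works; the class and method names are invented for the example.

```python
class CachedController:
    """Toy model of a RAID controller with a write-back cache."""

    def __init__(self, has_bbu):
        self.has_bbu = has_bbu
        self.cache = {}  # block number -> data acknowledged but not yet on disk
        self.disk = {}   # block number -> data safely written to the array

    def write(self, block, data):
        # Write-back: the write is acknowledged while it only sits in cache.
        self.cache[block] = data

    def flush(self):
        """Move everything in the cache onto the array."""
        self.disk.update(self.cache)
        self.cache.clear()

    def power_failure(self):
        """Return the list of blocks lost when power is cut."""
        if self.has_bbu:
            # The battery preserves the cache, modelled here as a flush.
            self.flush()
            return []
        lost = sorted(self.cache)
        self.cache.clear()
        return lost
```

With `has_bbu=False`, any block written since the last flush is reported as lost by `power_failure()`, which is the corruption risk described above.
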
    C.W. LEE, Apaq Digital Systems
    http://www.apaqdigital.com
    sales@apaqdigital.com

  11. #11
    Join Date
    May 2004
    Location
    Atlanta, GA
    Posts
    3,872
    blueface,

    1. Try adding the kernel parameter "acpi=off" to the kernel line in /boot/grub/grub.conf.
    2. Sometimes you need to go back to older firmware on the RAID card to get rid of this sort of issue. It has happened with Adaptec RAID cards we've dealt with, and re-flashing the RAID firmware with an older version resolved the issue.
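
For item 1, the parameter goes at the end of each `kernel` line in grub.conf. A minimal sketch of that edit follows, assuming a GRUB-legacy style file; the sample entry uses placeholder file names and is not the real config from this server. Work on a copy and keep the original grub.conf so the change can be reverted.

```python
def add_kernel_param(grub_conf_text, param="acpi=off"):
    """Append `param` to every GRUB-legacy 'kernel' line that lacks it."""
    out = []
    for line in grub_conf_text.splitlines():
        tokens = line.split()
        # Commented-out lines start with '#', so they are left untouched.
        if tokens and tokens[0] == "kernel" and param not in tokens:
            line = line.rstrip() + " " + param
        out.append(line)
    return "\n".join(out) + "\n"

# Placeholder entry for illustration only.
SAMPLE_GRUB = """\
default=0
title CentOS (example entry)
        root (hd0,0)
        kernel /vmlinuz-EXAMPLE ro root=LABEL=/
        initrd /initrd-EXAMPLE.img
"""

print(add_kernel_param(SAMPLE_GRUB))
```

The function is idempotent, so running it twice does not add the parameter a second time.
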

  12. #12
    Join Date
    Oct 2004
    Posts
    305
    Thank you CW!

    1. what's that "acpi=off" option for?

    2. how do I go back to older firmware on the RAID card? is it safe and won't corrupt the data on the drives?

    Do you think replacing the LSI MegaRAID card with a new one will fix the issue?

  13. #13
    Join Date
    May 2004
    Location
    Atlanta, GA
    Posts
    3,872
    Quote Originally Posted by blueface
    Thank you CW!

    1. what's that "acpi=off" option for?

    2. how do I go back to older firmware on the RAID card? is it safe and won't corrupt the data on the drives?

    Do you think replacing the LSI MegaRAID card with a new one will fix the issue?
    ACPI (Advanced Configuration & Power Interface) is basically a BIOS feature designed for the "Wintel" (Windows OS on Intel) platform. Running Linux/BSD on an AMD platform with ACPI enabled can sometimes cause strange, unexplainable behaviour. In our experience, turning ACPI off has often worked wonders in curing issues in Linux. It's worth a try.

    Glad you asked about flashing firmware! Yes, sometimes the array configuration can be destroyed by flashing older firmware (or newer F/W, for that matter). So backing up the data on the array is a must before you change F/W.

    We haven't used enough LSI RAID cards to tell you one way or the other. Speaking from our experience with Adaptec, they often release a new driver for new F/W. So if your RAID card is running the latest F/W but the Linux kernel driver is the old one written for the old F/W, you are in trouble! That's why it's worth a try to go back to the old F/W, which should be downloadable from the LSI site. However, if you have a Dell/HP/IBM/Sun server, the F/W files for the OEM LSI RAID card must be obtained from those makers.

  14. #14
    Join Date
    May 2004
    Location
    Atlanta, GA
    Posts
    3,872
    Also, the GX28 with 4x SCSI HDDs (B2881G28U4H) comes with an Adaptec AIC-7902 on the S2881 board. How did you end up using an LSI card? Supposedly, only the Adaptec 2010S (0-channel RAID card) can work with the embedded AIC-7902... Did you disable the on-board Adaptec and use a standalone LSI 320-1 card?

  15. #15
    Join Date
    Oct 2004
    Posts
    305
    Quote Originally Posted by cwl@apaqdigital
    also, GX28 w/4x SCSI HDDS (B2881G28U4H) comes with Adaptec AIC-7902 on the S2881 board. how did you end up using LSI card? supposedly, only Adaptec 2010S (0-channel RAID card) can work with the embedded AIC-7902.....did you disable the on-board adaptec and use a standalone LSI 320-1 card?
    That's something that I need to ask the DC about.

    I believe they're using an LSI MegaRAID ZCR card like this one. Is it okay?
    http://lsilogic.com/products/megarai...id_320_0x.html

    If it's not compatible with the onboard Adaptec AIC-7902, can I safely replace it with an LSI MegaRAID 320-1 card (and disable the on-board Adaptec) without destroying the array?

    Thanks again for all the input, guys! It's much appreciated.

  16. #16
    Join Date
    May 2004
    Location
    Atlanta, GA
    Posts
    3,872
    Quote Originally Posted by blueface
    That's something that I need to ask the DC about.

    I believe they're using an LSI MegaRAID ZCR card like this one. Is it okay?
    http://lsilogic.com/products/megarai...id_320_0x.html

    Provided that it's not compatible with the onboard Adaptec AIC-7902, can I replace it with LSI MegaRAID 320-1 card safely (and disable the on-board Adaptec) without destroying the array?

    Thanks again for all the inputs guys! It's much appreciated.
    LSI U320 0-channel on an Adaptec 7902? They are supposedly not designed to work together at all. If that is what's in the server, that certainly could be where your problem is!

    Get to the bottom of this, and make sure that the DC either (1) uses an Adaptec 2010 (PCI-X) or 2005 (SODIMM type) ZCR card, which is designed to work with the embedded AIC-7902, or (2) disables the on-board AIC-7902 and uses an LSI U320-1 (single-channel RAID).

    Swapping the LSI 320-0 for an Adaptec 2010/2005 will certainly erase the existing array setup!

    If the F/W version is the same between the U320-0 and the U320-1, the RAID configuration should remain safe after the swap, but you just never know! Back up your array first before doing anything!

  17. #17
    Join Date
    Aug 2004
    Location
    Zurich, Switzerland
    Posts
    774
    Adaptec officially states that all recent, current and future RAID adapters are 100% compatible, meaning that if you have a RAID array and change to a different controller, your data is guaranteed to be fine. Firmware version shouldn't matter. I guess the same applies to LSI controllers, but I can't say for sure, as I haven't spoken to one of their representatives and haven't used LSI-based controllers for about two years now.

  18. #18
    Join Date
    Feb 2002
    Location
    New York, NY
    Posts
    4,618
    Quote Originally Posted by RambOrc
    Adaptec officially states that all recent, current and future RAID adapters are 100% compatible, meaning that if you have a RAID array and change to a different controller, your data is guaranteed to be fine. Firmware version shouldn't matter. I guess the same applies to LSI controllers, but I can't say for sure, as I haven't spoken to one of their representatives and haven't used LSI-based controllers for about two years now.
    I believe the issue here is related to switching from Adaptec to LSI or vice versa. Adaptec to Adaptec will probably be ok, or LSI to LSI, but switching between the two will most likely require a new array configuration.
    Scott Burns, President
    BQ Internet Corporation
    Remote Rsync and FTP backup solutions
    *** http://www.bqbackup.com/ ***

  19. #19
    Join Date
    May 2004
    Location
    Atlanta, GA
    Posts
    3,872
    Quote Originally Posted by RambOrc
    Adaptec officially states that all recent, current and future RAID adapters are 100% compatible, meaning that if you have a RAID array and change to a different controller, your data is guaranteed to be fine. Firmware version shouldn't matter.......
    Well, firmware matters a lot! At least for the Adaptec SATA RAID cards (2x10SA), which we dealt with in large quantities, updating the F/W, whether from older to newer or newer to older, usually destroys the existing array.

    The RAID driver version usually must match the F/W version. The issue on the Linux platform is that the kernel driver is often "older" than the F/W version on new RAID cards, RAID card makers are usually not very 'diligent' in providing updated Linux drivers for updated F/W, and non-kernel Linux drivers are usually not as easy to install as drivers on Windows.

    The 3ware 9500/9550 series is ingenious in that regard! When you load a newer driver for the first time, it automatically updates the RAID card with the new matching firmware at the same time.

  20. #20
    Join Date
    Aug 2004
    Location
    Zurich, Switzerland
    Posts
    774
    I guess I need to clarify my statement. I was talking about Adaptec's SCSI, SAS and FC controllers, where this statement holds. I have no official statement on the SATA controllers.

  21. #21
    Join Date
    May 2004
    Location
    Atlanta, GA
    Posts
    3,872
    Perhaps true for the new generation of SAS/FC RAID cards.

    The Adaptec 2120S/2200S U320 RAID cards (the same Intel 8030x RAID processor as the 2x10SA SATA RAID cards) fare no better. Going from F/W v7244 to v7349/v8205 (or the reverse) will make the existing array disappear. The driver for 7244 won't work with F/W 7349/8205 either.

  22. #22
    Join Date
    Oct 2004
    Posts
    305
    I got the info from the DC that the server's using LSI U320-1, not the ZCR version. They also said even if the 7902 on-board was on, it still would not cause any problems.

    They'll try to replace the RAID card this week.

  23. #23
    Join Date
    May 2004
    Location
    Atlanta, GA
    Posts
    3,872
    Quote Originally Posted by blueface
    I got the info from the DC that the server's using LSI U320-1, not the ZCR version. They also said even if the 7902 on-board was on, it still would not cause any problems.

    They'll try to replace the RAID card this week.
    Make sure the new U320-1 replacement card has the same F/W version as the old card. Otherwise, you are in real danger of losing the array! Again, doing a backup now can never hurt.

    It's true that the U320-1 can co-exist with the on-board AIC-7902, but why would you want to do that when the 7902 is not in use?
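
The advice in this thread about swapping controllers (back up first, stay within one vendor, match firmware versions) can be written down as a small pre-swap checklist. This is only a sketch of the rules of thumb posted here, not vendor guidance; the function name and the version strings in the usage note are hypothetical.

```python
def swap_warnings(old_vendor, new_vendor, old_fw, new_fw, backup_done):
    """Return a list of warnings to resolve before swapping a RAID controller."""
    warnings = []
    if not backup_done:
        warnings.append("no current backup of the array")
    if old_vendor.lower() != new_vendor.lower():
        # Per the thread: switching vendors most likely needs a new array config.
        warnings.append("different vendor: expect to rebuild the array configuration")
    if old_fw != new_fw:
        warnings.append("firmware mismatch: %s vs %s" % (old_fw, new_fw))
    return warnings
```

For example, `swap_warnings("LSI", "LSI", "fw-A", "fw-B", backup_done=False)` returns two warnings (missing backup and firmware mismatch), and an empty list means none of the risks raised in this thread apply.
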
