  1. #1
    Join Date
    Dec 2006
    Location
    Missoula, MT
    Posts
    46

    RAID failures the most common?

    Hello,

    More of a poll-type question here.

    I've managed hundreds of servers in my day. Most of them had SCSI RAID configurations, mainly RAID 1 or 5. Across all of those machines, the most common failure I saw wasn't bad RAM, disk drives, power supplies and the like. I don't have exact figures, but I would estimate something like 75% of the hardware problems I've seen have been related to RAID controllers (card failures).

    First, does anyone else have this experience also, or is this just some kind of strange bit of luck I've had over the years with RAID systems?

    Second, if this is also common among others, how does that affect your server build concepts for HA/enterprise systems, especially when speccing such a solution out of dedicated servers?

    Thanks

  2. #2
    Join Date
    Dec 2004
    Location
    New York, NY
    Posts
    10,710
    Have you been using a specific RAID card primarily in your setups? If so, which?

  3. #3
    Join Date
    Aug 2006
    Posts
    75
    I've seen the opposite:

    1.) Drives
    2.) PSUs
    3.) Memory
    4.) Board level (mobo, proc, etc.)
    5.) RAID cards

    That isn't to say RAID problems aren't #1 when I think of the most painful failures I've worked with. A bad card makes for 1000x the pain of a dead drive.
    Caro.Net: Support is everything
    Offering High Quality Dell Dedicated Servers since 1995

  4. #4
    Join Date
    Dec 2006
    Location
    Missoula, MT
    Posts
    46

    Hrmm, not really any specific configuration

    Quote Originally Posted by layer0
    Have you been using a specific RAID card primarily in your setups? If so, which?
    I'm taking more of a general approach and saying, in general, when dealing with RAID systems, the controllers have almost always been the piece to go first.

    Some of them had Seagate drives with Adaptec controllers; I've used some others too, but I'm not sure any one manufacturer made up a significant majority across them all. The controllers always worked when drive failures happened. The problem was always that when the card failed, so did everything else.

    I'm not talking about hundreds of failures here: probably something like 3 or 4 catastrophic RAID controller failures in the last 7 or 8 years, while I've had only 1 drive flake out and probably 1 or 2 power supplies go bad, out of a total of approximately 200 machines for roughly 10 different companies. I never saw any combination of hardware failures on the same machine; it was always isolated to one specific failure.

  5. #5
    Join Date
    Jan 2006
    Location
    San Diego
    Posts
    1,103
    I have seen a few failures caused by the ribbon cable, and some new SATA II cables are failing too.

  6. #6
    Join Date
    Sep 2000
    Location
    Alberta, Canada
    Posts
    3,146
    Just went through this situation with one of our datacenters and a server using RAID 1. After the new hard drive was installed it would not rebuild. We even went so far as trying 5-6 different hard drives and 2-3 different RAID cards. Nothing worked.

    I can only presume that in this case it was the motherboard that was the problem. I never would have thought of the cable causing a problem.

    Finally had to start from scratch with just a primary and backup hard drive to get the server back online. Currently having a new server built, again with RAID 1, and everything will be pre-tested before transferring clients over.

    Anyone know if Seagate is a bad hard drive choice for RAID?
    PotentProducts.com - for all your Hosting needs
    Helping people Host, Create and Maintain their Web Site
    ServerAdmin Services also available

  7. #7
    Join Date
    Oct 2003
    Location
    Chicago, IL
    Posts
    657
    In response to the OP, I've seen the opposite. RAID card failures for me have been extremely rare compared to hard disk failures.
    Zac Cogswell / CEI
    Formerly known as WiredTree Zac

  8. #8
    Join Date
    Feb 2002
    Posts
    848
    The only thing I've noticed is that using both 3ware and Adaptec RAID cards, you'll tend to go through more drives, as the RAID cards detect more subtle timing errors and drop drives long before you would normally notice any error in a stand-alone situation.
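
    That dropping behavior usually comes down to how long a drive spends in error recovery versus the controller's patience. Here's a minimal sketch (my own illustration, not anyone's setup above) of checking a drive's SCT Error Recovery Control timeouts with smartctl, assuming a smartmontools build that supports the scterc log; the device path is just an example:

    ```python
    # Hedged sketch: read a drive's SCT Error Recovery Control (a.k.a. TLER)
    # timeouts via smartctl. A short timeout means the drive gives up on a bad
    # sector quickly instead of stalling long enough for a RAID card to drop it.
    # Assumes smartmontools is installed with "-l scterc" support; /dev/sda is
    # only an example device.
    import subprocess

    def read_erc(device: str) -> str:
        result = subprocess.run(
            ["smartctl", "-l", "scterc", device],
            capture_output=True, text=True, check=False,
        )
        return result.stdout

    if __name__ == "__main__":
        print(read_erc("/dev/sda"))
    ```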

  9. #9
    Join Date
    Mar 2004
    Location
    Singapore
    Posts
    6,990
    For me, RAID card problems are very rare; I normally stick to 3ware cards. Most problematic are drives, followed by PSUs.

  10. #10
    Join Date
    Feb 2005
    Location
    Montreal, PQ
    Posts
    355
    After using both software and hardware RAID (mainly software) over the years, I've never had the RAID chipset/card fail, though I've had at least 6-8 drives die. Most of them were Maxtor/WD. At the moment all my servers are using Seagate drives, and not one of them has died since. But I did have 4 Seagate drives dead on arrival (USPS is crap!!!).
    Servers proudly hosted at... WebNX (10) - Netelligent (6)
    Also enjoying services from DreamHost, Hyperspin
    (#) indicates number of servers - 16 total

  11. #11
    Join Date
    Apr 2005
    Location
    San Francisco, CA
    Posts
    1,031
    The most chaos I've had was with RAID 5 (on a Dell PowerEdge) with 4 Seagate Cheetahs: 2 HDDs went bad, 1 died and another was full of bad blocks, and the whole system became a mess. I had to stop using RAID 5 setups after that because of the way it uses storage (once a second drive fails, the whole array is useless).

    RAID 1 and RAID 10 setups are doing just fine so far.
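
    To make that comparison concrete, here's a rough sketch (my own illustration, assuming a 4-drive array with mirrored pairs 0+1 and 2+3 for RAID 10) of which two-drive failures each layout survives:

    ```python
    # Rough illustration of two-drive failure tolerance on a 4-drive array.
    # RAID 5 tolerates exactly one failed drive, so any second failure loses it;
    # RAID 10 survives unless both halves of the same mirror pair fail.
    from itertools import combinations

    DRIVES = [0, 1, 2, 3]
    MIRRORS = [{0, 1}, {2, 3}]  # assumed RAID 10 pairing

    def raid5_survives(failed: set) -> bool:
        return len(failed) <= 1

    def raid10_survives(failed: set) -> bool:
        return not any(pair <= failed for pair in MIRRORS)

    for combo in combinations(DRIVES, 2):
        failed = set(combo)
        print(f"drives {combo} fail -> "
              f"RAID 5: {'ok' if raid5_survives(failed) else 'lost'}, "
              f"RAID 10: {'ok' if raid10_survives(failed) else 'lost'}")
    ```

    Only 2 of the 6 two-drive combinations kill the RAID 10 layout, while any second failure kills the RAID 5, which matches the experience above.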

  12. #12
    Join Date
    Nov 2006
    Location
    Amsterdam, NL
    Posts
    250
    From experience, hard drives seem to fail more than anything else. RAID 10 saves the day though.
    ••••• John Strong - SolidHost COO •••••
    »» http://www.SolidHost.com ««
    SolidHost offers fully managed Linux and Windows VPSs in the Netherlands, with Plesk, cPanel, Directadmin and Helm.

  13. #13
    Join Date
    Jan 2003
    Location
    Chicago, IL
    Posts
    6,957
    We have dozens of systems with RAID cards and I can't think of a single RAID card failure; we use Adaptec and 3ware cards. We had one that arrived bad, but as it was bad from the get-go it didn't cause any real issues.

    You're saying that over the past 7-8 years, with 400+ drives, you've only had one drive failure? Wow. That is likely what is pulling it out of perspective; we generally see about a 1% drive failure rate on an annual basis, using mostly Seagate drives. Western Digital drives were closer to 2%, and SCSI drives were about 2.5-3% (yes, we've had worse failure rates with SCSI drives than with SATA drives).

    We might be pickier than some about calling something a drive "failure" though...
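
    As a back-of-the-envelope check (my own arithmetic, treating the ~1% annual rate and the ~400-drive, 7-8 year fleet above as rough assumptions):

    ```python
    # Back-of-the-envelope expected failures at a ~1% annual failure rate,
    # applied to roughly 400 drives over about 7.5 years. Figures are rough
    # assumptions taken from the posts above, not measured data.
    drives = 400
    annual_failure_rate = 0.01
    years = 7.5

    expected_failures = drives * annual_failure_rate * years
    print(f"Expected drive failures: ~{expected_failures:.0f}")  # ~30, versus the 1 reported
    ```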
    Karl Zimmerman - Founder & CEO of Steadfast
    VMware Virtual Data Center Platform

    karl @ steadfast.net - Sales/Support: 312-602-2689
    Cloud Hosting, Managed Dedicated Servers, Chicago Colocation, and New Jersey Colocation

  14. #14
    Join Date
    Mar 2003
    Location
    California
    Posts
    142
    The order of failure for us, with the most likely to fail first, is:

    - Drives
    - RAM
    - Power Supplies
    - Motherboard/RAID controller

    We use only 3Ware RAID controllers and we have only had one bad one. We have also seen our rate of HD failures fall with the switch from Maxtor to Western Digital RE Edition drives.

    Raphael Karunditu
    Ripple Web
    RippleWeb.com
    Dedicated Servers - Private Cloud Services - Colocation

  15. #15
    Join Date
    Apr 2005
    Location
    San Francisco, CA
    Posts
    1,031
    Hey Karl,

    Do you buy your SCSI drives at newegg.com or from some other direct suppliers from Taiwan? We have many SCSI drives from HP/Dell with a very low failure rate.

    Check your supplier.

    Steve

    Quote Originally Posted by KarlZimmer
    We have dozens of systems with RAID cards and I can't think of a single RAID card failure; we use Adaptec and 3ware cards. We had one that arrived bad, but as it was bad from the get-go it didn't cause any real issues.

    You're saying that over the past 7-8 years, with 400+ drives, you've only had one drive failure? Wow. That is likely what is pulling it out of perspective; we generally see about a 1% drive failure rate on an annual basis, using mostly Seagate drives. Western Digital drives were closer to 2%, and SCSI drives were about 2.5-3% (yes, we've had worse failure rates with SCSI drives than with SATA drives).

    We might be pickier than some about calling something a drive "failure" though...

  16. #16
    Wow, I've never had SCSI drives fail on me before; however, I'm not a big user of them either. The most failure-prone for me would be drives, then RAM, then power supplies, and then mobos/RAID cards.
    ServerTweak Networks, LLC >> ServerTweak.com
    Experience the fastest network and superior servers, feel the power of ServerTweak!
    Fremont, CA DataCenter | Dedicated Servers | Colocation | Cross Connects HE.net | 1/4 - Full Cab Sales

  17. #17
    Join Date
    Feb 2002
    Location
    International
    Posts
    490
    Main failures have been with HDDs, plus a few PSUs and motherboards. Funnily enough, more failures with SCSI drives than IDE and SATA drives. The SCSI drives give no warning signs and just pack in (screeching, literally). At least we see some SMART warnings with IDEs.
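
    For what it's worth, here's a minimal sketch of that kind of SMART health check, assuming smartmontools is installed; the device paths are only examples, and disks behind a hardware RAID card usually need extra smartctl options that this doesn't cover:

    ```python
    # Minimal sketch: poll overall SMART health with smartctl (smartmontools).
    # Device paths are examples; disks behind a hardware RAID controller often
    # need a "-d" device-type option to be reachable, which is omitted here.
    import subprocess

    DEVICES = ["/dev/sda", "/dev/sdb"]  # example device list

    for dev in DEVICES:
        result = subprocess.run(
            ["smartctl", "-H", dev],
            capture_output=True, text=True, check=False,
        )
        status = "PASSED" if "PASSED" in result.stdout else "check output"
        print(f"{dev}: {status}")
    ```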
    Matthew - Burton Hosting
    low cost shared, reseller, VPS & dedicated solutions for over five years - we've got what you need.
    http://www.burtonhosting.com
    http://www.getmesupport.com - server monitoring service for all!
