Thread: RAID failures the most common?
-
01-02-2007, 06:25 PM #1Junior Guru Wannabe
- Join Date
- Dec 2006
- Location
- Missoula,MT
- Posts
- 46
RAID failures the most common?
Hello,
More of a poll-type question here.
I've managed hundreds of servers in my days. Most of them had SCSI RAID configurations, mainly RAID 1 or 5. The most common failure I saw, on all of the machines, wasn't bad RAM, disk drive failures, power supplies and the like. I don't have exact figures, but I would estimate something like 75% of the hardware problems I've seen have been related to RAID controllers (card failures).
First, does anyone else have this experience also, or is this just some kind of strange bit of luck I've had over the years with RAID systems?
Second, if others see this too, how does it affect the way you design HA/enterprise systems, especially when spec'ing such a solution out of dedicated servers?
Thanks
-
01-02-2007, 06:29 PM #2Eternal Member
- Join Date
- Dec 2004
- Location
- New York, NY
- Posts
- 10,710
Have you been using a specific RAID card primarily in your setups? If so, which?
-
01-02-2007, 06:31 PM #3Junior Guru Wannabe
- Join Date
- Aug 2006
- Posts
- 75
I've seen the opposite:
1.) Drives
2.) PSU's
3.) Memory
4.) Board Level (Mobo, proc, etc.)
5.) Raid Cards
That isn't to say RAID problems aren't #1 when I think of the most painful failures I've worked with. A bad card makes for 1000x the pain of a dead drive.
Caro.Net: Support is everything
Offering High Quality Dell Dedicated Servers since 1995
-
01-02-2007, 06:44 PM #4Junior Guru Wannabe
- Join Date
- Dec 2006
- Location
- Missoula,MT
- Posts
- 46
Hrmm, not really any specific configuration
Originally Posted by layer0
Some of them had Seagate drives with Adaptec controllers. I have used some others too, but I'm not sure any one manufacturer made up a significant majority. The controllers always worked when drive failures happened. The problem was always that when the card failed, so did everything else.
I'm not talking about hundreds of failures here. It's probably something like 3 or 4 catastrophic RAID controller failures in the last 7 or 8 years, while I've had only 1 drive flake out and probably 1 or 2 power supplies go bad, out of approx. 200 machines across roughly 10 different companies. I never saw a combination of hardware failures on the same machine; it was always one isolated failure.
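As a quick sanity check on the counts quoted above, here is a minimal sketch that works out what share of the listed failures were RAID controllers. It uses only the rough ranges given in the post; the function name and the way the ranges are combined are illustrative assumptions, not anything the poster calculated.

```python
# Approximate failure counts quoted in the post, as (low, high) ranges.
failures = {
    "RAID controller": (3, 4),
    "drive": (1, 1),
    "power supply": (1, 2),
}


def controller_share(controller, drive, psu):
    """Fraction of all listed failures that were RAID controller failures."""
    return controller / (controller + drive + psu)


c_lo, c_hi = failures["RAID controller"]
d_lo, d_hi = failures["drive"]
p_lo, p_hi = failures["power supply"]

# Lowest share: fewest controller failures, most of everything else.
low = controller_share(c_lo, d_hi, p_hi)
# Highest share: most controller failures, fewest of everything else.
high = controller_share(c_hi, d_lo, p_lo)

print(f"controller share of listed failures: {low:.0%} to {high:.0%}")
```

With these ranges the controller share comes out somewhat below the 75% estimate in the first post, and it rests on only about half a dozen events in total, so the sample is small.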
-
01-02-2007, 08:28 PM #5Disabled
- Join Date
- Jan 2006
- Location
- San Diego
- Posts
- 1,103
I have seen a few caused by the ribbon cable. Some new SATA II cables are failing too.
-
01-04-2007, 10:53 PM #6learning is in the doing
- Join Date
- Sep 2000
- Location
- Alberta, Canada
- Posts
- 3,146
Just went through this situation with one of our datacenters and a server using RAID 1. After a new hard drive was installed, the array would not rebuild. We even went so far as trying 5 - 6 different hard drives and 2 - 3 different RAID cards. Nothing worked.
I can only presume that in this case the motherboard was the problem. I never would have thought of the cable causing a problem.
Finally had to start from scratch with just a Primary and Backup hard drive to get the Server back online. Currently having a new Server built, again with RAID1, and everything will be pre-tested before transferring Clients over.
Anyone know if Seagate is a bad hard drive choice for RAID?
• PotentProducts.com - for all your Hosting needs
• Helping people Host, Create and Maintain their Web Site
• ServerAdmin Services also available
-
01-04-2007, 11:32 PM #7Web Hosting Master
- Join Date
- Oct 2003
- Location
- Chicago, IL
- Posts
- 657
In response to the OP, I've seen the opposite. RAID card failures for me have been extremely rare compared with hard disk failures.
█ Zac Cogswell / CEI
█ Formerly known as WiredTree Zac
-
01-05-2007, 01:01 AM #8Web Hosting Master
- Join Date
- Feb 2002
- Posts
- 848
The only thing I've noticed is that with both 3ware and Adaptec RAID cards you'll tend to go through more drives, because the cards detect subtle timing errors and drop drives long before you would notice any error in a stand-alone setup.
-
01-05-2007, 02:28 AM #9Retired Moderator
- Join Date
- Mar 2004
- Location
- Singapore
- Posts
- 6,990
For me, RAID card problems are very rare. I normally stick to 3ware cards. The most problematic components are drives, followed by PSUs.
-
01-05-2007, 02:51 AM #10Aspiring Evangelist
- Join Date
- Feb 2005
- Location
- Montreal, PQ
- Posts
- 355
After using both software and hardware RAID (mainly software) over the years, I've never had the RAID chipset/card fail, though I've had at least 6-8 drives die. Most of them were Maxtor/WD. At the moment all my servers are using Seagate drives, and not one of them has died since. But I did have 4 Seagate drives dead on arrival (USPS is crap!!!).
Servers proudly hosted at... WebNX (10) - Netelligent (6)
Also enjoying services from DreamHost, Hyperspin
(#) indicates number of servers - 16 total
-
01-05-2007, 01:39 PM #11Web Hosting Master
- Join Date
- Apr 2005
- Location
- San Francisco, CA
- Posts
- 1,031
The most chaos I've had was a RAID 5 array (on a Dell PowerEdge) with 4 Seagate Cheetahs. Two HDDs went bad: one died and another was full of bad blocks. The whole system became a mess, and I stopped using RAID 5 setups after that because of the way it uses storage (if one drive fails, the whole array is useless).
RAID 1 and RAID 10 setups have been doing just fine so far.
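For context on the RAID 5 story above: by the textbook definition, RAID 5 survives one drive failure but not a second problem (such as unreadable blocks on another drive) before the rebuild finishes, which matches the two-bad-drive situation described. The sketch below compares usable capacity and guaranteed fault tolerance for the levels mentioned in this thread. It models the standard definitions only; the function name is made up for illustration, and real controllers, hot spares and rebuild behaviour are not modelled.

```python
def raid_summary(level, drives):
    """Return (usable_drives, guaranteed_failures_survived).

    Textbook RAID definitions only. "Guaranteed" means the array
    survives that many failures no matter which drives fail.
    """
    if level == 1:
        if drives < 2:
            raise ValueError("RAID 1 needs at least 2 drives")
        # n-way mirror: one drive of capacity, survives n - 1 failures.
        return 1, drives - 1
    if level == 5:
        if drives < 3:
            raise ValueError("RAID 5 needs at least 3 drives")
        # One drive's worth of parity spread across the array.
        return drives - 1, 1
    if level == 10:
        if drives < 4 or drives % 2:
            raise ValueError("RAID 10 needs an even number of drives, at least 4")
        # Striped 2-way mirrors: always survives 1 failure, and can survive
        # up to drives // 2 if every failure lands in a different mirror pair.
        return drives // 2, 1
    raise ValueError(f"unsupported RAID level: {level}")


for level in (5, 10):
    usable, survived = raid_summary(level, 4)
    print(f"RAID {level}, 4 drives: {usable} usable, survives {survived} failure(s) guaranteed")
```

With 4 drives, RAID 5 gives more usable space than RAID 10 for the same guaranteed tolerance, while RAID 10 can ride out a second failure if it hits the other mirror pair.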
-
01-05-2007, 01:59 PM #12Web Hosting Guru
- Join Date
- Nov 2006
- Location
- Amsterdam, NL
- Posts
- 250
From experience, hard drives seem to fail more than anything else. Raid 10 saves the day though.
••••• John Strong - SolidHost COO •••••
»» http://www.SolidHost.com ««
SolidHost offers fully managed Linux and Windows VPS's in the Netherlands , with Plesk, cPanel, Directadmin and Helm.
-
01-05-2007, 04:00 PM #13THE Web Hosting Master
- Join Date
- Jan 2003
- Location
- Chicago, IL
- Posts
- 6,957
We have dozens of systems with RAID cards and I can't think of a single RAID card failure. We use Adaptec and 3ware cards. We had one that arrived bad, but as it was bad from the get-go it didn't cause any real issues.
You're saying that over the past 7-8 years, with 400+ drives, you've only had one drive failure? Wow. That is likely what is skewing your perspective. We generally see about a 1% annual drive failure rate, using mostly Seagate drives. Western Digital drives were closer to 2%, and SCSI drives were about 2.5-3%. (Yes, we've had worse failure rates with SCSI drives than with SATA drives.)
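To put those rates next to the "one failure in 7-8 years" figure, here is a rough back-of-the-envelope sketch. It assumes a constant annual failure rate, a fixed fleet of 400 drives with failed drives replaced, and a simple linear estimate. Those are simplifying assumptions for illustration, not how the numbers above were measured.

```python
def expected_failures(drives, annual_failure_rate, years):
    """Expected number of drive failures for a fleet kept at a fixed size.

    Assumes a constant annual failure rate and that failed drives
    are replaced, so the fleet size stays the same.
    """
    return drives * annual_failure_rate * years


DRIVES = 400  # "400+ drives" from the post above

# Annual failure rates quoted in the post above.
rates = {
    "Seagate (~1%)": 0.01,
    "Western Digital (~2%)": 0.02,
    "SCSI, low end (2.5%)": 0.025,
    "SCSI, high end (3%)": 0.03,
}

for label, rate in rates.items():
    low = expected_failures(DRIVES, rate, 7)
    high = expected_failures(DRIVES, rate, 8)
    print(f"{label}: roughly {low:.0f} to {high:.0f} failures over 7-8 years")
```

Even at the lowest quoted rate this works out to a few dozen expected failures, which is why a single failure across that many drive-years looks unusually low.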
We might be pickier than some about calling something a drive "failure" though...
Karl Zimmerman - Founder & CEO of Steadfast
VMware Virtual Data Center Platform
karl @ steadfast.net - Sales/Support: 312-602-2689
Cloud Hosting, Managed Dedicated Servers, Chicago Colocation, and New Jersey Colocation
-
01-05-2007, 04:56 PM #14WHT Addict
- Join Date
- Mar 2003
- Location
- California
- Posts
- 142
The order of failure for us, with the most likely to fail first, is:
- Drives
- RAM
- Power Supplies
- Motherboard/RAID controller
We use only 3Ware RAID controllers and have had only one bad one. We have also seen our rate of HD failures fall since the switch from Maxtor to Western Digital RE Edition drives.
Raphael Karunditu
Ripple Web
RippleWeb.com
Dedicated Servers - Private Cloud Services - Colocation
-
01-05-2007, 05:10 PM #15Web Hosting Master
- Join Date
- Apr 2005
- Location
- San Francisco, CA
- Posts
- 1,031
Hey Karl,
Do you buy your SCSI drives at newegg.com or directly from some other suppliers in Taiwan? We have many SCSI drives from HP/Dell with a very low failure rate.
Check your supplier.
Steve
Originally Posted by KarlZimmer
-
01-05-2007, 05:43 PM #16Junior Guru Wannabe
- Join Date
- Jun 2006
- Posts
- 67
Wow, I've never had SCSI drives fail on me before, though I'm not a big user of them either. The most common failures would be drives, then RAM, then power supplies, and then mobo/RAID cards.
ServerTweak Networks, LLC >> ServerTweak.com
Experience the fastest network and superior servers, feel the power of ServerTweak!
Fremont, CA DataCenter | Dedicated Servers | Colocation | Cross Connects HE.net | 1/4 - Full Cab Sales
-
01-05-2007, 07:56 PM #17Web Hosting Evangelist
- Join Date
- Feb 2002
- Location
- International
- Posts
- 490
Our main failures have been HDDs, plus a few PSUs and motherboards. Funnily enough, we've had more failures with SCSI drives than with IDE and SATA drives. The SCSI drives give no warning signs and just pack in (screeching, literally). At least we see some smartcheck warnings with IDEs.
Matthew - Burton Hosting
low cost shared, reseller, VPS & dedicated solutions for over five years - we've got what you need.
http://www.burtonhosting.com
http://www.getmesupport.com - server monitoring service for all!