  1. #1

    RAID1 - Failed drive?

    Hi,

    I'm just starting to learn this Linux voodoo stuff and I found out that my software RAID1 setup has a failed drive (I think). I would appreciate it if someone could confirm this and give me some advice.

    This is what I get when I run the "cat /proc/mdstat" command:
    Code:
    Personalities : [raid1]
    md0 : active raid1 sdb1[1] sda1[2](F)
          104320 blocks [2/1] [_U]
    
    md1 : active raid1 sdb2[1] sda2[2](F)
          4192896 blocks [2/1] [_U]
    
    md2 : active raid1 sdb3[1] sda3[2](F)
          484086528 blocks [2/1] [_U]
    
    unused devices: <none>
    I googled around and found that the (F) marks a failed device, and that [_U] should read [UU] when both halves of the mirror are up. Are there any commands/checks I can run to make sure that's the case?

    Also found this in logs dated a few days ago:
    Code:
    kernel: md: syncing RAID array md0
    kernel: md: minimum _guaranteed_ reconstruction speed: 1000 KB/sec/disc.
    kernel: md: using maximum available idle IO bandwidth (but not more than 200000 KB/sec) for reconstruction.
    kernel: md: using 128k window, over a total of 104320 blocks.
    kernel: md: md0: sync done.
    kernel: md: syncing RAID array md1
    kernel: md: minimum _guaranteed_ reconstruction speed: 1000 KB/sec/disc.
    kernel: md: using maximum available idle IO bandwidth (but not more than 200000 KB/sec) for reconstruction.
    kernel: md: using 128k window, over a total of 4192896 blocks.
    kernel: md: md1: sync done.
    kernel: RAID1 conf printout:
    kernel:  --- wd:1 rd:2
    kernel:  disk 1, wo:0, o:1, dev:sdb1
    kernel: RAID1 conf printout:
    kernel:  --- wd:1 rd:2
    kernel:  disk 1, wo:0, o:1, dev:sdb2
    So what do you think? Is a drive dead? Shouldn't I get an email when there's a problem (I run CentOS with Plesk panel)? I've received emails regarding other matters but nothing about a drive failing.

    Any advice appreciated. Thanks.

  2. #2
    Join Date
    Jul 2009
    Location
    Indiana
    Posts
    2,193
    It's failed for sure. Try the smartctl command if you want to double check.
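For a quick check of the array side, here is a minimal sketch that lists arrays whose status field contains an underscore (a missing member). The function name `degraded_arrays` is made up for illustration, and it assumes the usual /proc/mdstat layout shown in the first post.

```shell
# List md arrays that have a missing member ("_" in the [UU]-style status).
# Reads mdstat-format text on stdin; degraded_arrays is a hypothetical helper name.
degraded_arrays() {
    awk '
        # Remember the array name from lines like "md0 : active raid1 ..."
        $1 ~ /^md[0-9]+$/ && $2 == ":" { name = $1 }
        # The status line looks like "104320 blocks [2/1] [_U]"
        /blocks/ && /\[[U_]*_[U_]*\]/  { print name }
    '
}
```

Run it as `degraded_arrays < /proc/mdstat`. For more detail on a single array, `mdadm --detail /dev/md0` shows the state of each member, and `smartctl -H /dev/sda` asks the disk itself for its health verdict.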

    If you're not getting an email, check your config files. I'm not 100% sure of the names off the top of my head, but try /etc/smartd.conf and /etc/mdadm.conf
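mdadm only mails alerts when an address is set with a MAILADDR line in mdadm.conf. A rough sketch for checking that follows; `mdadm_mailaddr` is a hypothetical helper name, and it assumes the keyword is written out in full rather than abbreviated.

```shell
# Print the alert address from an mdadm.conf-style file; exit non-zero if none is set.
# mdadm_mailaddr is a hypothetical helper; $1 is the path to the config file.
mdadm_mailaddr() {
    awk '
        toupper($1) == "MAILADDR" { print $2; found = 1 }
        END { exit !found }
    ' "$1"
}
```

Typical use would be `mdadm_mailaddr /etc/mdadm.conf || echo "no MAILADDR set"`. Once an address is in place, `mdadm --monitor --scan --oneshot --test` should send one test alert per array, which confirms that mail delivery actually works. On CentOS the monitoring is normally done by the mdmonitor service, as far as I know, so that needs to be running too.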
    Sam Barrow - CEO @ SQUIDIX (1-855-SQUIDIX)
    Ask Us About Sponsoring Your Web Site (High Traffic Sites Only)
    Squidix - Shared, Reseller, Semi-Dedicated, Managed VPS and Managed Dedicated Hosting
    Midwestern Web - Web Design & Development Services

  3. #3
    I ran

    /usr/sbin/smartctl --all /dev/sdb

    and I got a lot of information about the drive and as far as I can tell everything is ok.

    But for sda I got: "A mandatory SMART command failed: exiting. ..."

    So I guess it's dead... What should I do? Should I panic? Cause I already did that. I have no idea what to do next.

  4. #4
    Join Date
    Apr 2005
    Location
    Raleigh, NC
    Posts
    816
    I'm not much help in resolving your issue but I would recommend a hardware RAID solution.

  5. #5
    Join Date
    Aug 2006
    Location
    Ashburn VA, San Diego CA
    Posts
    4,571
    /dev/sda (the first of the two disks) has failed, based on the smartctl output. It'll have to be replaced, then you'll need to rebuild the array.
    Fast Serv Networks, LLC | AS29889 | Fully Managed Cloud, Streaming, Dedicated Servers, Colo by-the-U
    Since 2003 - Ashburn VA + San Diego CA Datacenters

  6. #6
    Join Date
    Aug 2009
    Location
    Montreal
    Posts
    1,606
    Quote Originally Posted by Motiv View Post
    I'm not much help in resolving your issue but I would recommend a hardware RAID solution.
    Software raid 1 is perfectly fine.

    To the OP, I would suggest replacing the failed hdd as soon as possible. If you would like to do it yourself then this guide should help:
    http://www.howtoforge.com/replacing_..._a_raid1_array
    CrocWeb :: Canadian Web Hosting
    Accelerate your website, maximum performance!
    www.crocweb.com :: Since 2009 (Montreal, Quebec)

  7. #7
    Join Date
    Jul 2009
    Location
    The backplane
    Posts
    1,790
    Quote Originally Posted by Motiv View Post
    I'm not much help in resolving your issue but I would recommend a hardware RAID solution.
    Complete waste of money in this scenario.

  8. #8
    Join Date
    Mar 2010
    Location
    JNB, ZA
    Posts
    92
    Quote Originally Posted by Motiv View Post
    I'm not much help in resolving your issue but I would recommend a hardware RAID solution.
    +1 - never did have much success with SW RAID on production servers.

  9. #9
    Join Date
    Jul 2009
    Location
    Indiana
    Posts
    2,193
    For simple RAID 1, software RAID is a better solution if you ask me. Cheaper and more flexible, and you don't have to worry about losing your data if your card breaks and you can't find an identical one.

    So I guess it's dead... What should I do? Should I panic? Cause I already did that. I have no idea what to do next.
    Panic for a couple more hours then remove the disk.

    Code:
    # sda1-sda3 already show (F) in mdstat; a member not yet marked failed needs --fail first
    mdadm --manage /dev/md0 --remove /dev/sda1
    mdadm --manage /dev/md1 --remove /dev/sda2
    mdadm --manage /dev/md2 --remove /dev/sda3
    Put in a new drive (which will now be sda) and copy the partition layout from the good disk (sdb) onto it:

    Code:
    sfdisk -d /dev/sdb > layout.txt
    sfdisk /dev/sda < layout.txt
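Before re-adding anything it may be worth confirming that the copy took. A small sketch, assuming `sfdisk -d` prints one line per partition starting with the device path; `partition_lines` is a made-up helper that strips the disk name so the two dumps can be compared directly.

```shell
# Reduce an "sfdisk -d" dump (on stdin) to its partition lines with the disk name removed.
# partition_lines is a hypothetical helper; $1 is the disk name without /dev/, e.g. "sdb".
partition_lines() {
    grep "^/dev/$1" | sed "s#^/dev/$1#p#"
}
```

If `sfdisk -d /dev/sdb | partition_lines sdb` and `sfdisk -d /dev/sda | partition_lines sda` produce identical output, the start, size and type of every partition match.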
    Re-add the partitions

    Code:
    mdadm --manage /dev/md0 --add /dev/sda1
    mdadm --manage /dev/md1 --add /dev/sda2
    mdadm --manage /dev/md2 --add /dev/sda3
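After the --add commands the kernel starts resyncing, and /proc/mdstat shows a progress line for the array being rebuilt. As a sketch, this pulls out just the percentage; `rebuild_progress` is a hypothetical name and the parsing assumes the usual "recovery = 12.3%" wording.

```shell
# Print "<array> <percent>" for every array that is currently rebuilding.
# Reads mdstat-format text on stdin; rebuild_progress is a hypothetical helper name.
rebuild_progress() {
    awk '
        $1 ~ /^md[0-9]+$/ && $2 == ":" { name = $1 }
        /(recovery|resync) *=/ {
            for (i = 1; i <= NF; i++)
                if ($i ~ /%$/) {
                    pct = $i
                    sub(/^=/, "", pct)   # "=100.0%" has no space after the equals sign
                    print name, pct
                }
        }
    '
}
```

Use it as `rebuild_progress < /proc/mdstat`, or simply run `watch cat /proc/mdstat`. One more thing to keep in mind: the boot loader sits outside the md arrays, so the new disk will not have one until it is installed there (on GRUB legacy, something like `grub-install /dev/sda`). Without that, the box may fail to boot from the replacement drive.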

  10. #10
    Thanks for your help, guys. It didn't go as smoothly as I hoped, but at least everything is ok now. I backed up my data, removed the failed drive from the array and contacted support to make the swap. Support told me the server didn't boot with the new drive (grub error), and it didn't boot without the failed one either. So they asked for my details so they could have a look, and they managed to fix it.

    As for hardware vs software RAID: my choice was made with budget in mind. It did what it's supposed to do, what more can I ask? Of course, it would have been better without the downtime, but in my case the cost of 1 hour of downtime is far less than the cost of hardware RAID.

    Again, I appreciate your help.
    Last edited by coscip; 03-18-2011 at 10:41 PM.

  11. #11
    Join Date
    Mar 2003
    Location
    WebHostingTalk
    Posts
    16,963
    Moved > Hosting Security and Technology .
    Specially 4 You
    .
    JoneSolutions.Com ( Jones.Solutions ) is on the net 24/7 providing stable and reliable web hosting solutions and services since 2001
