  1. #1

    RAID1 - Failed drive?

    Hi,

    I'm just starting to learn this linux voodoo stuff and I found out that my software RAID1 setup has a failed drive (I think). I would appreciate it if someone could confirm this and give me some advice.

    This is what I get when I run the "cat /proc/mdstat" command:
    Code:
    Personalities : [raid1]
    md0 : active raid1 sdb1[1] sda1[2](F)
          104320 blocks [2/1] [_U]
    
    md1 : active raid1 sdb2[1] sda2[2](F)
          4192896 blocks [2/1] [_U]
    
    md2 : active raid1 sdb3[1] sda3[2](F)
          484086528 blocks [2/1] [_U]
    
    unused devices: <none>
    I googled around and found that the (F) marks a failed member and that [_U] should read [UU] on a healthy array. Are there any commands or checks I can run to confirm that this is the case?

    Also found this in logs dated a few days ago:
    Code:
    kernel: md: syncing RAID array md0
    kernel: md: minimum _guaranteed_ reconstruction speed: 1000 KB/sec/disc.
    kernel: md: using maximum available idle IO bandwidth (but not more than 200000 KB/sec) for reconstruction.
    kernel: md: using 128k window, over a total of 104320 blocks.
    kernel: md: md0: sync done.
    kernel: md: syncing RAID array md1
    kernel: md: minimum _guaranteed_ reconstruction speed: 1000 KB/sec/disc.
    kernel: md: using maximum available idle IO bandwidth (but not more than 200000 KB/sec) for reconstruction.
    kernel: md: using 128k window, over a total of 4192896 blocks.
    kernel: md: md1: sync done.
    kernel: RAID1 conf printout:
    kernel:  --- wd:1 rd:2
    kernel:  disk 1, wo:0, o:1, dev:sdb1
    kernel: RAID1 conf printout:
    kernel:  --- wd:1 rd:2
    kernel:  disk 1, wo:0, o:1, dev:sdb2
    So what do you think? Is a drive dead? Shouldn't I get an email when there's a problem (I run CentOS with Plesk panel)? I've received emails regarding other matters but nothing about a drive failing.

    Any advice appreciated. Thanks.

  2. #2
    It's failed for sure. Try the smartctl command if you want to double check.
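
    To double-check from the array side as well, here is a minimal sketch that lists the arrays whose status field shows a missing member. The function name degraded_arrays is made up for illustration, and it assumes the usual mdstat layout shown in the first post (array name line, then a "blocks" line ending in a field like [_U]).

```shell
# List md arrays whose status field (e.g. [_U]) shows a missing member.
# Reads mdstat-formatted text from the file given as $1 (default: /proc/mdstat).
# degraded_arrays is a hypothetical helper name, not part of mdadm.
degraded_arrays() {
    awk '
        /^md[0-9]+ :/         { name = $1 }     # remember the current array name
        /blocks/ && $NF ~ /_/ { print name }    # "_" in the last field = missing member
    ' "${1:-/proc/mdstat}"
}
```

    Called with no argument it reads /proc/mdstat. For a per-member view of a single array, "mdadm --detail /dev/md0" shows the state of each device.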

    If you're not getting an email, check your config files. I'm not 100% sure of the names off the top of my head, but try /etc/smartd.conf and /etc/mdadm.conf.
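
    On the mail question: mdadm's monitor mode sends alerts to the address on the MAILADDR line of its config file, so a missing line would explain the silence. A small sketch to check for it follows. The function name mdadm_mailaddr is hypothetical, and it assumes the keyword is written out in full rather than abbreviated.

```shell
# Print the alert address from an mdadm.conf-style file.
# Returns non-zero if no MAILADDR line is present.
# mdadm_mailaddr is a hypothetical helper name, not part of mdadm.
mdadm_mailaddr() {
    awk '
        toupper($1) == "MAILADDR" { print $2; found = 1 }   # keyword is case-insensitive
        END { exit !found }                                 # fail when nothing matched
    ' "${1:-/etc/mdadm.conf}"
}
```

    If an address is set, "mdadm --monitor --scan --oneshot --test" should send a test alert for each array it finds, which is a quick way to see whether mail actually leaves the box.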

  3. #3
    I ran

    /usr/sbin/smartctl --all /dev/sdb

    and I got a lot of information about the drive; as far as I can tell, everything is OK.

    But for sda I got: "A mandatory SMART command failed: exiting. ..."

    So I guess it's dead... What should I do? Should I panic? Cause I already did that. I have no idea what to do next.

  4. #4
    I'm not much help in resolving your issue but I would recommend a hardware RAID solution.

  5. #5
    /dev/sda (the first of the two disks) has failed, based on the smartctl output. It will have to be replaced, and then you'll need to rebuild the array.

  6. #6
    Originally posted by Motiv:
    I'm not much help in resolving your issue but I would recommend a hardware RAID solution.
    Software raid 1 is perfectly fine.

    To the OP, I would suggest replacing the failed hdd as soon as possible. If you would like to do it yourself then this guide should help:
    http://www.howtoforge.com/replacing_..._a_raid1_array

  7. #7
    Originally posted by Motiv:
    I'm not much help in resolving your issue but I would recommend a hardware RAID solution.
    Complete waste of money in this scenario.

  8. #8
    Originally posted by Motiv:
    I'm not much help in resolving your issue but I would recommend a hardware RAID solution.
    +1 - never did have much success with SW RAID on production servers.

  9. #9
    For simple RAID 1, software RAID is a better solution if you ask me. Cheaper and more flexible, and you don't have to worry about losing your data if your card breaks and you can't find an identical one.

    Quote:
    "So I guess it's dead... What should I do? Should I panic? Cause I already did that. I have no idea what to do next."

    Panic for a couple more hours, then remove the disk:

    Code:
    mdadm --manage /dev/md0 --remove /dev/sda1
    mdadm --manage /dev/md1 --remove /dev/sda2
    mdadm --manage /dev/md2 --remove /dev/sda3
    Put in a new drive (which will now be sda) and copy the partition layout over from sdb:

    Code:
    sfdisk -d /dev/sdb > layout.txt
    sfdisk /dev/sda < layout.txt
    Then add the new partitions to the arrays:

    Code:
    mdadm --manage /dev/md0 --add /dev/sda1
    mdadm --manage /dev/md1 --add /dev/sda2
    mdadm --manage /dev/md2 --add /dev/sda3
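
    Once the partitions are added, the kernel resyncs each array in the background. A minimal sketch for watching progress is below. The function name rebuild_progress is made up for illustration, and it assumes the usual mdstat progress line of the form "recovery = 12.6% (...)" under the array being rebuilt.

```shell
# Print "<array> <percent>" for every array that is currently rebuilding.
# Reads mdstat-formatted text from the file given as $1 (default: /proc/mdstat).
# rebuild_progress is a hypothetical helper name, not part of mdadm.
rebuild_progress() {
    awk '
        /^md[0-9]+ :/ { name = $1 }               # remember the current array name
        /(recovery|resync) *=/ {                  # progress line under that array
            for (i = 1; i <= NF; i++)
                if ($i ~ /%$/) print name, $i     # the field ending in "%"
        }
    ' "${1:-/proc/mdstat}"
}
```

    It prints nothing when no array is rebuilding. One thing these steps do not cover: on a BIOS machine that boots from these disks, the boot loader usually has to be installed on the new disk too (for example with "grub-install /dev/sda"), otherwise the server may not boot from it.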

  10. #10
    Thanks for your help, guys. It didn't go as smoothly as I hoped, but at least everything is OK now. I backed up my data, removed the failed drive from the array and contacted support to make the swap. Support told me the server didn't boot with the new drive (grub error), and it didn't boot without the failed one either. They asked for my details so they could have a look, and they managed to fix it.

    As for hardware vs. software RAID, my choice was made with budget in mind. It did what it's supposed to; what more can I ask? Of course, it would have been better without the downtime, but in my case the cost of one hour of downtime is far less than the cost of hardware RAID.

    Again, I appreciate your help.
    Last edited by coscip; 03-18-2011 at 10:41 PM.

  11. #11
    Moved > Hosting Security and Technology .
