  1. #1
    Join Date
    Jan 2004
    Location
    Alberta, Canada
    Posts
    88

    3Ware Daily RAID Health Report

    Platform: Linux (CentOS 4.2 as of this writing)
    Card: 3Ware Escalade 7506-x Series ATA RAID Controller

    I'm really into peace of mind when it comes to my server. One of the critical pieces is the RAID controller. I wouldn't operate a mission-critical solution without one.

    This little script checks the status and alarm history of the RAID controller and emails it to me daily.

    Create a file in your /etc/cron.daily directory. We'll call it "raidhealth".

    Code:
    #!/bin/sh
    # 3Ware CLI RAID Report, written for my Escalade 7506-4LP
    #
    # You will need the tw_cli file which you can download free 
    # from www.3ware.com - save this file in your /sbin directory 
    # 
    # File is free for use by all but comes as is with no warranty 
    # 
    
    ################################## 
    # variables that you need to set # 
    ################################## 
    
    email='you@yourdomain.com'    # email where the reports are sent 
    
    ################################## 
    #      do not edit below         # 
    ################################## 
    
    hostname=`/bin/hostname` 
    
    # pipe output from the 3Ware CLI to a text file 
    /sbin/tw_cli info c0 > /tmp/raidhealth.txt 
    echo "" >> /tmp/raidhealth.txt 
    echo "" >> /tmp/raidhealth.txt 
    /sbin/tw_cli alarms > /tmp/raidalarms.txt 
    
    # combine that text to one file for reporting 
    cat /tmp/raidhealth.txt /tmp/raidalarms.txt > /tmp/raidreport.txt 
    
    # send me a copy of that output via email 
    mail $email -s"3Ware RAID health report for $hostname" < /tmp/raidreport.txt 
    
    # now clean up the text files we made 
    rm -f /tmp/raidhealth.txt
    rm -f /tmp/raidalarms.txt
    rm -f /tmp/raidreport.txt
    Save and quit. chmod 755 this file. Now you'll get daily reports from your server about your 3Ware RAID Controller.

    Your daily output will look like:
    Code:
    Controller: c0 
    ------------- 
    Driver: 1.02.00.036 
    Model: 7506-4LP 
    FW: FE7X 1.05.00.056 
    BIOS: BE7X 1.08.00.046 
    Monitor: ME7X 1.01.00.038 
    Serial #: B14001A3380518 
    PCB: Rev4 
    PCHIP: 1.30-66 
    ACHIP: 3.20 
    
    
    # of units: 1 
    Unit 0: RAID 5 152.66 GB ( 320168960 blocks): OK 
    
    # of ports: 4 
    Port 0: Maxtor 6Y080L0 Y30VFXVE 76.33 GB (160086528 blocks): OK(unit 0) 
    Port 1: Maxtor 6Y080L0 Y30VH33E 76.33 GB (160086528 blocks): OK(unit 0) 
    Port 2: Maxtor 6Y080L0 Y30CPLSE 76.33 GB (160086528 blocks): OK(unit 0) 
    
    
    
    Alarms Report for Controller /c0 
    Date Severity Alarm Message 
    ----------------------------------------------------- 
    No Alarms Found.
    Nothing fancy, but I like it.
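
For anyone uneasy about the fixed file names under /tmp in the script above, here is a minimal sketch of the same report built around mktemp. It is only a sketch: the TW_CLI override variable and the function names build_raid_report and send_raid_report are hypothetical (not part of the original script), and it assumes mktemp and mail are available on the box.

```shell
#!/bin/sh
# Sketch only: same report as the raidhealth script, but with a private temp file.
# TW_CLI is a hypothetical override so the binary location can be changed.
TW_CLI="${TW_CLI:-/sbin/tw_cli}"

# Print the controller info, two blank lines, then the alarm history.
# $1 = controller id, e.g. c0
build_raid_report() {
    "$TW_CLI" info "$1"
    echo ""
    echo ""
    "$TW_CLI" alarms
}

# Build the report in a uniquely named temp file and mail it.
# $1 = recipient address, $2 = controller id
send_raid_report() {
    report=$(mktemp /tmp/raidreport.XXXXXX) || return 1
    build_raid_report "$2" > "$report"
    mail -s "3Ware RAID health report for $(hostname)" "$1" < "$report"
    rm -f "$report"
}

# Intended use at the bottom of the cron script:
#   send_raid_report you@yourdomain.com c0
```

The report content is unchanged; the only difference is that the temp file gets a random suffix, so another user on the machine cannot create it ahead of time.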

  2. #2
    nice job

  3. #3
    Join Date
    Oct 2002
    Posts
    227
    This will also work in most cases:

    http://www.thetiredwebmaster.com/lin...at-raid-array/

    I've had this running for a while now on a couple of systems with hardware RAID cards and it works very well.

    Ryan

  4. #4
    great .. thanks very much

  5. #5
    Thanks! This is exactly what I was looking for!

  6. #6
    3ware also provides an official tool called 3DM. It's HTTP-based and can alert you to degraded arrays and such.

    Q. What does 3ware's 3DM do?
    3DM allows you to configure and monitor your storage remotely via a Web browser. Management features include hot swap capability, error logging, remote configuration, version details and rebuild pacing. 3DM also alerts you, via email, of critical events such as drive failures.

    http://www.3ware.com/supportfaq/3ware_FAQ.pdf

  7. #7
    Join Date
    Sep 2004
    Location
    Sussex, England
    Posts
    194
    Whenever this script runs, I get the following:

    /etc/cron.daily/raidhealth:

    /etc/cron.daily/raidhealth: line 23: /sbin/tw_cli: No such file or directory
    /etc/cron.daily/raidhealth: line 26: /sbin/tw_cli: No such file or directory
    /etc/cron.daily/raidhealth: line 32: syntax error near unexpected token `newline'
    /etc/cron.daily/raidhealth: line 32: `mail $email -s"3Ware RAID health report for $hostname" < '
    I assume some required software is missing. Any pointers?

  8. #8
    Join Date
    Jan 2004
    Location
    Alberta, Canada
    Posts
    88
    # You will need the tw_cli file which you can download free
    # from www.3ware.com - save this file in your /sbin directory

    http://www.3ware.com/support/downloa....4.asp?SNO=867
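
A guard along these lines near the top of raidhealth would make the missing-binary case obvious instead of producing a string of "No such file or directory" errors. This is a rough sketch, and require_executable is a hypothetical helper name, not something from the original script.

```shell
#!/bin/sh
# Sketch only: fail early with a clear message if tw_cli is not installed.
# require_executable is a hypothetical helper name.
require_executable() {
    if [ ! -x "$1" ]; then
        echo "raidhealth: $1 is missing or not executable" >&2
        return 1
    fi
    return 0
}

# Intended use at the top of /etc/cron.daily/raidhealth:
#   require_executable /sbin/tw_cli || exit 1
```

Since cron mails anything written to stderr, the one-line message ends up in the same place the original errors did.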

  9. #9
    Join Date
    Jan 2004
    Location
    Alberta, Canada
    Posts
    88
    ..........................

  10. #10

    3ware RAID Monitoring with tw_cli

    I know this is an old thread, but given that I'm still dealing with 3ware RAID equipment, others might be too. Here is a program I wrote for actively monitoring 3ware RAID controllers.

    Code:
    #!/usr/bin/perl -w
    #
    # tw_cli_raid_monitor.pl
    # Perl v5.8.8
    # Tested under RHEL5, should work fine under RHEL6
    #
    # This program uses tw_cli and smartctl to monitor the condition of 3ware RAID
    # controllers. It will send notification email out if the raid state changes.
    # Raid states OK and VERIFYING are considered good/optimal states. This program
    # is only set up to monitor one controller. If you have multiple raid arrays
    # on the same system you will need to modify the code, though it's probably
    # easier to rename the program and run multiple instances of the same code with
    # different config settings.
    #
    # This program uses an xml config file that must have the same name as the
    # program and is expected to be in the same directory.
    #
    #tw_cli_raid_monitor.xml
    # &lt; = <, &gt; = >
    #-
    #<config>
    # <command_tw_cli>/bin/tw_cli</command_tw_cli>
    # <command_smartctl>/usr/sbin/smartctl</command_smartctl>
    # <command_date>/bin/date</command_date>
    # <command_sendmail>/usr/sbin/sendmail</command_sendmail>
    # <controller>c0</controller>
    # <unit>u0</unit>
    # <number_of_seconds_between_raid_checks>10</number_of_seconds_between_raid_checks>
    # <mail_to>admin.notice@mail_account.com</mail_to>
    # <mail_from>Server_Name 3ware RAID &lt;root@Server_Name.com&gt;</mail_from>
    # <mail_subject_prefix>Server_Name 3ware RAID Status: </mail_subject_prefix>
    # <mail_subject_postfix></mail_subject_postfix>    
    # <mail_x_mailer>Server_Name 3ware RAID</mail_x_mailer>
    # <mail_return_path>root@Server_Name.com</mail_return_path>
    #</config>
    #-
    #
    # This program runs continually, it is recommended that you run it in a session
    # preserving shell such as tmux or screen. Make sure the account you run from 
    # has permissions to run tw_cli and smartctl, you may need to modify your 
    # /etc/sudoers file to give the designated account permissions to run the listed
    # commands. You may need to modify the config as follows:
    # <command_tw_cli>/usr/bin/sudo /bin/tw_cli</command_tw_cli>
    # <command_smartctl>/usr/bin/sudo /usr/sbin/smartctl</command_smartctl>
    #
    # This script can be started at boot via crontab:
    #
    #  @reboot /path/to/cron/script/tw_cli_raid_monitor.cron
    # 
    #tw_cli_raid_monitor.cron
    #-
    ##!/bin/bash
    #
    ## To list tmux sessions use:
    ##    tmux ls
    ## To Connect to tmux sessions use:
    ##    tmux attach -t tw_cli_raid_monitor
    ## To detach from the tmux session use:
    ##    (ctrl b by default, or your configured prefix) then press the d key
    #
    ## Set up the paths and environmental variables to run tmux.
    #source /home/user/.bashrc
    #
    ## Start the tmux session
    #/usr/bin/tmux new-session -d -s tw_cli_raid_monitor
    #
    ## Change to the working directory.
    #/usr/bin/tmux send-keys -t tw_cli_raid_monitor "cd /path/to/cron/script/" C-m
    #
    ## Start the tw_cli_raid_monitor.pl process in the tmux session.
    #/usr/bin/tmux send-keys -t tw_cli_raid_monitor "/path/to/cron/script/tw_cli_raid_monitor.pl" C-m
    #-
    #
    # History:
    # ---------------------------------------------------------------------------
    # 2013-12-12  dkienenberger    Created.
    #
    #############################################################################
    
    
    
    use XML::Simple;
    use File::Basename;
    #use Data::Dumper;
    
    
    #>>>>>>>>>>>>>>>>>>>>>>>
    # Check the raid status.
    #>>>>>>>>>>>>>>>>>>>>>>>
    
    sub get_raid_status {
      my $data = `$command_tw_cli info $controller $unit status | awk '{print \$4}'`;
      chomp $data;
      chomp $data;
      return $data;
    }  
    
    #>>>>>>>>>>>>>>>>>>>>>>>>>>>>>>>>>>>>>>>>>>>>>>>>>>>>>>>>>>>>>>>>>>>>>>>>
    # Determine what we do depending on the condition of the raid state.
    #>>>>>>>>>>>>>>>>>>>>>>>>>>>>>>>>>>>>>>>>>>>>>>>>>>>>>>>>>>>>>>>>>>>>>>>>
    
    sub process_raid_state {
      my ($raid_status) = @_;

      # If the raid state has not changed there is nothing to do.
      return if $Last_raid_state eq $raid_status;

      # OK and VERIFYING are the good/optimal states.
      my $was_optimal = ( $Last_raid_state eq "OK" or $Last_raid_state eq "VERIFYING" );
      my $is_optimal  = ( $raid_status eq "OK" or $raid_status eq "VERIFYING" );

      if ( $is_optimal ) {

        # Notify the recipients only if the raid state has returned to optimal.
        # A change between two optimal states needs no notice.
        &notify_raid_state_change("short") if !$was_optimal;

      }
      else {

        # Notify the recipients that the raid state has changed and is not optimal.
        &notify_raid_state_change("full");

      }

      # Record the changed raid state.
      $Last_raid_state = $raid_status;
      return;
    }
    
    #>>>>>>>>>>>>>>>>>>>>>>>
    # Notify recipients of a raid state change.
    #>>>>>>>>>>>>>>>>>>>>>>>
    
    # takes argument "short" to send a short report.
    sub notify_raid_state_change {
      my ($message_type) = @_;
      my ($report,$smart_data);
      undef %{$port_and_serials};
      
      if ( $message_type eq "short" ) {
      
        # Send a short email to recipients
        &send_email_notice();
        
      }
      
      # Send a full email report to recipients
      else {
      
        # collect tw_cli logs and data.
        $report = &gather_tw_cli_report_data;
    
        #Get a list of bad devices port numbers and serial numbers.
        $port_and_serials = &identify_bad_drive;
    
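        # Check and see if any bad ports were returned. The sub always returns
        # a hash reference, so test the hash contents rather than the reference.
        if ( %{$port_and_serials} ) {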
        
           # collect smart data.
           $smart_data = &check_smart_data($port_and_serials);
         
           # Add the smart data to the report
           $report .= $smart_data;
           $report .= <<EOF;
    Reallocated_Event_Count - Tells us the disk had a sector glitch, but it was corrected. Nothing to worry about even if this is high.
    Current_Pending_Sector - Tells us the disk has a possible bad sector. It's been noted, and if it's written to again it will be recorded as either fixed (reallocated) or bad (uncorrectable). If this is very high you may want to worry. Cleanup can be forced by taking the device offline and writing zeros to every sector on the device.
    Offline_Uncorrectable - This disk has a bad sector. Tells us the drive may fail. I would replace a drive that has more than 5. If there are any errors in this category you should take the device offline and write zeros to every sector to see if more appear. Probably replace the disk.
    UDMA_CRC_Error_Count - Indicates a possible bad cable or a bad port on the controller card/motherboard. Shut down the system and try replacing the cable connecting the device to the controller.
    EOF
    
        }
      
      # Send a full report email to recipients
      &send_email_notice($report);
      
      }
    }
      
    #>>>>>>>>>>>>>>>>>>>>>>>>>>>>>>>>>>>>>>>>
    # Get information for a full email report
    #>>>>>>>>>>>>>>>>>>>>>>>>>>>>>>>>>>>>>>>>
    
    sub gather_tw_cli_report_data {
      my $data_status = `$command_tw_cli info $controller`;
      my $data_alarms = `$command_tw_cli alarms`;
      
      return $data_status . "\n\n" . $data_alarms . "\n";
    }
    
    #>>>>>>>>>>>>>>>>>>>>>>>>>>>>
    # Identify the bad drive(s).
    #>>>>>>>>>>>>>>>>>>>>>>>>>>>>
    
    sub identify_bad_drive {
      my $data = `$command_tw_cli info $controller drivestatus`;
      my @Drivestatus_output = split(/\n/, $data);
      my $TextLine;
      my ($junk,$port,$status,$serial,$port_number);
      my %Port = ();
      no warnings 'once';
      
      # process each line of the drive status output
      foreach $TextLine (@Drivestatus_output) {
    
        # process only port lines that start with p#.
        if ($TextLine =~ m#^p\d.+#) {
    
          # Break the line down into variables.
          ($port,$status,$junk,$junk,$junk,$serial) = split(' ', $TextLine);
          chomp $serial;
    
          # Check if the status of the current port is not optimal.
          if ( $status ne "OK" ) {
        
            # Remove the P prefix.
            ($junk,$port_number) = split(/p/, $port);  
    
            # Store the port number and serial number of the bad device.
            $Port{ $port_number } = $serial;     
          }
        }
      }
      
      #Return a reference to the port hash.
      return \%Port;
    }
    
    #>>>>>>>>>>>>>>>>>>>>>>>>>>>>>>>>>>>>>>>>
    # Retrieve smart data from the device.
    #>>>>>>>>>>>>>>>>>>>>>>>>>>>>>>>>>>>>>>>>
    
    sub check_smart_data {
      my $port  = shift;
      my $data;
    
      # go through the bad ports.
      while ( my ($key, $value) = each(%{$port}) ) {
        
        #Retrieve smart data.
        $data .= "  Port: $key\n  Device Serial Number: $value\n";
        $data .= `$command_smartctl -A -d 3ware,$key /dev/twa0`;
        $data .= "\n\n";
      }
      
      return $data;
    
    }
    
    
    #>>>>>>>>>>>>>>>>>>>>>>>>>>>>>>>>>>>>>>>>
    # Send out the email to the recipients.
    #>>>>>>>>>>>>>>>>>>>>>>>>>>>>>>>>>>>>>>>>
    
    sub send_email_notice {
    	my ($report) = @_;
    	no warnings 'uninitialized';
    	
    	# if there is a report, send a full email report.
    	if ($report) {
    	
    	  # Send full report to recipients
    	  
    	  open (MAIL, "|$command_sendmail -t") || die "Can't open the mail binary: ${command_sendmail}!\n";
    	  print MAIL "X-Mailer: ${mail_x_mailer}\n";
    	  print MAIL "Return-Path: ${mail_return_path}\n";
    	  print MAIL "From: ${mail_from}\n";
    	  print MAIL "To: ${mail_to}\n";
    	  print MAIL "Subject: ${mail_subject_prefix}${Current_raid_state}${mail_subject_postfix}\n\n";
    	  print MAIL "${report}\n\n";
    	  close (MAIL);
    	}
    	
    	# if the report is empty.
    	else {
    	
          # Send short notice to recipients
    
    	  open (MAIL, "|$command_sendmail -t") || die "Can't open the mail binary: ${command_sendmail}!\n";
    	  print MAIL "X-Mailer: ${mail_x_mailer}\n";
    	  print MAIL "Return-Path: ${mail_return_path}\n";
    	  print MAIL "From: ${mail_from}\n";
    	  print MAIL "To: ${mail_to}\n";
    	  print MAIL "Subject: ${mail_subject_prefix}${Current_raid_state}${mail_subject_postfix}\n\n";
    	  print MAIL "\n";
    	  close (MAIL);
    	}
    
    }
    
    
    
    
    
    # Run the main routine.
    &Start;
    
    #>>>>>>>>>>>>>
    #> Main Start 
    #>>>>>>>>>>>>>
    
    sub Start {
    
        #declare all global variables used in this subroutine.
        local ($TimeStamp, $Last_raid_state, $Current_raid_state, $command_tw_cli ,$command_smartctl ,$command_date ,$command_sendmail ,$controller ,$unit ,$check_time ,$mail_to ,$mail_from ,$mail_subject_prefix ,$mail_subject_postfix ,$mail_x_mailer ,$mail_return_path);
    
        # Obtain configuration from a file with a name inferred from this
        # script's name.
        my ($program_name) = split(/\./, basename($0));
        my $program_full_name =basename($0);
        
        # Load the config file using xmlsimple.
        $config = XMLin("${program_name}.xml", SuppressEmpty => 'undef');
    
        $command_tw_cli        = $config->{'command_tw_cli'};
        $command_smartctl      = $config->{'command_smartctl'};
        $command_date          = $config->{'command_date'};
        $command_sendmail      = $config->{'command_sendmail'};
        $controller            = $config->{'controller'};
        $unit                  = $config->{'unit'};
        $check_time            = $config->{'number_of_seconds_between_raid_checks'};
        $mail_to               = $config->{'mail_to'};
        $mail_from             = $config->{'mail_from'};
        $mail_subject_prefix   = $config->{'mail_subject_prefix'};
        $mail_subject_postfix  = $config->{'mail_subject_postfix'};
        $mail_x_mailer         = $config->{'mail_x_mailer'};
        $mail_return_path      = $config->{'mail_return_path'};
        
        
        # Get the date timestamp
        $TimeStamp = `$command_date`;
        chomp $TimeStamp;
    
        # Print other start info:
        print "-- Starting $program_full_name $TimeStamp --\n";
        print "Loaded config file ${program_name}.xml\n";
    
        
        # Run till killed.
        while (1) {
        
          # Sleep in seconds before running the check again.
          sleep ($check_time);
    
          # Get the status of the 3ware RAID.
          $Current_raid_state = &get_raid_status;
          
          # Define the first run of Last_raid_state.
          if (!$Last_raid_state) {
            $Last_raid_state = $Current_raid_state;
          }
    
          # Get the date timestamp
          $TimeStamp = `$command_date "+%Y%m%d %T"`;
          chomp $TimeStamp;
          
          # Output the current status to standard out.
          print "$TimeStamp: Raid Status: $Current_raid_state\n";
         
          # Process the raid state.
          &process_raid_state($Current_raid_state);
    
        }
    
       	
    	#End the program
    	exit;
    }
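
For anyone who would rather bolt this onto a shell script than run the Perl, the notification decision in process_raid_state boils down to a small table. Below is a rough shell sketch of just that decision, not a replacement for the program. The helper names is_optimal and decide_notice are hypothetical, and it only prints which kind of notice (none, short or full) the Perl code would send.

```shell
#!/bin/sh
# Sketch only: the notification decision made by process_raid_state.
# is_optimal and decide_notice are hypothetical helper names.

# Succeeds when the state counts as good/optimal (OK or VERIFYING).
is_optimal() {
    case "$1" in
        OK|VERIFYING) return 0 ;;
        *) return 1 ;;
    esac
}

# Prints "none", "short" or "full" for a change from state $1 to state $2.
decide_notice() {
    last=$1
    current=$2
    if [ "$last" = "$current" ]; then
        echo none            # state unchanged, nothing to send
    elif is_optimal "$current"; then
        if is_optimal "$last"; then
            echo none        # moved between two good states
        else
            echo short       # array recovered, short notice
        fi
    else
        echo full            # new non-optimal state, full report
    fi
}
```

For example, `decide_notice DEGRADED OK` prints short and `decide_notice OK DEGRADED` prints full. DEGRADED is used here purely as an example of a non-optimal state; any status string other than OK or VERIFYING is treated the same way.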
