Thread: 3Ware Daily RAID Health Report
-
02-15-2006, 04:54 AM #1 Junior Guru Wannabe
- Join Date
- Jan 2004
- Location
- Alberta, Canada
- Posts
- 88
3Ware Daily RAID Health Report
Platform: Linux (CentOS 4.2 as of this writing)
Card: 3Ware Escalade 7506-x Series ATA RAID Controller
I'm really into peace of mind when it comes to my server. One of the critical pieces is the RAID controller. I wouldn't operate a mission-critical solution without it.
This lil piece of code checks the status and alarm history of the RAID controller and emails it to me daily.
Create a file in your /etc/cron.daily directory and make it executable. We'll call it, say, "raidhealth".
Code:
#!/bin/sh
# 3Ware CLI RAID Report
# Written for my Escalade 7506-4LP
#
# You will need the tw_cli file which you can download free
# from www.3ware.com - save this file in your /sbin directory
#
# File is free for use by all but comes as is with no warranty
#
##################################
# variables that you need to set #
##################################
email='you@yourdomain.com' # email where the reports are sent
##################################
# do not edit below              #
##################################
hostname=`/bin/hostname`
# pipe output from the 3Ware CLI to a text file
/sbin/tw_cli info c0 > /tmp/raidhealth.txt
echo "" >> /tmp/raidhealth.txt
echo "" >> /tmp/raidhealth.txt
/sbin/tw_cli alarms > /tmp/raidalarms.txt
# combine that text to one file for reporting
cat /tmp/raidhealth.txt /tmp/raidalarms.txt > /tmp/raidreport.txt
# send me a copy of that output via email
# (keep this command on a single line; options go before the address)
mail -s "3Ware RAID health report for $hostname" "$email" < /tmp/raidreport.txt
# now clean up the text files we made
rm -f /tmp/raidhealth.txt
rm -f /tmp/raidalarms.txt
rm -f /tmp/raidreport.txt
Your daily output will look like:
Code:
Controller: c0
-------------
Driver: 1.02.00.036
Model: 7506-4LP
FW: FE7X 1.05.00.056
BIOS: BE7X 1.08.00.046
Monitor: ME7X 1.01.00.038
Serial #: B14001A3380518
PCB: Rev4
PCHIP: 1.30-66
ACHIP: 3.20

# of units: 1
Unit 0: RAID 5 152.66 GB ( 320168960 blocks): OK

# of ports: 4
Port 0: Maxtor 6Y080L0 Y30VFXVE 76.33 GB (160086528 blocks): OK(unit 0)
Port 1: Maxtor 6Y080L0 Y30VH33E 76.33 GB (160086528 blocks): OK(unit 0)
Port 2: Maxtor 6Y080L0 Y30CPLSE 76.33 GB (160086528 blocks): OK(unit 0)

Alarms Report for Controller /c0
Date Severity Alarm Message
-----------------------------------------------------
No Alarms Found.
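If you would rather not read the whole report every day, the "Unit" and "Port" lines can be filtered for anything that is not OK. This is only a rough sketch: it assumes the report layout shown in the sample output above, the function name find_problem_lines is made up for illustration, and the DEGRADED status in the demo is an example value rather than captured tw_cli output.

```shell
#!/bin/sh
# Hypothetical helper: read a report on stdin and print every
# "Unit N:" or "Port N:" line whose status (the text after "blocks):")
# does not start with OK. Lines without a "):" marker are skipped.
find_problem_lines() {
    awk '/^(Unit|Port) [0-9]+:/ {
        status = $0
        # Strip everything up to and including the last "):" on the line.
        if (sub(/.*\): */, "", status) && status !~ /^OK/) print
    }'
}

# Demo with made-up report lines (not real controller output).
find_problem_lines <<'EOF'
Unit 0: RAID 5 152.66 GB ( 320168960 blocks): OK
Port 0: Maxtor 6Y080L0 Y30VFXVE 76.33 GB (160086528 blocks): OK(unit 0)
Port 1: Maxtor 6Y080L0 Y30VH33E 76.33 GB (160086528 blocks): DEGRADED(unit 0)
EOF
```

In the cron script this could be run against /tmp/raidreport.txt, for example to change the mail subject when the function prints anything.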
-
03-04-2006, 10:30 PM #2 Aspiring Evangelist
- Join Date
- Oct 2002
- Posts
- 447
nice job
-
03-10-2006, 01:27 AM #3 Junior Guru
- Join Date
- Oct 2002
- Posts
- 227
This will also work in most cases:
http://www.thetiredwebmaster.com/lin...at-raid-array/
I've had this running for a while now on a couple of systems with hardware RAID cards and it works very well.
Ryan
-
03-11-2006, 02:57 AM #4 New Member
- Join Date
- Feb 2006
- Posts
- 4
great .. thanks very much
-
05-15-2006, 12:30 PM #5 Newbie
- Join Date
- Apr 2006
- Posts
- 15
Thanks! This is exactly what I was looking for!
-
05-15-2006, 05:12 PM #6 Web Hosting Guru
- Join Date
- Mar 2005
- Posts
- 297
3ware also provides an official tool called 3DM. It's HTTP-based and can alert you to degraded arrays and the like.
Q. What does 3ware's 3DM do?
3DM allows you to configure and monitor your storage remotely via a Web browser. Management features include hot swap capability, error logging, remote configuration, version details and rebuild pacing. 3DM also alerts you, via email, of critical events such as drive failures.
http://www.3ware.com/supportfaq/3ware_FAQ.pdf
-
06-02-2006, 04:46 AM #7 Junior Guru
- Join Date
- Sep 2004
- Location
- Sussex, England
- Posts
- 194
Whenever this script runs, I get the following:
/etc/cron.daily/raidhealth:
/etc/cron.daily/raidhealth: line 23: /sbin/tw_cli: No such file or directory
/etc/cron.daily/raidhealth: line 26: /sbin/tw_cli: No such file or directory
/etc/cron.daily/raidhealth: line 32: syntax error near unexpected token `newline'
/etc/cron.daily/raidhealth: line 32: `mail $email -s"3Ware RAID health report for $hostname" < '
-
06-02-2006, 06:27 AM #8 Junior Guru Wannabe
- Join Date
- Jan 2004
- Location
- Alberta, Canada
- Posts
- 88
# You will need the tw_cli file which you can download free
# from www.3ware.com - save this file in your /sbin directory
http://www.3ware.com/support/downloa....4.asp?SNO=867
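The "No such file or directory" errors above mean tw_cli is not at /sbin/tw_cli. A small guard at the top of the cron script would make that failure obvious instead of mailing an empty report. This is just a sketch; the helper name require_tool is made up for illustration.

```shell
#!/bin/sh
# Hypothetical helper: succeed if the given path is an executable file,
# otherwise print a hint to stderr and return 1.
require_tool() {
    if [ -x "$1" ]; then
        return 0
    fi
    echo "raidhealth: $1 not found or not executable" >&2
    return 1
}

# Intended use near the top of the cron script (path as in the first post):
#   require_tool /sbin/tw_cli || exit 1
```

Because cron mails anything written to stderr, the hint ends up in root's mailbox along with the failure.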
-
06-02-2006, 06:29 AM #9 Junior Guru Wannabe
- Join Date
- Jan 2004
- Location
- Alberta, Canada
- Posts
- 88
..........................
-
03-12-2014, 07:11 PM #10 New Member
- Join Date
- Mar 2014
- Posts
- 1
3ware RAID Monitoring with tw_cli
I know this is an old thread, but given that I'm still dealing with 3ware RAID equipment, there might be others who are too. Here is a program I wrote for actively monitoring 3ware RAID controllers.
Code:
#!/usr/bin/perl -w
#
# tw_cli_raid_monitor.pl
# Perl v5.8.8
# Tested under RHEL5, should work fine under RHEL6
#
# This program uses tw_cli and smartctl to monitor the condition of 3ware RAID
# controllers. It will send notification email out if the raid state changes.
# Raid states OK and VERIFYING are considered good/optimal states. This program
# is only set up to monitor one controller. If you have multiple raid arrays
# on the same system you will need to modify the code, though it's probably
# easier to rename the program and run multiple instances of the same code with
# different config settings.
#
# This program uses an xml config file that must have the same name as the
# program and is expected to be in the same directory.
#
#tw_cli_raid_monitor.xml
# &lt; = <, &gt; = >
#-
#<config>
#    <command_tw_cli>/bin/tw_cli</command_tw_cli>
#    <command_smartctl>/usr/sbin/smartctl</command_smartctl>
#    <command_date>/bin/date</command_date>
#    <command_sendmail>/usr/sbin/sendmail</command_sendmail>
#    <controller>c0</controller>
#    <unit>u0</unit>
#    <number_of_seconds_between_raid_checks>10</number_of_seconds_between_raid_checks>
#    <mail_to>admin.notice@mail_account.com</mail_to>
#    <mail_from>Server_Name 3ware RAID &lt;root@Server_Name.com&gt;</mail_from>
#    <mail_subject_prefix>Server_Name 3ware RAID Status: </mail_subject_prefix>
#    <mail_subject_postfix></mail_subject_postfix>
#    <mail_x_mailer>Server_Name 3ware RAID</mail_x_mailer>
#    <mail_return_path>root@Server_Name.com</mail_return_path>
#</config>
#-
#
# This program runs continually, it is recommended that you run it in a session
# preserving shell such as tmux or screen. Make sure the account you run from
# has permissions to run tw_cli and smartctl, you may need to modify your
# /etc/sudoers file to give the designated account permissions to run the listed
# commands. You may need to modify the config as follows:
#    <command_tw_cli>/usr/bin/sudo /bin/tw_cli</command_tw_cli>
#    <command_smartctl>/usr/bin/sudo /usr/sbin/smartctl</command_smartctl>
#
# This script can be started at boot via crontab:
#
# @reboot /path/to/cron/script/tw_cli_raid_monitor.cron
#
#tw_cli_raid_monitor.cron
#-
###!/bin/bash
#
## To list tmux sessions use:
## tmux ls
## To connect to tmux sessions use:
## tmux attach -t tw_cli_raid_monitor
## To detach from the tmux session use:
## (ctrl a) then press the d key (tmux's default prefix is ctrl b)
#
## Set up the paths and environmental variables to run tmux.
#source /home/user/.bashrc
#
## Start the tmux session
#/usr/bin/tmux new-session -d -s tw_cli_raid_monitor
#
## Change to the working directory.
#/usr/bin/tmux send-keys -t tw_cli_raid_monitor "cd /path/to/cron/script/" C-m
#
## Start the tw_cli_raid_monitor.pl process up in tmux.
#/usr/bin/tmux send-keys -t tw_cli_raid_monitor "/path/to/cron/script/tw_cli_raid_monitor.pl" C-m
#-
#
# History:
# ---------------------------------------------------------------------------
# 2013-12-12 dkienenberger Created.
#
#############################################################################

use XML::Simple;
use File::Basename;
#use Data::Dumper;

#>>>>>>>>>>>>>>>>>>>>>>>
# Check the raid status.
#>>>>>>>>>>>>>>>>>>>>>>>
sub get_raid_status {
    my $data = `$command_tw_cli info $controller $unit status | awk '{print \$4}'`;
    chomp $data;
    chomp $data;
    return $data;
}

#>>>>>>>>>>>>>>>>>>>>>>>>>>>>>>>>>>>>>>>>>>>>>>>>>>>>>>>>>>>>>>>>>>>>>>>>
# Determine what we do depending on the condition of the raid state.
#>>>>>>>>>>>>>>>>>>>>>>>>>>>>>>>>>>>>>>>>>>>>>>>>>>>>>>>>>>>>>>>>>>>>>>>>
sub process_raid_state {
    my ($raid_status) = @_;
    # "next" below jumps back to the while loop in Start.
    no warnings 'exiting';

    # Check if the last raid state is not optimal.
    if ( $Last_raid_state ne "OK" and $Last_raid_state ne "VERIFYING" ) {
        # If the raid state has not changed.
        if ( $Last_raid_state eq $raid_status ) {
            # Start check over again.
            next;
        }
        # If the raid state has changed.
        else {
            # If the raid state is optimal again.
            if ( $raid_status eq "OK" or $raid_status eq "VERIFYING" ) {
                # Notify the recipients that the raid state has returned to optimal.
                &notify_raid_state_change("short");
                # Record the changed raid state.
                $Last_raid_state = $raid_status;
                # Start check over again.
                next;
            }
            # If the raid state is still not optimal.
            else {
                # Notify the recipients that the raid state has changed and is still not optimal.
                &notify_raid_state_change("full");
                # Record the changed raid state.
                $Last_raid_state = $raid_status;
                # Start check over again.
                next;
            }
        }
    }
    # If the last raid state is optimal.
    else {
        # If the raid state has not changed.
        if ( $Last_raid_state eq $raid_status ) {
            # Start check over again.
            next;
        }
        # If the raid state has changed.
        else {
            # Check if the raid state is still optimal.
            if ( $raid_status eq "OK" or $raid_status eq "VERIFYING" ) {
                # Record the changed raid state.
                $Last_raid_state = $raid_status;
                # Start check over again.
                next;
            }
            # If the raid state is not optimal.
            else {
                # Notify the recipients that the raid state has changed.
                &notify_raid_state_change("full");
                # Record the changed raid state.
                $Last_raid_state = $raid_status;
                # Start check over again.
                next;
            }
        }
    }
}

#>>>>>>>>>>>>>>>>>>>>>>>>>>>>>>>>>>>>>>>>>>>>>
# Notify recipients of a raid state change.
#>>>>>>>>>>>>>>>>>>>>>>>>>>>>>>>>>>>>>>>>>>>>>
# Takes argument "short" to send a short report.
sub notify_raid_state_change {
    my ($message_type) = @_;
    my ($report,$smart_data);
    undef %{$port_and_serials};

    if ( $message_type eq "short" ) {
        # Send a short email to recipients
        &send_email_notice();
    }
    # Send a full email report to recipients
    else {
        # Collect tw_cli logs and data.
        $report = &gather_tw_cli_report_data;
        # Get a list of bad devices' port numbers and serial numbers.
        $port_and_serials = &identify_bad_drive;
        # Check and see if there are returned ports.
        if ($port_and_serials) {
            # Collect smart data.
            $smart_data = &check_smart_data($port_and_serials);
            # Add the smart data to the report
            $report .= $smart_data;
            $report .= <<EOF;

Reallocated_Event_Count - Tells us the disk had a sector glitch, but it was
corrected. Nothing to worry about even if this is high.

Current_Pending_Sector - Tells us the disk has a possible bad sector, it's been
noted and if it's written to again it will either be noted as fixed
(reallocated) or bad (uncorrectable). If this is very high you may want to
worry. Cleanup can be forced by taking the device off line and writing zeros
to every sector on the device.

Offline_Uncorrectable - This disk has a bad sector. Tells us if the drive may
fail. I would replace a drive that has more than 5. If there are any errors in
this category you should take the device off line and write zeros to every
sector to see if more appear. Probably replace the disk.

UDMA_CRC_Error_Count - Indicates a possible bad cable or a bad port on the
controller card/motherboard. Shutdown the system and try replacing the cable
connecting the device to the controller.
EOF
        }
        # Send a full report email to recipients
        &send_email_notice($report);
    }
}

#>>>>>>>>>>>>>>>>>>>>>>>>>>>>>>>>>>>>>>>>
# Get information for a full email report
#>>>>>>>>>>>>>>>>>>>>>>>>>>>>>>>>>>>>>>>>
sub gather_tw_cli_report_data {
    my $data_status = `$command_tw_cli info $controller`;
    my $data_alarms = `$command_tw_cli alarms`;
    return $data_status . "\n\n" . $data_alarms . "\n";
}

#>>>>>>>>>>>>>>>>>>>>>>>>>>>>
# Identify the bad drive(s).
#>>>>>>>>>>>>>>>>>>>>>>>>>>>>
sub identify_bad_drive {
    my $data = `$command_tw_cli info $controller drivestatus`;
    my @Drivestatus_output = split('\n', $data);
    my $TextLine;
    my ($junk,$port,$status,$serial,$port_number);
    my %Port = ();
    no warnings 'once';

    # Process each line of the drive status output
    foreach $TextLine (@Drivestatus_output) {
        # Process only port lines that start with p#.
        if ($TextLine =~ m#^p\d.+#) {
            # Break the line down into variables.
            ($port,$status,$junk,$junk,$junk,$serial) = split(' ', $TextLine);
            chomp $serial;
            # Check if the status of the current port is not optimal.
            if ( $status ne "OK" ) {
                # Remove the p prefix.
                ($junk,$port_number) = split(/p/, $port);
                # Store the port number and serial number of the bad device.
                $Port{ $port_number } = $serial;
            }
        }
    }
    # Return a reference to the port hash.
    return \%Port;
}

#>>>>>>>>>>>>>>>>>>>>>>>>>>>>>>>>>>>>>>>>
# Retrieve smart data from the device.
#>>>>>>>>>>>>>>>>>>>>>>>>>>>>>>>>>>>>>>>>
sub check_smart_data {
    my $port = shift;
    my $data;

    # Go through the bad ports.
    while ( my ($key, $value) = each(%{$port}) ) {
        # Retrieve smart data.
        $data .= " Port: $key\n Device Serial Number: $value\n";
        $data .= `$command_smartctl -A -d 3ware,$key /dev/twa0`;
        $data .= "\n\n";
    }
    return $data;
}

#>>>>>>>>>>>>>>>>>>>>>>>>>>>>>>>>>>>>>>>>
# Send out the email to the recipients.
#>>>>>>>>>>>>>>>>>>>>>>>>>>>>>>>>>>>>>>>>
sub send_email_notice {
    my ($report) = @_;
    no warnings 'uninitialized';

    # If there is a report, send a full email report.
    if ($report) {
        # Send full report to recipients
        open (MAIL, "|$command_sendmail -t") || die "Can't open the mail binary: ${command_sendmail}!\n";
        print MAIL "X-Mailer: ${mail_x_mailer}\n";
        print MAIL "Return-Path: ${mail_return_path}\n";
        print MAIL "From: ${mail_from}\n";
        print MAIL "To: ${mail_to}\n";
        print MAIL "Subject: ${mail_subject_prefix}${Current_raid_state}${mail_subject_postfix}\n\n";
        print MAIL "${report}\n\n";
        close (MAIL);
    }
    # If the report is empty.
    else {
        # Send short notice to recipients
        open (MAIL, "|$command_sendmail -t") || die "Can't open the mail binary: ${command_sendmail}!\n";
        print MAIL "X-Mailer: ${mail_x_mailer}\n";
        print MAIL "Return-Path: ${mail_return_path}\n";
        print MAIL "From: ${mail_from}\n";
        print MAIL "To: ${mail_to}\n";
        print MAIL "Subject: ${mail_subject_prefix}${Current_raid_state}${mail_subject_postfix}\n\n";
        print MAIL "\n";
        close (MAIL);
    }
}

# Run the main routine.
&Start;

#>>>>>>>>>>>>>
#> Main Start
#>>>>>>>>>>>>>
sub Start {
    # Declare all global variables used in this subroutine.
    local ($TimeStamp, $Last_raid_state, $Current_raid_state, $command_tw_cli
        ,$command_smartctl
        ,$command_date
        ,$command_sendmail
        ,$controller
        ,$unit
        ,$check_time
        ,$mail_to
        ,$mail_from
        ,$mail_subject_prefix
        ,$mail_subject_postfix
        ,$mail_x_mailer
        ,$mail_return_path);

    # Obtain configuration from a file with a name inferred from this
    # script's name.
    my ($program_name) = split(/\./, basename($0));
    my $program_full_name = basename($0);

    # Load the config file using XML::Simple.
    $config = XMLin("${program_name}.xml", SuppressEmpty => 'undef');
    $command_tw_cli = $config->{'command_tw_cli'};
    $command_smartctl = $config->{'command_smartctl'};
    $command_date = $config->{'command_date'};
    $command_sendmail = $config->{'command_sendmail'};
    $controller = $config->{'controller'};
    $unit = $config->{'unit'};
    $check_time = $config->{'number_of_seconds_between_raid_checks'};
    $mail_to = $config->{'mail_to'};
    $mail_from = $config->{'mail_from'};
    $mail_subject_prefix = $config->{'mail_subject_prefix'};
    $mail_subject_postfix = $config->{'mail_subject_postfix'};
    $mail_x_mailer = $config->{'mail_x_mailer'};
    $mail_return_path = $config->{'mail_return_path'};

    # Get the date timestamp
    $TimeStamp = `$command_date`;
    chomp $TimeStamp;

    # Print other start info:
    print "-- Starting $program_full_name $TimeStamp --\n";
    print "Loaded config file ${program_name}.xml\n";

    # Run till killed.
    while (1) {
        # Sleep in seconds before running the check again.
        sleep ($check_time);

        # Get the status of the 3ware RAID.
        $Current_raid_state = &get_raid_status;

        # Define the first run of Last_raid_state.
        if (!$Last_raid_state) {
            $Last_raid_state = $Current_raid_state;
        }

        # Get the date timestamp
        $TimeStamp = `$command_date "+%Y%m%d %T"`;
        chomp $TimeStamp;

        # Output the current status to standard out.
        print "$TimeStamp: Raid Status: $Current_raid_state\n";

        # Process the raid state.
        &process_raid_state($Current_raid_state);
    }

    # End the program
    exit;
}
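For anyone who wants the same alerting rules in the daily shell script instead, the decision made by process_raid_state boils down to a small table: no mail if the state is unchanged or moves between OK and VERIFYING, a short mail when the array returns to an optimal state, and a full report on any change to a non-optimal state. Below is a minimal sketch of just that decision. The function names are made up for illustration, and DEGRADED and REBUILDING in the usage comment are only example state strings.

```shell
#!/bin/sh
# Hypothetical helper: succeed if a raid state counts as optimal
# (OK or VERIFYING, as in the Perl program above).
is_optimal() {
    case "$1" in
        OK|VERIFYING) return 0 ;;
        *) return 1 ;;
    esac
}

# Hypothetical helper: print "none", "short" or "full" for a transition
# from the last seen state ($1) to the current state ($2).
notice_type() {
    last=$1
    current=$2
    if [ "$last" = "$current" ]; then
        echo none                      # nothing changed
    elif is_optimal "$current"; then
        if is_optimal "$last"; then
            echo none                  # e.g. OK -> VERIFYING
        else
            echo short                 # array is healthy again
        fi
    else
        echo full                      # changed to a non-optimal state
    fi
}

# Example:
#   notice_type OK DEGRADED
#   notice_type REBUILDING OK
```

Keeping the decision in one function makes it easy to check every transition by hand before wiring it to the mail command.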