Reference: https://calomel.org/megacli_lsi_commands.html
Making LSI raid controllers a little easier to work with
MegaCLI is the command line interface (CLI) binary used to communicate with the full LSI family of raid controllers found in Supermicro, Dell (PERC), ESXi and Intel servers. The program is text based and consists of a single static binary file. We are not fans of graphical user interfaces (GUIs) and appreciate the control a command line program gives over a GUI solution. Using some simple shell scripting we can check the health of the RAID, email ourselves about problems and work with failed drives.
There are many MegaCLI command pages which simply rehash the same commands over and over, and we wanted to offer something more. For our examples we use Ubuntu Linux and FreeBSD with the MegaCli64 binary. All of these scripts and commands work with both the 32-bit and 64-bit binaries.
Installing the MegaCLI binary
In order to communicate with the LSI card you will need the MegaCli (32-bit) or MegaCli64 (64-bit) program. The install should be quite easy, but LSI makes us jump through a few hoops. This is what we found:
- Go to the LSI Downloads page.
- Search by keyword "megacli".
- Click on "Management Software and Tools".
- Download the MegaCLI zip file. The same file covers DOS, Windows, Linux and FreeBSD.
- Unzip the file.
- In the Linux directory there is an RPM. If you are using Red Hat you can install it directly. For Ubuntu, go to the next step.
- For Ubuntu, run "rpm2cpio MegaCli-*.rpm | cpio -idmv" to expand the directory structure. You may need to run "apt-get install rpm2cpio" first.
- For FreeBSD, unzip the file in the FreeBSD directory.
On our Ubuntu Linux 64-bit and FreeBSD 64-bit servers we simply copied MegaCli64 to /usr/local/sbin/. You can put the binary anywhere you want, but we chose /usr/local/sbin/ because it is in root's path. Make sure to secure the binary: make root the owner and set its mode to 700 (chown root /usr/local/sbin/MegaCli64; chmod 700 /usr/local/sbin/MegaCli64). The install is now done. We would like to see LSI provide an Ubuntu PPA or a FreeBSD ports entry sometime in the future, but this setup was not too bad.
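The copy-and-secure step can be wrapped in a small function. This is a minimal sketch, assuming install(1) is available (it is on both Linux and FreeBSD); the function name install_megacli is hypothetical. The chown is only attempted when running as root, because handing ownership to root requires root privileges.

```shell
#!/bin/sh
# Hypothetical helper: copy the MegaCli64 binary into place and lock it down
# as described above (owner root, mode 700).
# Usage: install_megacli /path/to/extracted/MegaCli64 [/usr/local/sbin]
install_megacli() {
    src="$1"
    destdir="${2:-/usr/local/sbin}"
    if [ ! -f "$src" ]; then
        echo "install_megacli: $src not found" >&2
        return 1
    fi
    # install(1) copies the file and sets mode 700 in one step
    install -m 700 "$src" "$destdir/MegaCli64" || return 1
    # Changing the owner to root only works when we are root; skip otherwise
    if [ "$(id -u)" -eq 0 ]; then
        chown root "$destdir/MegaCli64" || return 1
    fi
}
```

Run it as root and pass the path where rpm2cpio (or the FreeBSD unzip) left the MegaCli64 binary.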
The lsi.sh MegaCLI interface script
Once you have MegaCLI installed, the following script helps in getting information from the raid card. The shell script does nothing more than execute the commands you would normally type on the CLI. The script can show the status of the raid and drives, and it can identify any drive slot by blinking the light on the chassis. It can also help you find drives which are starting to error out or slow down the raid, so you can replace them early. We have also included a "setdefaults" method to set up a new raid card to the specs we use for our 400+ raids. Finally, use the "checkNemail" method to check the raid status and email you a list of drives showing which one is reporting the problem.
You are welcome to copy and paste the following script. We call the script "lsi.sh", but you can use any name you wish. Just make sure to set the full path to the MegaCli binary in the script and make the script executable. We tried to comment every method, so take a look at the script before using it.
```shell
#!/bin/bash
#
# Calomel.org
#     https://calomel.org/megacli_lsi_commands.html
#     LSI MegaRaid CLI
#     lsi.sh @ Version 0.05
#
# description: MegaCLI script to configure and monitor LSI raid cards.

# Full path to the MegaRaid CLI binary
MegaCli="/usr/local/sbin/MegaCli64"

# The identifying number of the enclosure. Default for our systems is "8". Use
# "MegaCli64 -PDlist -a0 | grep 'Enclosure Device'" to see what your number
# is and set this variable.
ENCLOSURE="8"

if [ $# -eq 0 ]; then
    echo ""
    echo " OBPG .:. lsi.sh"
    echo "-----------------------------------------------------"
    echo "status = Status of Virtual drives (volumes)"
    echo "drives = Status of hard drives"
    echo "ident \$slot = Blink light on drive (need slot number)"
    echo "good \$slot = Simply makes the slot \"Unconfigured(good)\" (need slot number)"
    echo "replace \$slot = Replace \"Unconfigured(bad)\" drive (need slot number)"
    echo "progress = Status of drive rebuild"
    echo "errors = Show drive errors which are non-zero"
    echo "bat = Battery health and capacity"
    echo "batrelearn = Force BBU re-learn cycle"
    echo "logs = Print card logs"
    echo "checkNemail = Check volume(s) and send email on raid errors"
    echo "allinfo = Print out all settings and information about the card"
    echo "settime = Set the raid card's time to the current system time"
    echo "setdefaults = Set preferred default settings for new raid setup"
    echo ""
    exit
fi

# General status of all RAID virtual disks or volumes and if PATROL disk check
# is running.
if [ "$1" = "status" ]; then
    $MegaCli -LDInfo -Lall -aALL -NoLog
    echo "###############################################"
    $MegaCli -AdpPR -Info -aALL -NoLog
    echo "###############################################"
    $MegaCli -LDCC -ShowProg -LALL -aALL -NoLog
    exit
fi

# Shows the state of all drives and if they are online, unconfigured or missing.
if [ "$1" = "drives" ]; then
    $MegaCli -PDlist -aALL -NoLog | egrep 'Slot|state' | awk '/Slot/{if (x)print x;x="";}{x=(!x)?$0:x" -"$0;}END{print x;}' | sed 's/Firmware state://g'
    exit
fi

# Use to blink the light on the slot in question. Hit enter again to turn the
# blinking light off. The [enclosure:slot] argument is quoted so the shell
# cannot treat it as a glob pattern.
if [ "$1" = "ident" ]; then
    $MegaCli -PdLocate -start -physdrv"[$ENCLOSURE:$2]" -a0 -NoLog
    logger "$(hostname) - identifying enclosure $ENCLOSURE, drive $2 "
    read -p "Press [Enter] key to turn off light..."
    $MegaCli -PdLocate -stop -physdrv"[$ENCLOSURE:$2]" -a0 -NoLog
    exit
fi

# When a new drive is inserted it might have old RAID headers on it. This
# method simply removes old RAID configs from the drive in the slot and makes
# the drive "good." Basically, Unconfigured(bad) to Unconfigured(good). We use
# this method on our FreeBSD ZFS machines before the drive is added back into
# the zfs pool.
if [ "$1" = "good" ]; then
    # set Unconfigured(bad) to Unconfigured(good)
    $MegaCli -PDMakeGood -PhysDrv"[$ENCLOSURE:$2]" -a0 -NoLog
    # clear 'Foreign' flag or invalid raid header on replacement drive
    $MegaCli -CfgForeign -Clear -aALL -NoLog
    exit
fi

# Use to diagnose bad drives. When no errors are shown only the slot numbers
# will print out. If a drive(s) has an error you will see the number of errors
# under the slot number. At this point you can decide to replace the flaky
# drive. Bad drives might not fail right away and will slow down your raid with
# read/write retries or corrupt data.
if [ "$1" = "errors" ]; then
    # egrep -v ' 0' also hides the "Slot Number: 0" line, so print it by hand
    echo "Slot Number: 0"
    $MegaCli -PDlist -aALL -NoLog | egrep -i 'error|fail|slot' | egrep -v ' 0'
    exit
fi

# Status of the battery and the amount of charge. Without a working Battery
# Backup Unit (BBU) most of the LSI read/write caching will be disabled
# automatically. You want caching for speed so make sure the battery is ok.
if [ "$1" = "bat" ]; then
    $MegaCli -AdpBbuCmd -aAll -NoLog
    exit
fi

# Force a Battery Backup Unit (BBU) re-learn cycle. This will discharge the
# lithium BBU unit and recharge it. This check might take a few hours and you
# will want to always run this in off hours. LSI suggests a battery relearn
# monthly or so. We actually run it every three(3) months by way of a cron job.
# Understand if your "Current Cache Policy" is set to "No Write Cache if Bad
# BBU" then write-cache will be disabled during this check. This means writes
# to the raid will be VERY slow at about 1/10th normal speed. NOTE: if the
# battery is new (new bats should charge for a few hours before they register)
# or if the BBU comes up and says it has no charge try powering off the machine
# and restart it. This will force the LSI card to re-evaluate the BBU. Silly
# but it works.
if [ "$1" = "batrelearn" ]; then
    $MegaCli -AdpBbuCmd -BbuLearn -aALL -NoLog
    exit
fi

# Use to replace a drive. You need the slot number and may want to use the
# "drives" method to show which drive in a slot is "Unconfigured(bad)". Once
# the new drive is in the slot and spun up this method will bring the drive
# online, clear any foreign raid headers from the replacement drive and set the
# drive as a hot spare. The last command shows the rebuild progress so you can
# make sure the rebuild started; the raid should start rebuilding right away.
# NOTE: if you pass a slot number which is already part of the raid by mistake
# the LSI raid card is smart enough to just error out and _NOT_ destroy the
# raid drive, thankfully.
if [ "$1" = "replace" ]; then
    logger "$(hostname) - REPLACE enclosure $ENCLOSURE, drive $2 "
    # set Unconfigured(bad) to Unconfigured(good)
    $MegaCli -PDMakeGood -PhysDrv"[$ENCLOSURE:$2]" -a0 -NoLog
    # clear 'Foreign' flag or invalid raid header on replacement drive
    $MegaCli -CfgForeign -Clear -aALL -NoLog
    # set drive as hot spare
    $MegaCli -PDHSP -Set -PhysDrv "[$ENCLOSURE:$2]" -a0 -NoLog
    # show rebuild progress on replacement drive just to make sure it starts
    $MegaCli -PDRbld -ShowProg -PhysDrv "[$ENCLOSURE:$2]" -a0 -NoLog
    exit
fi

# Print all the logs from the LSI raid card. You can grep on the output.
if [ "$1" = "logs" ]; then
    $MegaCli -FwTermLog -Dsply -aALL -NoLog
    exit
fi

# Use to query the RAID card and find the drive which is rebuilding. The script
# will then query the rebuilding drive to see what percentage it is rebuilt and
# how much time it has taken so far. You can then guess-ti-mate the
# completion time.
if [ "$1" = "progress" ]; then
    DRIVE=$($MegaCli -PDlist -aALL -NoLog | egrep 'Slot|state' | awk '/Slot/{if (x)print x;x="";}{x=(!x)?$0:x" -"$0;}END{print x;}' | sed 's/Firmware state://g' | egrep build | awk '{print $3}')
    # Without a rebuilding drive the -PhysDrv argument would be incomplete
    if [ -z "$DRIVE" ]; then
        echo "No drive is rebuilding."
        exit
    fi
    $MegaCli -PDRbld -ShowProg -PhysDrv "[$ENCLOSURE:$DRIVE]" -a0 -NoLog
    exit
fi

# Use to check the status of the raid. If the raid is degraded or faulty the
# script will send email to the address in the $EMAIL variable. We normally add
# this method to a cron job to be run every few hours so we are notified of any
# issues.
if [ "$1" = "checkNemail" ]; then
    EMAIL="raidadmin@localhost"
    # Check if raid is in good condition
    STATUS=$($MegaCli -LDInfo -Lall -aALL -NoLog | egrep -i 'fail|degrad|error')
    # On bad raid status send email with basic drive information
    if [ "$STATUS" ]; then
        $MegaCli -PDlist -aALL -NoLog | egrep 'Slot|state' | awk '/Slot/{if (x)print x;x="";}{x=(!x)?$0:x" -"$0;}END{print x;}' | sed 's/Firmware state://g' | mail -s "$(hostname) - RAID Notification" "$EMAIL"
    fi
    exit
fi

# Use to print all information about the LSI raid card. Check default options,
# firmware version (FW Package Build), battery back-up unit presence, installed
# cache memory and the capabilities of the adapter. Pipe to grep to find the
# term you need.
if [ "$1" = "allinfo" ]; then
    $MegaCli -AdpAllInfo -aAll -NoLog
    exit
fi

# Update the LSI card's time with the current operating system time. You may
# want to setup a cron job to call this method once a day or whenever you
# think the raid card's time might drift too much.
if [ "$1" = "settime" ]; then
    $MegaCli -AdpGetTime -aALL -NoLog
    $MegaCli -AdpSetTime "$(date +%Y%m%d)" "$(date +%H:%M:%S)" -aALL -NoLog
    $MegaCli -AdpGetTime -aALL -NoLog
    exit
fi

# These are the defaults we like to use on the hundreds of raids we manage. You
# will want to go through each option here and make sure you want to use them
# too. These options are for speed optimization, build rate tweaks and PATROL
# options. When setting up a new machine we simply execute the "setdefaults"
# method and the raid is configured. You can use this on live raids too.
if [ "$1" = "setdefaults" ]; then
    # Read Cache enabled specifies that all reads are buffered in cache memory.
    $MegaCli -LDSetProp -Cached -LAll -aAll -NoLog
    # Adaptive Read-Ahead if the controller receives several requests to sequential sectors
    $MegaCli -LDSetProp ADRA -LALL -aALL -NoLog
    # Hard Disk cache policy enabled allowing the drive to use internal caching too
    $MegaCli -LDSetProp EnDskCache -LAll -aAll -NoLog
    # Write-Back cache enabled
    $MegaCli -LDSetProp WB -LALL -aALL -NoLog
    # Continue booting with data stuck in cache. Set Boot with Pinned Cache Enabled.
    $MegaCli -AdpSetProp -BootWithPinnedCache -1 -aALL -NoLog
    # PATROL run every 672 hours, i.e. 28 days (RAID6 77TB @60% rebuild takes 21 hours)
    $MegaCli -AdpPR -SetDelay 672 -aALL -NoLog
    # Check Consistency every 672 hours (28 days)
    $MegaCli -AdpCcSched -SetDelay 672 -aALL -NoLog
    # Enable autobuild when a new Unconfigured(good) drive is inserted or set to hot spare
    $MegaCli -AdpAutoRbld -Enbl -a0 -NoLog
    # RAID rebuild rate to 60% (build quick before another failure)
    $MegaCli -AdpSetProp RebuildRate -60 -aALL -NoLog
    # RAID check consistency rate to 60% (fast parity checks)
    $MegaCli -AdpSetProp CCRate -60 -aALL -NoLog
    # Enable Native Command Queue (NCQ) on all drives
    $MegaCli -AdpSetProp NCQEnbl -aAll -NoLog
    # Sound alarm disabled (server room is too loud anyways)
    $MegaCli -AdpSetProp AlarmDsbl -aALL -NoLog
    # Use write-back cache mode even if BBU is bad. Make sure your machine is on UPS too.
    $MegaCli -LDSetProp CachedBadBBU -LAll -aAll -NoLog
    # Disable auto learn BBU check which can severely affect raid speeds
    OUTBBU=$(mktemp /tmp/output.XXXXXXXXXX)
    echo "autoLearnMode=1" > "$OUTBBU"
    $MegaCli -AdpBbuCmd -SetBbuProperties -f "$OUTBBU" -a0 -NoLog
    rm -f "$OUTBBU"
    exit
fi

### EOF ###
```
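Several of the method comments in the script suggest running them from cron: "checkNemail" every few hours, "settime" about once a day, and "batrelearn" every three months during off hours. Below is a root crontab sketch of that schedule. The /usr/local/sbin/lsi.sh path and the exact times are assumptions, since the article runs the script as ./lsi.sh and only gives those rough intervals.

```
# root crontab (edit with "crontab -e" as root)
# The lsi.sh path and the times below are assumptions; adjust to your setup.
# min  hour  day  month     weekday  command
0      */4   *    *         *        /usr/local/sbin/lsi.sh checkNemail
15     1     *    *         *        /usr/local/sbin/lsi.sh settime >/dev/null 2>&1
0      3     1    1,4,7,10  *        /usr/local/sbin/lsi.sh batrelearn >/dev/null 2>&1
```

The last line runs the relearn at 03:00 on the first day of January, April, July and October. Because "setdefaults" writes autoLearnMode=1, which its comment describes as disabling the automatic BBU learn cycle, a scheduled "batrelearn" is the only relearn those cards get. The "checkNemail" entry also needs a working local mail command, since the script pipes its report to mail.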
How do I use the lsi.sh script?
First, execute the script without any arguments. The script will print out the "help" statement showing all of the available commands, each with a very short description. The script itself also contains detailed comments for every method.
For example, let's look at the status of the RAID volumes, or what LSI calls virtual drives. Run the script with the "status" argument. This simply prints the details of the raid volumes and whether PATROL read or Check Consistency is running. In our example we have two (2) RAID6 volumes of 18.188 TB each. The first array is "Partially Degraded" and the second is "Optimal", which means it is healthy.
```
calomel@lsi:~# ./lsi.sh status

Adapter 0 -- Virtual Drive Information:
Virtual Drive: 0 (Target Id: 0)
Name :
RAID Level : Primary-6, Secondary-0, RAID Level Qualifier-3
Size : 18.188 TB
Sector Size : 512
Parity Size : 3.637 TB
State : Partially Degraded
Strip Size : 256 KB
Number Of Drives : 12
Span Depth : 1
Default Cache Policy: WriteBack, ReadAdaptive, Cached, Write Cache OK if Bad BBU
Current Cache Policy: WriteBack, ReadAdaptive, Cached, Write Cache OK if Bad BBU
Default Access Policy: Read/Write
Current Access Policy: Read/Write
Disk Cache Policy : Enabled
Encryption Type : None
PI type: No PI
Is VD Cached: No

Virtual Drive: 1 (Target Id: 1)
Name :
RAID Level : Primary-6, Secondary-0, RAID Level Qualifier-3
Size : 18.188 TB
Sector Size : 512
Parity Size : 3.637 TB
State : Optimal
Strip Size : 256 KB
Number Of Drives : 12
Span Depth : 1
Default Cache Policy: WriteBack, ReadAdaptive, Cached, Write Cache OK if Bad BBU
Current Cache Policy: WriteBack, ReadAdaptive, Cached, Write Cache OK if Bad BBU
Default Access Policy: Read/Write
Current Access Policy: Read/Write
Disk Cache Policy : Enabled
Encryption Type : None
PI type: No PI
Is VD Cached: No
###############################################

Adapter 0: Patrol Read Information:
Patrol Read Mode: Auto
Patrol Read Execution Delay: 672 hours
Number of iterations completed: 2
Current State: Stopped
Patrol Read on SSD Devices: Disabled

Exit Code: 0x00
###############################################

Check Consistency on VD #0 is not in progress.
Check Consistency on VD #1 is not in progress.

Exit Code: 0x00
```
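The "checkNemail" method decides whether to send mail by grepping the LDInfo output for fail, degrad or error. A slightly stricter variant is to pull out each virtual drive's State line and treat anything other than "Optimal" as a problem. The sketch below does that on piped `-LDInfo -Lall -aALL` output, assuming the "Virtual Drive:" and "State :" line formats shown above; the function names vd_states and all_optimal are hypothetical.

```shell
#!/bin/sh
# Sketch: summarise "MegaCli64 -LDInfo -Lall -aALL" output (read on stdin) as
# one line per virtual drive, e.g. "VD 0: Partially Degraded".
vd_states() {
    awk -F': *' '
        # "Virtual Drive: 0 (Target Id: 0)" -> remember "0"
        /^Virtual Drive:/ { split($2, parts, " "); vd = parts[1] }
        # "State : Optimal" -> print it, trimming trailing blanks or CR
        /^State/ { state = $2; sub(/[ \t\r]+$/, "", state); print "VD " vd ": " state }
    '
}

# Return non-zero if any virtual drive is not "Optimal".
all_optimal() {
    vd_states | awk '!/: Optimal$/ { bad = 1 } END { exit bad }'
}
```

For the output above, `$MegaCli -LDInfo -Lall -aALL -NoLog | vd_states` would print "VD 0: Partially Degraded" and "VD 1: Optimal", and `all_optimal` would return 1, which makes it easy to hook into cron or a monitoring system.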
Why is the first volume degraded?
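The first virtual disk lost a drive, which has already been replaced and is now rebuilding. On RAID6, LSI reports "Partially Degraded" when one drive is missing, because the volume can still survive one more failure. We can look at the status of all the drives using the lsi.sh script with the "drives" argument. You can see that slot number 9 holds the drive which is rebuilding.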
```
calomel@lsi:~# ./lsi.sh drives

Slot Number: 0 - Online, Spun Up
Slot Number: 1 - Online, Spun Up
Slot Number: 2 - Online, Spun Up
Slot Number: 3 - Online, Spun Up
Slot Number: 4 - Online, Spun Up
Slot Number: 5 - Online, Spun Up
Slot Number: 6 - Online, Spun Up
Slot Number: 7 - Online, Spun Up
Slot Number: 8 - Online, Spun Up
Slot Number: 9 - Rebuild
Slot Number: 10 - Online, Spun Up
Slot Number: 11 - Online, Spun Up
Slot Number: 12 - Online, Spun Up
Slot Number: 13 - Online, Spun Up
Slot Number: 14 - Online, Spun Up
Slot Number: 15 - Online, Spun Up
Slot Number: 16 - Online, Spun Up
Slot Number: 17 - Online, Spun Up
Slot Number: 18 - Online, Spun Up
Slot Number: 19 - Online, Spun Up
Slot Number: 20 - Online, Spun Up
Slot Number: 21 - Online, Spun Up
Slot Number: 22 - Online, Spun Up
Slot Number: 23 - Online, Spun Up
```
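The one-line-per-slot listing comes from the awk one-liner in the "drives" method: it buffers each "Slot Number" line, appends the "Firmware state" line that follows it, and sed then strips the "Firmware state:" label. The sketch below repackages that pipeline as a function that reads `-PDlist` output on stdin, plus a hypothetical non_online_slots helper that prints every slot not reported as Online, a generalization of the `egrep build | awk '{print $3}'` step used by the "progress" method. Both assume the "Slot Number:" and "Firmware state:" line formats shown above.

```shell
#!/bin/sh
# Same pipeline as the "drives" method, reading "MegaCli64 -PDlist -aALL"
# output on stdin: join each "Slot Number" line with the state line after it.
join_slots() {
    grep -E 'Slot|state' |
        awk '/Slot/ { if (x) print x; x = "" }
             { x = (!x) ? $0 : x " -" $0 }
             END { print x }' |
        sed 's/Firmware state://g'
}

# Hypothetical helper: print the slot number (field 3 of
# "Slot Number: 9 - Rebuild") of every drive that is not "Online".
non_online_slots() {
    join_slots | awk '!/- Online/ { print $3 }'
}
```

On the card above, `$MegaCli -PDlist -aALL -NoLog | non_online_slots` should print only 9, the rebuilding drive; unlike the "progress" method it would also catch drives in states such as "Unconfigured(bad)".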
To check how far along the rebuild is, run the script with the "progress" argument:

```
calomel@lsi:~# ./lsi.sh progress

Rebuild Progress on Device at Enclosure 8, Slot 9 Completed 32% in 169 Minutes.
```
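The "progress" method leaves it to you to guess-ti-mate the completion time. Assuming the rebuild keeps the same average rate, the remaining time is elapsed × (100 - percent) / percent; for the output above that is 169 × 68 / 32 ≈ 359 minutes, or roughly six more hours. A small sketch that does this arithmetic from the progress line, assuming the exact wording shown above (rebuild_eta is a hypothetical name):

```shell
#!/bin/sh
# Sketch: turn the "-PDRbld -ShowProg" line into a rough time-remaining
# estimate, assuming the rebuild continues at its average rate so far.
# Expected input, e.g.:
#   Rebuild Progress on Device at Enclosure 8, Slot 9 Completed 32% in 169 Minutes.
rebuild_eta() {
    sed -n 's/.*Completed \([0-9][0-9]*\)% in \([0-9][0-9]*\) Minutes.*/\1 \2/p' |
        while read -r pct mins; do
            if [ "$pct" -gt 0 ]; then
                # minutes left = elapsed * (100 - pct) / pct (integer arithmetic)
                echo "about $(( mins * (100 - pct) / pct )) minutes remaining"
            fi
        done
}
```

Lines that do not match the expected format produce no output. Example: `./lsi.sh progress | rebuild_eta` prints "about 359 minutes remaining" for the run above. The estimate is only as good as the assumption of a constant rate; rebuilds slow down under heavy I/O.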