What Does the Program Do?
This script interfaces with IBM Spectrum Scale (now IBM Storage Scale) through its mmvdisk command-line tool to:
- Disk Health Monitoring: Identifies disks in a “not OK” state, detects disks marked for replacement, and reports details for each disk, including its recovery group, state, and location.
- Replacement Preparation: Prepares disks for replacement before they are physically removed, reducing the risk of data loss.
- Automated Operations: Automates disk replacement, with dry-run and preparation modes.
- Detailed Logging: Records every operation in a log file.
- Email Notifications: Alerts system administrators about problematic disks.
- Formatted Output: Presents disk status information in clean tables.
- Performance Monitoring: Reports how long each run takes.
Technical Implementation
Dependencies and Configuration
import pandas as pd # Data manipulation and analysis
import subprocess # Execute shell commands
import json # Data serialization
import time # Timing operations
from datetime import datetime # Date and time handling
import logging # Logging operations
from logging.handlers import SysLogHandler # System logging
from docopt import docopt # Command-line argument parsing
from prettytable import PrettyTable # Formatted table output
import smtplib # Email functionality
from email.mime.text import MIMEText
from email.mime.multipart import MIMEMultipart
logging.basicConfig(level=logging.INFO,
                    filename='logs.log',
                    filemode='a',
                    format='%(message)s %(asctime)s',
                    datefmt="%Y-%m-%d %H:%M:%S")  # portable spelling of the non-standard %T
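To check what this log format produces, the sketch below attaches the same format string to a dedicated logger that writes into an in-memory buffer instead of logs.log. The logger name `pdisk` and the helper `make_logger` are illustrative assumptions, not part of the original script.

```python
import io
import logging


def make_logger(stream, name="pdisk"):
    """Return a logger that writes '<message> <timestamp>' lines to `stream`.

    `name` and this helper are hypothetical; the original script configures
    the root logger with logging.basicConfig instead.
    """
    logger = logging.getLogger(name)
    logger.setLevel(logging.INFO)
    logger.propagate = False  # keep messages away from the root logger's handlers
    handler = logging.StreamHandler(stream)
    # Same layout as the script: message first, then the timestamp
    handler.setFormatter(
        logging.Formatter("%(message)s %(asctime)s", datefmt="%Y-%m-%d %H:%M:%S")
    )
    logger.handlers = [handler]  # replace, so repeated calls do not duplicate output
    return logger


buffer = io.StringIO()
log = make_logger(buffer)
log.info("Command: mmvdisk pdisk list --rg all --not-ok ---> Output: Disk not ok.")
print(buffer.getvalue(), end="")
```

Swapping the StringIO stream for `logging.FileHandler('logs.log')` gives the file-based behavior of the original configuration.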
CLI Interface
The script uses the docopt library to provide a clean, well-documented command-line interface:
Usage:
  pdisk.py --replace [--short]
  pdisk.py --prepare [--short]
  pdisk.py --dryrun [--short]
  pdisk.py --email -e <email>
  pdisk.py --version
  pdisk.py -h | --help
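docopt builds its parser directly from the usage text. For readers who prefer the standard library, here is a rough argparse equivalent. It is an approximation, not the author's code: for example, it accepts --short together with --email, which the docopt patterns do not, and the help text for --short and the version string are placeholders.

```python
import argparse


def build_parser():
    """Approximate the docopt usage patterns with argparse (illustrative only)."""
    parser = argparse.ArgumentParser(prog="pdisk.py")
    # Exactly one operating mode must be chosen, as in the docopt patterns
    mode = parser.add_mutually_exclusive_group(required=True)
    mode.add_argument("--replace", action="store_true", help="replace disks marked for replacement")
    mode.add_argument("--prepare", action="store_true", help="prepare disks for replacement")
    mode.add_argument("--dryrun", action="store_true", help="show commands without running them")
    mode.add_argument("--email", action="store_true", help="email a disk status report")
    mode.add_argument("--version", action="version", version="%(prog)s (version placeholder)")
    parser.add_argument("--short", action="store_true", help="shorter output (placeholder description)")
    parser.add_argument("-e", dest="address", help="recipient address used with --email")
    return parser


def parse(argv=None):
    """Parse argv (defaults to sys.argv[1:]) and enforce that --email has -e."""
    parser = build_parser()
    args = parser.parse_args(argv)
    if args.email and not args.address:
        parser.error("--email requires -e <email>")  # exits with status 2
    return args
```

Calling `parse()` with no arguments reads the real command line, much like `docopt(__doc__)` does.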
Disk Status Collection
The script calls subprocess.Popen to run two mmvdisk commands that report disk status:
mmvdisk pdisk list --rg all --not-ok
mmvdisk pdisk list --rg all --replace
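The article does not show how the Popen output is collected. One way to run these commands and keep their output for later parsing is `subprocess.run` with `check=True`, sketched below; the helper `run_command` and its `outfile` parameter are hypothetical.

```python
import subprocess


def run_command(cmd, outfile=None):
    """Run `cmd` (a list of arguments) and return its stdout as text.

    If `outfile` is given, the output is also saved there so it can be
    parsed later (get_failed_pdisk() reads from a file).
    Raises subprocess.CalledProcessError if the command exits non-zero.
    """
    result = subprocess.run(cmd, capture_output=True, text=True, check=True)
    if outfile is not None:
        with open(outfile, "w") as fh:
            fh.write(result.stdout)
    return result.stdout


# The two status queries from the article, as argument lists
NOT_OK_CMD = ["mmvdisk", "pdisk", "list", "--rg", "all", "--not-ok"]
REPLACE_CMD = ["mmvdisk", "pdisk", "list", "--rg", "all", "--replace"]
```

For example, `run_command(NOT_OK_CMD, outfile="not_ok.txt")` would save the listing for `get_failed_pdisk()`; the file name is arbitrary. If mmvdisk exits non-zero in situations you consider normal, drop `check=True` and inspect `result.returncode` instead.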
Data Processing
The command output is parsed with pandas, which turns the whitespace-aligned table into a DataFrame:
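def get_failed_pdisk(filename, command):
    # Parse a saved command output file. Columns are separated by two or more
    # spaces, so multi-word headers such as "recovery group" stay intact.
    df = pd.read_csv(filename, sep=r'\s{2,}', engine='python')  # raw string avoids an invalid-escape warning
    return df[["recovery group", "pdisk"]]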
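To see why the two-or-more-spaces separator matters, the sketch below feeds a small listing to the same read_csv call through io.StringIO. The sample rows are invented for illustration, and real mmvdisk output may contain extra header or separator lines that would need to be skipped first.

```python
import io

import pandas as pd

# Made-up sample in the shape get_failed_pdisk() expects: columns separated by
# two or more spaces, so "recovery group" (one space) stays a single header.
SAMPLE = """\
recovery group  pdisk   state
rg0             pdisk1  failed
rg1             pdisk2  missing
"""


def parse_pdisk_table(text):
    """Parse whitespace-aligned text and keep the recovery group and pdisk columns."""
    df = pd.read_csv(io.StringIO(text), sep=r"\s{2,}", engine="python")
    return df[["recovery group", "pdisk"]]


failed = parse_pdisk_table(SAMPLE)
print(failed)
```

A single-space separator would instead split "recovery group" into two columns and misalign every row.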
Disk Replacement Operations
def replace_pdisk(args, pdisk, group, need_replace):
    # Prepare the disk for removal (--prepare) or run the actual replacement
    if args['--prepare']:
        output_ = subprocess.Popen(['mmvdisk', 'pdisk', 'replace', '--prepare', '--rg', group, '--pdisk', pdisk],
                                   stdout=subprocess.PIPE)
        # ...process output
    else:
        output_ = subprocess.Popen(['mmvdisk', 'pdisk', 'replace', '--recovery-group', group, '--pdisk', pdisk],
                                   stdout=subprocess.PIPE)
        # ...process output
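The dry-run output later in the article lists the prepare commands before anything is executed. A minimal sketch of that flow, assuming the failing disks are available as (recovery group, pdisk) pairs; `build_replace_commands` and `execute` are hypothetical helpers, and the flag spellings mirror replace_pdisk() above.

```python
import subprocess


def build_replace_commands(pdisks, prepare=True):
    """Build one mmvdisk command (as an argument list) per (group, pdisk) pair."""
    commands = []
    for group, pdisk in pdisks:
        if prepare:
            cmd = ["mmvdisk", "pdisk", "replace", "--prepare", "--rg", group, "--pdisk", pdisk]
        else:
            cmd = ["mmvdisk", "pdisk", "replace", "--recovery-group", group, "--pdisk", pdisk]
        commands.append(cmd)
    return commands


def execute(commands, dry_run=True):
    """Print the commands in dry-run mode; otherwise run them and collect output."""
    if dry_run:
        print("[DRY-RUN] Would run:")
        for cmd in commands:
            print("  " + " ".join(cmd))
        return []
    outputs = []
    for cmd in commands:
        result = subprocess.run(cmd, capture_output=True, text=True, check=True)
        outputs.append((" ".join(cmd), result.stdout.strip()))
    return outputs
```

The pairs can be taken from the DataFrame returned by get_failed_pdisk() with `list(df.itertuples(index=False, name=None))`. Defaulting `dry_run` to True means a forgotten flag prints commands instead of touching hardware.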
Notification System
def send_email(sender_email, sender_password, receiver_email, subject, message):
    # Create and send email notifications
    msg = MIMEMultipart()
    # ... email configuration
    with smtplib.SMTP("smtp.gmail.com", 587) as smtp:
        smtp.starttls()  # upgrade the connection to TLS before logging in
        smtp.login(sender_email, sender_password)
        smtp.send_message(msg)
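The message setup is elided above. The sketch below shows one conventional way to fill in the headers and body with the standard email package; the helper `build_message` and all addresses are placeholders, and it stops short of contacting an SMTP server.

```python
from email.mime.multipart import MIMEMultipart
from email.mime.text import MIMEText


def build_message(sender_email, receiver_email, subject, body):
    """Assemble a multipart message like the one send_email() hands to SMTP."""
    msg = MIMEMultipart()
    msg["From"] = sender_email
    msg["To"] = receiver_email
    msg["Subject"] = subject
    msg.attach(MIMEText(body, "plain"))  # plain-text body part
    return msg


msg = build_message(
    "alerts@example.com",
    "admin@example.com",
    "pdisk alert: disks need replacement",
    "pdisk1 (rg0) failed\npdisk2 (rg1) missing",
)
print(msg.as_string())
```

The returned object can be passed to `smtp.send_message()` just as in send_email().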
Visualization and Output
def display_state(dataframe, t_info):
    # Create and display tabular data
    table = PrettyTable()
    table.field_names = ["Name", "RecoveryGroup", "state", "location", "hardware", "User location", "Server"]
    # ... populate and display table
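With PrettyTable, each disk would be added with `table.add_row([...])` before the table is printed. The plain column layout seen in the examples below can also be approximated without a third-party dependency; the sketch pads each column to its widest cell. `format_table` is a hypothetical helper and assumes every row has one value per column.

```python
COLUMNS = ["Name", "RecoveryGroup", "state", "location", "hardware", "User location", "Server"]


def format_table(rows, columns=COLUMNS):
    """Return `rows` (sequences in column order) as left-aligned text columns."""
    table = [[str(c) for c in columns]] + [[str(cell) for cell in row] for row in rows]
    # Each column is as wide as its longest cell, header included
    widths = [max(len(r[i]) for r in table) for i in range(len(columns))]
    lines = ["  ".join(cell.ljust(w) for cell, w in zip(r, widths)).rstrip() for r in table]
    return "\n".join(lines)


print(format_table([["pdisk1", "rg0", "failed", "bay3", "SAS", "Rack-A01", "node01"]]))
```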
Usage Examples
Checking Disks That Need Replacement
To identify disks that need replacement without making any changes:
python pdisk.py --dryrun
Simulated output:
$ python pdisk.py --dryrun
+------------------+------------------------------------------------+
| Command: | mmvdisk pdisk list --rg all --not-ok |
+------------------+------------------------------------------------+
Disk not ok
Name RecoveryGroup state location hardware User location Server
pdisk1 rg0 failed bay3 SAS Rack-A01 node01
pdisk2 rg1 missing bay7 NVMe Rack-B04 node02
+------------------+------------------------------------------------+
| Command: | mmvdisk pdisk list --rg all --replace |
+------------------+------------------------------------------------+
List of replace disks
Name RecoveryGroup state location hardware User location Server
pdisk1 rg0 failed bay3 SAS Rack-A01 node01
pdisk2 rg1 missing bay7 NVMe Rack-B04 node02
DISKS NEEDS REPLACEMENT!
[{'name': 'pdisk1', 'recoveryGroup': 'rg0', 'state': 'failed', 'location': 'bay3', ...},
{'name': 'pdisk2', 'recoveryGroup': 'rg1', 'state': 'missing', 'location': 'bay7', ...}]
List of pdisk needs to be replaced:
Command: ['mmvdisk pdisk replace --prepare --rg rg0 --pdisk pdisk1', 'mmvdisk pdisk replace --prepare --rg rg1 --pdisk pdisk2']
[DRY-RUN] Would run:
mmvdisk pdisk replace --prepare --rg rg0 --pdisk pdisk1
mmvdisk pdisk replace --prepare --rg rg1 --pdisk pdisk2
The program took 0:00:45 to run.
Date and time program was initiated 2025-04-11,09:27 UTC
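Each run ends by reporting its duration and start time. The script's own timing code is not shown; the sketch below is one way to produce lines in the same format with the standard datetime module. The helpers `format_elapsed` and `format_start` are hypothetical, and truncating to whole seconds is an assumption based on the sample output.

```python
from datetime import datetime, timedelta, timezone


def format_elapsed(elapsed):
    """Render a timedelta as H:MM:SS, dropping fractional seconds (e.g. 0:00:45)."""
    return str(timedelta(seconds=int(elapsed.total_seconds())))


def format_start(started):
    """Render the start time as YYYY-MM-DD,HH:MM UTC, as in the report."""
    return started.astimezone(timezone.utc).strftime("%Y-%m-%d,%H:%M UTC")


start = datetime.now(timezone.utc)
# ... disk checks and mmvdisk commands would run here ...
elapsed = datetime.now(timezone.utc) - start
print(f"The program took {format_elapsed(elapsed)} to run.")
print(f"Date and time program was initiated {format_start(start)}")
```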
Preparing Disks for Replacement
python pdisk.py --prepare
Simulated output:
$ python pdisk.py --prepare
+------------------+------------------------------------------------+
| Command: | mmvdisk pdisk list --rg all --not-ok |
+------------------+------------------------------------------------+
Disk not ok
Name RecoveryGroup state location hardware User location Server
pdisk1 rg0 failed bay3 SAS Rack-A01 node01
pdisk2 rg1 missing bay7 NVMe Rack-B04 node02
+------------------+------------------------------------------------+
| Command: | mmvdisk pdisk list --rg all --replace |
+------------------+------------------------------------------------+
List of replace disks
Name RecoveryGroup state location hardware User location Server
pdisk1 rg0 failed bay3 SAS Rack-A01 node01
pdisk2 rg1 missing bay7 NVMe Rack-B04 node02
DISKS NEEDS PREPARATION BEFORE REPLACEMENT!
[{'name': 'pdisk1', 'recoveryGroup': 'rg0', 'state': 'failed', 'location': 'bay3', ...},
{'name': 'pdisk2', 'recoveryGroup': 'rg1', 'state': 'missing', 'location': 'bay7', ...}]
Preparing disks for replacement:
Command: mmvdisk pdisk replace --prepare --rg rg0 --pdisk pdisk1
Command: mmvdisk pdisk replace --prepare --rg rg1 --pdisk pdisk2
Successfully prepared pdisk for replace!
Command: mmvdisk pdisk replace --prepare --rg rg0 --pdisk pdisk1 --> OUTPUT: Reinsert carrier.
Successfully prepared pdisk for replace!
Command: mmvdisk pdisk replace --prepare --rg rg1 --pdisk pdisk2 --> OUTPUT: Reinsert carrier.
Name RecoveryGroup state location hardware User location Server
pdisk1 rg0 failed bay3 SAS Rack-A01 node01
pdisk2 rg1 missing bay7 NVMe Rack-B04 node02
The program took 0:01:05 to run.
Date and time program was initiated 2025-04-11,09:34 UTC
Executing Disk Replacement
python pdisk.py --replace
Simulated output:
$ python pdisk.py --replace
+------------------+------------------------------------------------+
| Command: | mmvdisk pdisk list --rg all --not-ok |
+------------------+------------------------------------------------+
Disk not ok
Name RecoveryGroup state location hardware User location Server
pdisk1 rg0 failed bay3 SAS Rack-A01 node01
pdisk2 rg1 missing bay7 NVMe Rack-B04 node02
+------------------+------------------------------------------------+
| Command: | mmvdisk pdisk list --rg all --replace |
+------------------+------------------------------------------------+
List of replace disks
Name RecoveryGroup state location hardware User location Server
pdisk1 rg0 failed bay3 SAS Rack-A01 node01
pdisk2 rg1 missing bay7 NVMe Rack-B04 node02
DISKS NEEDS REPLACEMENT!
[{'name': 'pdisk1', 'recoveryGroup': 'rg0', 'state': 'failed', 'location': 'bay3', ...},
{'name': 'pdisk2', 'recoveryGroup': 'rg1', 'state': 'missing', 'location': 'bay7', ...}]
List of pdisk needs to be replaced:
Command: ['mmvdisk pdisk replace --prepare --rg rg0 --pdisk pdisk1', 'mmvdisk pdisk replace --prepare --rg rg1 --pdisk pdisk2']
Successfully prepared pdisk for replace!
Command: mmvdisk pdisk replace --prepare --rg rg0 --pdisk pdisk1 --> OUTPUT: Reinsert carrier.
Successfully prepared pdisk for replace!
Command: mmvdisk pdisk replace --prepare --rg rg1 --pdisk pdisk2 --> OUTPUT: Reinsert carrier.
Name RecoveryGroup state location hardware User location Server
pdisk1 rg0 failed bay3 SAS Rack-A01 node01
pdisk2 rg1 missing bay7 NVMe Rack-B04 node02
The program took 0:01:02 to run.
Date and time program was initiated 2025-04-11,09:24 UTC
Sending Notifications
python pdisk.py --email -e example@example.com
Simulated output:
$ python pdisk.py --email -e example@example.com
... (disk check tables as above)
Sending email to: example@example.com
Email sent to Trial1 (example@example.com)
The program took 0:00:48 to run.
Date and time program was initiated 2025-04-11,09:29 UTC
Sample Log Output
Command: mmvdisk pdisk list --rg all --not-ok ---> Output: Disk not ok.
2025-04-11 09:32:15
Command: mmvdisk pdisk list --rg all --replace ---> Output: Disk list with replacement suggestions.
2025-04-11 09:32:17
List of pdisk needs to be replaced:
Command: mmvdisk pdisk list --rg all --replace
recovery group pdisk
0 rg0 pdisk1
1 rg1 pdisk2
2025-04-11 09:32:18
Successfully prepared pdisk for replace!
Command: mmvdisk pdisk replace --prepare --rg rg0 --pdisk pdisk1 --> OUTPUT: Reinsert carrier.
2025-04-11 09:32:23
Successfully prepared pdisk for replace!
Command: mmvdisk pdisk replace --prepare --rg rg1 --pdisk pdisk2 --> OUTPUT: Reinsert carrier.
2025-04-11 09:32:29
The program took 0:01:05 to run.
Date and time program was initiated 2025-04-11,09:31 UTC
2025-04-11 09:32:35
Security Considerations
Conclusion
This disk management tool demonstrates how Python can be used to automate complex system administration tasks. By combining system commands with data analysis and reporting capabilities, we’ve created a powerful utility that simplifies maintenance operations and improves reliability.
Whether you’re managing a small cluster or a large-scale storage infrastructure, automated tools like this can significantly reduce the operational burden and minimize the risk of human error during critical maintenance procedures.
A script like this can save hours of manual work and reduce the risk of overlooking critical disk failures in production. Feel free to adapt and expand it to fit your infrastructure needs.
You can find the full code on my GitHub or reach out if you’d like help adapting it for your environment.