grepCCLog.pl
Analyze ClusterCockpit Log Files for Running Jobs
The grepCCLog.pl script analyzes ClusterCockpit log files to identify jobs that were started on a specific day but have not yet been archived. This is useful for troubleshooting and for monitoring the job lifecycle.
Purpose
Parses ClusterCockpit log files to:
- Identify jobs that started on a specific day
- Detect jobs that have not been archived
- Generate statistics per user
- Report jobs that may be stuck or still running
Usage
./grepCCLog.pl <logfile> <day>
Arguments
<logfile>
Function: Path to ClusterCockpit log file
Example: /var/log/clustercockpit/cc-backend.log
<day>
Function: Day of month to analyze (numeric)
Example: 15 (for October 15th)
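As a minimal sketch of how the two positional arguments could be validated, assuming nothing about how grepCCLog.pl itself handles them: the helper name parse_args and its error messages are hypothetical and do not come from the script.

```perl
#!/usr/bin/env perl
use strict;
use warnings;

# Hypothetical helper (not taken from grepCCLog.pl): check that exactly two
# arguments were given and that <day> looks like a day of month.
sub parse_args {
    my @argv = @_;
    die "Usage: $0 <logfile> <day>\n" unless @argv == 2;

    my ( $logfile, $day ) = @argv;
    die "<day> must be a day of month between 1 and 31\n"
        unless $day =~ /\A[0-9]{1,2}\z/ && $day >= 1 && $day <= 31;

    # The day is returned unchanged, as a string.
    return ( $logfile, $day );
}

# Demo with the example values from the Arguments section.
my ( $logfile, $day ) =
    parse_args( '/var/log/clustercockpit/cc-backend.log', '15' );
print "Analyzing $logfile for day $day\n";
```

In a real script the call would be parse_args(@ARGV); the fixed values above only keep the sketch self-contained.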
Output
The script produces:
- List of Non-Archived Jobs: Details for each job that started but hasn’t been archived
- Per-User Summary: Count of non-archived jobs per user
- Total Statistics: Overall count of started vs. non-archived jobs
Example Output
======
jobID: 12345 User: alice
======
======
jobID: 12346 User: bob
======
alice => 1
bob => 1
Not stopped: 2 of 10
Log Format Requirements
The script expects log entries in the following format:
Job Start Entry
Oct 15 ... new job (id: 123): cluster=woody, jobId=12345, user=alice, ...
Job Archive Entry
Oct 15 ... archiving job... (dbid: 123): cluster=woody, jobId=12345, user=alice, ...
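The two entry types above could be recognized with patterns along the following lines. This is a sketch, not the script's actual regular expressions: the parts elided as ... in the samples are unknown, so the patterns skip them with a non-greedy wildcard, and the names $START_RE, $ARCHIVE_RE and parse_line are hypothetical.

```perl
use strict;
use warnings;

# Assumed patterns: '.*?' stands in for the parts elided as '...' in the
# sample entries. The month 'Oct' is hard-coded, as in the original script.
my $FIELDS = qr/cluster=([^,\s]+), jobId=([0-9]+), user=([^,\s]+)/;
my $START_RE =
    qr/^Oct\s+([0-9]+)\s.*?new job \(id: ([0-9]+)\): $FIELDS/;
my $ARCHIVE_RE =
    qr/^Oct\s+([0-9]+)\s.*?archiving job.*?\(dbid: ([0-9]+)\): $FIELDS/;

# Returns a hash reference describing the entry, or nothing if the line is
# neither a start nor an archive entry.
sub parse_line {
    my ($line) = @_;
    for my $spec ( [ start => $START_RE ], [ archive => $ARCHIVE_RE ] ) {
        my ( $type, $re ) = @$spec;
        if ( my @f = $line =~ $re ) {
            return {
                type    => $type,
                day     => $f[0],
                dbid    => $f[1],
                cluster => $f[2],
                jobid   => $f[3],
                user    => $f[4],
            };
        }
    }
    return;
}

# Demo on the two sample entries shown above.
for my $line (
    'Oct 15 ... new job (id: 123): cluster=woody, jobId=12345, user=alice, ...',
    'Oct 15 ... archiving job... (dbid: 123): cluster=woody, jobId=12345, user=alice, ...',
    )
{
    my $e = parse_line($line) or next;
    print "$e->{type}: dbid=$e->{dbid} jobId=$e->{jobid} user=$e->{user}\n";
}
```

Both entry types carry the same cluster, jobId and user fields, so the sketch shares one sub-pattern for them and differs only in the message prefix and the name of the ID field.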
Limitations
- Hard-coded for cluster name woody
- Hard-coded for month Oct
- Requires specific log message format
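- Day must match exactly: the <day> argument is compared as a string (eq) with the day found in the log, so 5 and 05 are treated as different values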
Customization
To adapt for your environment, modify the script:
# Line 19: Change cluster name
if ( $cluster eq 'your-cluster-name' && $day eq $Tday ) {
# Line 35: Change cluster name for archive matching
if ( $cluster eq 'your-cluster-name' ) {
# Lines 12 & 28: Update month pattern
if ( /Oct ([0-9]+) .../ ) {
# Change 'Oct' to your desired month
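Instead of editing the literals in several places, the month, cluster and day could be passed in as parameters. The following is only a sketch of that idea under assumed names (make_filter and its option keys are hypothetical), and the log line in the demo is made up for illustration.

```perl
use strict;
use warnings;

# Hypothetical helper: build a filter from month, cluster and day instead of
# hard-coding 'Oct' and the cluster name. \Q...\E quotes the month so that it
# is matched literally.
sub make_filter {
    my (%opt) = @_;
    my $re = qr/^\Q$opt{month}\E\s+([0-9]+)\s.*?cluster=([^,\s]+)/;

    return sub {
        my ($line) = @_;
        my ( $day, $cluster ) = $line =~ $re or return 0;

        # Same string comparison (eq) as in the snippet above.
        return $cluster eq $opt{cluster} && $day eq $opt{day} ? 1 : 0;
    };
}

# Demo with an invented November entry for a placeholder cluster name.
my $is_wanted = make_filter(
    month   => 'Nov',
    cluster => 'your-cluster-name',
    day     => '3',
);
my $line =
    'Nov 3 ... new job (id: 7): cluster=your-cluster-name, jobId=42, user=carol, ...';
print $is_wanted->($line) ? "match\n" : "skip\n";
```

Keeping the three values in one place means a change of month or cluster needs a single edit rather than one per pattern.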
Use Cases
- Debugging: Identify jobs that failed to archive properly
- Monitoring: Track running jobs for a specific day
- Troubleshooting: Find stuck jobs in the system
- Auditing: Verify job lifecycle completion
Example Workflow
# Analyze today's jobs (e.g., October 15)
./grepCCLog.pl /var/log/cc-backend.log 15
# Find jobs started on the 20th
./grepCCLog.pl /var/log/cc-backend.log 20
# Check specific log file
./grepCCLog.pl /path/to/old-logs/cc-backend-2024-10.log 15
Technical Details
The script:
- Opens the specified log file
- Parses log entries with regex patterns
- Tracks started jobs in a hash table
- Tracks archived jobs in a separate hash table
- Compares the two tables to find started jobs without an archive entry
- Aggregates statistics per user
- Outputs the results
Jobs are matched by database ID between start and archive entries: the id: field of a start entry is compared with the dbid: field of an archive entry.
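The steps above can be sketched as follows. This is an illustration under assumptions, not the source of grepCCLog.pl: the function names, the regular expressions (which skip the elided parts of the log format with a wildcard) and the in-memory sample lines are all hypothetical, and the report layout simply imitates the Example Output section.

```perl
use strict;
use warnings;

# Assumed field layout, based on the sample entries in this document.
my $FIELDS = qr/cluster=([^,\s]+), jobId=([0-9]+), user=([^,\s]+)/;

# Collect started and archived jobs in two hashes keyed by database ID, then
# return the started jobs that have no archive entry.
sub find_unarchived {
    my ( $lines, $target_day, $cluster ) = @_;
    my ( %started, %archived );

    for my $line (@$lines) {
        if ( $line =~ /^Oct\s+([0-9]+)\s.*?new job \(id: ([0-9]+)\): $FIELDS/ ) {
            $started{$2} = { jobid => $4, user => $5 }
                if $3 eq $cluster && $1 eq $target_day;
        }
        elsif ( $line =~ /^Oct\s+[0-9]+\s.*?archiving job.*?\(dbid: ([0-9]+)\): $FIELDS/ ) {
            $archived{$1} = 1 if $2 eq $cluster;
        }
    }

    my ( @open, %per_user );
    for my $dbid ( sort { $a <=> $b } keys %started ) {
        next if $archived{$dbid};
        push @open, $started{$dbid};
        $per_user{ $started{$dbid}{user} }++;
    }
    return ( \@open, \%per_user, scalar keys %started );
}

# Print the result in a layout similar to the Example Output section.
sub print_report {
    my ( $open, $per_user, $total ) = @_;
    for my $job (@$open) {
        print "======\n";
        print "jobID: $job->{jobid} User: $job->{user}\n";
        print "======\n";
    }
    print "$_ => $per_user->{$_}\n" for sort keys %$per_user;
    printf "Not stopped: %d of %d\n", scalar @$open, $total;
}

# Demo on in-memory sample lines; a real run would read them from <logfile>.
my @log = (
    'Oct 15 ... new job (id: 123): cluster=woody, jobId=12345, user=alice, ...',
    'Oct 15 ... new job (id: 124): cluster=woody, jobId=12346, user=bob, ...',
    'Oct 15 ... new job (id: 125): cluster=woody, jobId=12347, user=alice, ...',
    'Oct 15 ... archiving job... (dbid: 125): cluster=woody, jobId=12347, user=alice, ...',
);
print_report( find_unarchived( \@log, '15', 'woody' ) );
```

In the demo, three jobs start on day 15 and one of them is archived, so the report lists jobs 12345 and 12346 and ends with a total of 2 of 3. Archive entries are not filtered by day in this sketch, so a job started on the target day and archived on a later day still counts as archived.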