Tools
Command-line tools for ClusterCockpit maintenance and administration
This section documents the command-line tools included with ClusterCockpit for various maintenance, migration, and administrative tasks.
Archive Management
- archive-manager: Validate, clean, and import job archives across backends
- archive-migration: Migrate job archives to the current schema version
Security & Authentication
- gen-keypair: Generate Ed25519 keypairs for JWT signing and validation
- convert-pem-pubkey: Convert external Ed25519 PEM keys to ClusterCockpit format
Diagnostics
- grepCCLog.pl: Analyze log files to identify non-archived jobs
All Go-based tools follow the same build pattern:
cd tools/<tool-name>
go build
Common Features
Most tools support:
- Configurable logging levels (-loglevel)
- Timestamped log output (-logdate)
- Configuration file specification (-config)
1 - archive-manager
Job Archive Management Tool
The archive-manager tool provides comprehensive management and maintenance capabilities for ClusterCockpit job archives. It supports validation, cleaning, importing between different archive backends, and general archive operations.
Build
cd tools/archive-manager
go build
Command-Line Options
-s <path>
Function: Specify the source job archive path.
Default: ./var/job-archive
Example: -s /data/job-archive
-config <path>
Function: Specify alternative path to config.json.
Default: ./config.json
Example: -config /etc/clustercockpit/config.json
-validate
Function: Validate a job archive against the JSON schema.
-remove-cluster <cluster>
Function: Remove specified cluster from archive and database.
Example: -remove-cluster oldcluster
-remove-before <date>
Function: Remove all jobs with start time before the specified date.
Format: 2006-Jan-02
Example: -remove-before 2023-Jan-01
-remove-after <date>
Function: Remove all jobs with start time after the specified date.
Format: 2006-Jan-02
Example: -remove-after 2024-Dec-31
-import
Function: Import jobs from source archive to destination archive.
Note: Requires -src-config and -dst-config options.
-src-config <json>
Function: Source archive backend configuration in JSON format.
Example: -src-config '{"kind":"file","path":"./archive"}'
-dst-config <json>
Function: Destination archive backend configuration in JSON format.
Example: -dst-config '{"kind":"sqlite","dbPath":"./archive.db"}'
-loglevel <level>
Function: Sets the logging level.
Arguments: debug | info | warn | err | fatal | crit
Default: info
Example: -loglevel debug
-logdate
Function: Set this flag to add date and time to log messages.
Usage Examples
Validate Archive
./archive-manager -s /data/job-archive -validate
Clean Old Jobs
# Remove jobs older than January 1, 2023
./archive-manager -s /data/job-archive -remove-before 2023-Jan-01
Import Between Archives
# Import from file-based archive to SQLite archive
./archive-manager -import \
-src-config '{"kind":"file","path":"./old-archive"}' \
-dst-config '{"kind":"sqlite","dbPath":"./new-archive.db"}'
Display Statistics
# Display archive statistics
./archive-manager -s /data/job-archive
Features
- Validation: Verify job archive integrity against JSON schemas
- Cleaning: Remove jobs by date range or cluster
- Import/Export: Transfer jobs between different archive backend types
- Statistics: Display archive information and job counts
- Progress Tracking: Real-time progress reporting for long operations
2 - archive-migration
Job Archive Schema Migration Tool
The archive-migration tool migrates job archives from old schema versions to the current schema version. It handles schema changes such as the exclusive → shared field transformation and adds/removes fields as needed.
Features
- Parallel Processing: Uses worker pool for fast migration
- Dry-Run Mode: Preview changes without modifying files
- Safe Transformations: Applies well-defined schema transformations
- Progress Reporting: Shows real-time migration progress
- Error Handling: Continues on individual failures, reports at end
Build
cd tools/archive-migration
go build
Command-Line Options
-archive <path>
Function: Path to job archive to migrate (required).
Example: -archive /data/job-archive
-dry-run
Function: Preview changes without modifying files.
-workers <n>
Function: Number of parallel workers.
Default: 4
Example: -workers 8
-loglevel <level>
Function: Sets the logging level.
Arguments: debug | info | warn | err | fatal | crit
Default: info
Example: -loglevel debug
-logdate
Function: Add date and time to log messages.
Schema Transformations
Exclusive → Shared
Converts the old exclusive integer field to the new shared string field:
- 0 → "multi_user"
- 1 → "none"
- 2 → "single_user"
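A minimal Go sketch of this mapping (the helper name is illustrative, not the tool's actual code):

package main

import "fmt"

// mapExclusiveToShared translates the legacy integer "exclusive" value
// into the new string "shared" value (illustrative helper only).
func mapExclusiveToShared(exclusive int) string {
	switch exclusive {
	case 0:
		return "multi_user"
	case 1:
		return "none"
	case 2:
		return "single_user"
	default:
		return "none" // fall back to the schema default
	}
}

func main() {
	fmt.Println(mapExclusiveToShared(1)) // prints "none"
}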
Missing Fields
Adds fields required by current schema:
- submitTime: Defaults to startTime if missing
- energy: Defaults to 0.0
- requestedMemory: Defaults to 0
- shared: Defaults to "none" if still missing after transformation
Deprecated Fields
Removes fields no longer in schema:
- mem_used_max, flops_any_avg, mem_bw_avg
- load_avg, net_bw_avg, net_data_vol_total
- file_bw_avg, file_data_vol_total
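Taken together, the defaulting and pruning steps amount to a small transformation over the decoded job metadata, sketched here on a generic map (the actual tool may operate on typed structs and differ in detail):

package main

import "fmt"

// applyDefaultsAndPrune fills in fields required by the current schema and
// removes deprecated ones (illustrative sketch, not the tool's actual code).
func applyDefaultsAndPrune(meta map[string]any) {
	// Missing fields get schema defaults.
	if _, ok := meta["submitTime"]; !ok {
		meta["submitTime"] = meta["startTime"]
	}
	if _, ok := meta["energy"]; !ok {
		meta["energy"] = 0.0
	}
	if _, ok := meta["requestedMemory"]; !ok {
		meta["requestedMemory"] = 0
	}
	if _, ok := meta["shared"]; !ok {
		meta["shared"] = "none"
	}

	// Deprecated fields are dropped.
	for _, key := range []string{
		"mem_used_max", "flops_any_avg", "mem_bw_avg", "load_avg",
		"net_bw_avg", "net_data_vol_total", "file_bw_avg", "file_data_vol_total",
	} {
		delete(meta, key)
	}
}

func main() {
	meta := map[string]any{"startTime": 1700000000, "mem_used_max": 12.5}
	applyDefaultsAndPrune(meta)
	fmt.Println(meta) // deprecated field removed, defaults filled in
}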
Usage Examples
Preview Changes (Dry Run)
./archive-migration --archive /data/job-archive --dry-run
Migrate Archive
# IMPORTANT: Backup your archive first!
cp -r /data/job-archive /data/job-archive-backup
# Run migration
./archive-migration --archive /data/job-archive
Migrate with Verbose Logging
./archive-migration --archive /data/job-archive --loglevel debug
Migrate with More Workers
./archive-migration --archive /data/job-archive --workers 8
Safety
Always backup your archive before running migration!
The tool modifies meta.json files in place. While transformations are designed to be safe, unexpected issues could occur. Follow these safety practices:
- Always run with --dry-run first to preview changes
- Backup your archive before migration
- Test on a copy of your archive first
- Verify results after migration
Verification
After migration, verify the archive:
# Use archive-manager to check the archive
cd ../archive-manager
./archive-manager -s /data/migrated-archive
# Or validate specific jobs
./archive-manager -s /data/migrated-archive --validate
Troubleshooting
Migration Failures
If individual jobs fail to migrate:
- Check the error messages for specific files
- Examine the failing meta.json files manually
- Fix invalid JSON or unexpected field types
- Re-run migration (already-migrated jobs will be processed again)
Performance
For large archives:
- Increase --workers for more parallelism
- Use --loglevel warn to reduce log output
- Monitor disk I/O if migration is slow
Technical Details
The migration process:
- Walks the archive directory recursively
- Finds all meta.json files
- Distributes jobs to a worker pool
- For each job:
  - Reads the JSON file
  - Applies transformations in order
  - Writes back migrated data (if not dry-run)
- Reports statistics and errors
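A rough Go sketch of this structure (migrateJob is a placeholder for the per-file transformation; the tool's actual implementation differs in detail):

package main

import (
	"io/fs"
	"log"
	"path/filepath"
	"sync"
)

// migrateJob stands in for reading, transforming, and rewriting one meta.json.
func migrateJob(path string, dryRun bool) error { return nil }

func migrateArchive(root string, workers int, dryRun bool) {
	jobs := make(chan string)
	var wg sync.WaitGroup

	// Start the worker pool.
	for i := 0; i < workers; i++ {
		wg.Add(1)
		go func() {
			defer wg.Done()
			for path := range jobs {
				if err := migrateJob(path, dryRun); err != nil {
					// Continue on individual failures, report at the end.
					log.Printf("failed to migrate %s: %v", path, err)
				}
			}
		}()
	}

	// Walk the archive and queue every meta.json for migration.
	filepath.WalkDir(root, func(path string, d fs.DirEntry, err error) error {
		if err == nil && !d.IsDir() && d.Name() == "meta.json" {
			jobs <- path
		}
		return nil
	})
	close(jobs)
	wg.Wait()
}

func main() {
	migrateArchive("./var/job-archive", 4, true) // dry-run over a sample path
}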
Transformations are idempotent - running migration multiple times is safe (though not recommended for performance).
3 - convert-pem-pubkey
Convert Ed25519 Public Key from PEM to ClusterCockpit Format
The convert-pem-pubkey tool converts an Ed25519 public key from PEM format to the base64 format used by ClusterCockpit for JWT validation.
Use Case
When you have externally generated JSON Web Tokens (JWT) that should be accepted by cc-backend, the external provider shares its public key (used for JWT signing) in PEM format. ClusterCockpit requires this key in a different format, which this tool provides.
Build
cd tools/convert-pem-pubkey
go build
Usage
An Ed25519 public key in PEM format looks like this:
-----BEGIN PUBLIC KEY-----
MCowBQYDK2VwAyEA+51iXX8BdLFocrppRxIw52xCOf8xFSH/eNilN5IHVGc=
-----END PUBLIC KEY-----
Convert Key
# Insert your public Ed25519 PEM key into dummy.pub
echo "-----BEGIN PUBLIC KEY-----
MCowBQYDK2VwAyEA+51iXX8BdLFocrppRxIw52xCOf8xFSH/eNilN5IHVGc=
-----END PUBLIC KEY-----" > dummy.pub
# Run conversion
go run . dummy.pub
Output:
CROSS_LOGIN_JWT_PUBLIC_KEY="+51iXX8BdLFocrppRxIw52xCOf8xFSH/eNilN5IHVGc="
Configuration
- Copy the output into ClusterCockpit’s .env file
- Restart the ClusterCockpit backend
- ClusterCockpit can now validate JWTs from the external provider
Command-Line Arguments
convert-pem-pubkey <pem-file>
Arguments: Path to PEM-encoded Ed25519 public key file
Example: go run . dummy.pub
Example Workflow
# 1. Navigate to tool directory
cd tools/convert-pem-pubkey
# 2. Save external provider's PEM key
cat > external-key.pub <<EOF
-----BEGIN PUBLIC KEY-----
MCowBQYDK2VwAyEA+51iXX8BdLFocrppRxIw52xCOf8xFSH/eNilN5IHVGc=
-----END PUBLIC KEY-----
EOF
# 3. Convert to ClusterCockpit format
go run . external-key.pub
# 4. Add output to .env file
# CROSS_LOGIN_JWT_PUBLIC_KEY="+51iXX8BdLFocrppRxIw52xCOf8xFSH/eNilN5IHVGc="
# 5. Restart cc-backend
Technical Details
The tool:
- Reads Ed25519 public key in PEM format
- Extracts the raw key bytes
- Encodes to base64 string
- Outputs in ClusterCockpit’s expected format
This enables ClusterCockpit to validate JWTs signed by external providers using their Ed25519 keys.
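The core of this conversion can be sketched in a few lines of Go (a simplified stand-in for the tool, without its error reporting and usage checks):

package main

import (
	"crypto/ed25519"
	"crypto/x509"
	"encoding/base64"
	"encoding/pem"
	"fmt"
	"log"
	"os"
)

func main() {
	// Read the PEM file given as the first argument.
	raw, err := os.ReadFile(os.Args[1])
	if err != nil {
		log.Fatal(err)
	}

	// Decode the PEM block and parse the embedded public key.
	block, _ := pem.Decode(raw)
	if block == nil {
		log.Fatal("no PEM block found")
	}
	key, err := x509.ParsePKIXPublicKey(block.Bytes)
	if err != nil {
		log.Fatal(err)
	}
	pub, ok := key.(ed25519.PublicKey)
	if !ok {
		log.Fatal("not an Ed25519 public key")
	}

	// Emit the raw key bytes as standard base64, in the documented output format.
	fmt.Printf("CROSS_LOGIN_JWT_PUBLIC_KEY=%q\n", base64.StdEncoding.EncodeToString(pub))
}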
4 - gen-keypair
Generate Ed25519 Keypair for JWT Signing
The gen-keypair tool generates a new Ed25519 keypair for signing and validating JWT tokens in ClusterCockpit.
Purpose
Generates a cryptographically secure Ed25519 public/private keypair that can be used for:
- JWT token signing (private key)
- JWT token validation (public key)
Build
cd tools/gen-keypair
go build
Usage
go run .
Or after building:
./gen-keypair
Output
The tool outputs a keypair in base64-encoded format:
ED25519_PUBLIC_KEY="<base64-encoded-public-key>"
ED25519_PRIVATE_KEY="<base64-encoded-private-key>"
This is not a JWT token. You can generate JWT tokens with cc-backend; this keypair is used for signing and validating JWT tokens in ClusterCockpit.
Configuration
Add the generated keys to ClusterCockpit’s configuration:
Option 1: Environment Variables (.env file)
ED25519_PUBLIC_KEY="<base64-encoded-public-key>"
ED25519_PRIVATE_KEY="<base64-encoded-private-key>"
Option 2: Configuration File (config.json)
{
"jwts": {
"publicKey": "<base64-encoded-public-key>",
"privateKey": "<base64-encoded-private-key>"
}
}
Example Workflow
# 1. Generate keypair
cd tools/gen-keypair
go run . > keypair.txt
# 2. View generated keys
cat keypair.txt
# 3. Add to .env file (manual or scripted)
grep PUBLIC_KEY keypair.txt >> ../../.env
grep PRIVATE_KEY keypair.txt >> ../../.env
# 4. Restart cc-backend to use new keys
Security Notes
- The private key must be kept secret
- Store private keys securely (file permissions, encryption at rest)
- Use environment variables or secure configuration management
- Do not commit private keys to version control
- Rotate keys periodically for enhanced security
Technical Details
The tool uses:
- Go’s crypto/ed25519 package
- /dev/urandom as entropy source on Linux
- Base64 standard encoding for output format
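A minimal sketch of this generation step (simplified; the tool's exact output handling may differ):

package main

import (
	"crypto/ed25519"
	"crypto/rand"
	"encoding/base64"
	"fmt"
	"log"
)

func main() {
	// crypto/rand uses the operating system's CSPRNG (/dev/urandom on Linux).
	pub, priv, err := ed25519.GenerateKey(rand.Reader)
	if err != nil {
		log.Fatal(err)
	}

	// Print both keys in standard base64, ready to paste into .env.
	fmt.Printf("ED25519_PUBLIC_KEY=%q\n", base64.StdEncoding.EncodeToString(pub))
	fmt.Printf("ED25519_PRIVATE_KEY=%q\n", base64.StdEncoding.EncodeToString(priv))
}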
Ed25519 provides:
- Fast signature generation and verification
- Small key and signature sizes
- Strong security guarantees
5 - grepCCLog.pl
Analyze ClusterCockpit Log Files for Running Jobs
The grepCCLog.pl script analyzes ClusterCockpit log files to identify jobs that were started but not yet archived on a specific day. This is useful for troubleshooting and monitoring job lifecycle.
Purpose
Parses ClusterCockpit log files to:
- Identify jobs that started on a specific day
- Detect jobs that have not been archived
- Generate statistics per user
- Report jobs that may be stuck or still running
Usage
./grepCCLog.pl <logfile> <day>
Arguments
<logfile>
Function: Path to ClusterCockpit log file
Example: /var/log/clustercockpit/cc-backend.log
<day>
Function: Day of month to analyze (numeric)
Example: 15 (for October 15th)
Output
The script produces:
- List of Non-Archived Jobs: Details for each job that started but hasn’t been archived
- Per-User Summary: Count of non-archived jobs per user
- Total Statistics: Overall count of started vs. non-archived jobs
Example Output
======
jobID: 12345 User: alice
======
======
jobID: 12346 User: bob
======
alice => 1
bob => 1
Not stopped: 2 of 10
Log Format
The script expects log entries in the following format:
Job Start Entry
Oct 15 ... new job (id: 123): cluster=woody, jobId=12345, user=alice, ...
Job Archive Entry
Oct 15 ... archiving job... (dbid: 123): cluster=woody, jobId=12345, user=alice, ...
Limitations
- Hard-coded for cluster name woody
- Hard-coded for month Oct
- Requires specific log message format
- Day must match exactly
Customization
To adapt for your environment, modify the script:
# Line 19: Change cluster name
if ( $cluster eq 'your-cluster-name' && $day eq $Tday ) {
# Line 35: Change cluster name for archive matching
if ( $cluster eq 'your-cluster-name' ) {
# Lines 12 & 28: Update month pattern
if ( /Oct ([0-9]+) .../ ) {
# Change 'Oct' to your desired month
Use Cases
- Debugging: Identify jobs that failed to archive properly
- Monitoring: Track running jobs for a specific day
- Troubleshooting: Find stuck jobs in the system
- Auditing: Verify job lifecycle completion
Example Workflow
# Analyze today's jobs (e.g., October 15)
./grepCCLog.pl /var/log/cc-backend.log 15
# Find jobs started on the 20th
./grepCCLog.pl /var/log/cc-backend.log 20
# Check specific log file
./grepCCLog.pl /path/to/old-logs/cc-backend-2024-10.log 15
Technical Details
The script:
- Opens specified log file
- Parses log entries with regex patterns
- Tracks started jobs in hash table
- Tracks archived jobs in separate hash table
- Compares to find jobs without archive entry
- Aggregates statistics per user
- Outputs results
Jobs are matched by database ID (id: field) between start and archive entries.
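For illustration, the same matching idea expressed in Go rather than Perl (regex patterns simplified; this is not the script itself):

package main

import (
	"bufio"
	"fmt"
	"log"
	"os"
	"regexp"
)

func main() {
	file, err := os.Open(os.Args[1])
	if err != nil {
		log.Fatal(err)
	}
	defer file.Close()

	// Simplified patterns: extract the database id from start and archive
	// log lines (the real script also checks cluster, day, and user).
	startRe := regexp.MustCompile(`new job \(id: ([0-9]+)\)`)
	archiveRe := regexp.MustCompile(`archiving job... \(dbid: ([0-9]+)\)`)

	started := map[string]string{} // dbid -> raw start line
	archived := map[string]bool{}  // dbid -> archive entry seen

	scanner := bufio.NewScanner(file)
	for scanner.Scan() {
		line := scanner.Text()
		if m := startRe.FindStringSubmatch(line); m != nil {
			started[m[1]] = line
		} else if m := archiveRe.FindStringSubmatch(line); m != nil {
			archived[m[1]] = true
		}
	}

	// Every started job without a matching archive entry is "not stopped".
	for id, line := range started {
		if !archived[id] {
			fmt.Printf("not archived (dbid %s): %s\n", id, line)
		}
	}
}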