archive-manager

Job Archive Management Tool

The archive-manager tool provides comprehensive management and maintenance capabilities for ClusterCockpit job archives. It supports validating archives, removing jobs by date or cluster, importing jobs between different archive backends, converting between storage formats, and displaying archive statistics.

Build

cd tools/archive-manager
go build

Command-Line Options


-s <path>

Function: Specify the source job archive path.

Default: ./var/job-archive

Example: -s /data/job-archive


-config <path>

Function: Specify alternative path to config.json.

Default: ./config.json

Example: -config /etc/clustercockpit/config.json


-validate

Function: Validate a job archive against the JSON schema.


-remove-cluster <cluster>

Function: Remove specified cluster from archive and database.

Example: -remove-cluster oldcluster


-remove-before <date>

Function: Remove all jobs with start time before the specified date.

Format: 2006-Jan-02

Example: -remove-before 2023-Jan-01


-remove-after <date>

Function: Remove all jobs with start time after the specified date.

Format: 2006-Jan-02

Example: -remove-after 2024-Dec-31


-import

Function: Import jobs from source archive to destination archive.

Note: Requires -src-config and -dst-config options.


-convert

Function: Convert an archive between JSON and Parquet formats.

Note: Requires -src-config and -dst-config options. Use -format to specify the output format.


-format <format>

Function: Output format for archive conversion.

Arguments: json | parquet

Default: json

Example: -format parquet


-max-file-size <n>

Function: Maximum Parquet file size in MB before splitting into a new file. Only relevant when -format parquet is used.

Default: 512

Example: -max-file-size 256
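
The size-based rollover rule implied by -max-file-size can be sketched as follows. This is an illustrative sketch only (working in bytes rather than MB, with a made-up splitPoints helper), not the tool's actual implementation:

```go
package main

import "fmt"

// splitPoints returns how many output files a sequence of records would
// occupy if a new file is started whenever adding the next record would
// push the current file past maxBytes. Hypothetical helper for illustration.
func splitPoints(recordSizes []int64, maxBytes int64) int {
	files := 1
	var current int64
	for _, s := range recordSizes {
		if current+s > maxBytes && current > 0 {
			// Current file is full: roll over to a new one.
			files++
			current = 0
		}
		current += s
	}
	return files
}

func main() {
	sizes := []int64{300, 300, 300} // record sizes in bytes, for illustration
	fmt.Println(splitPoints(sizes, 650)) // prints 2
}
```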


-src-config <json>

Function: Source archive backend configuration in JSON format.

Example: -src-config '{"kind":"file","path":"./archive"}'


-dst-config <json>

Function: Destination archive backend configuration in JSON format.

Example: -dst-config '{"kind":"sqlite","dbPath":"./archive.db"}'


-loglevel <level>

Function: Set the logging level.

Arguments: debug | info | warn | err | fatal | crit

Default: info

Example: -loglevel debug


-logdate

Function: Set this flag to add date and time to log messages.

Usage Examples

Validate Archive

./archive-manager -s /data/job-archive -validate

Clean Old Jobs

# Remove jobs older than January 1, 2023
./archive-manager -s /data/job-archive -remove-before 2023-Jan-01

Import Between Archives

# Import from file-based archive to SQLite archive
./archive-manager -import \
  -src-config '{"kind":"file","path":"./old-archive"}' \
  -dst-config '{"kind":"sqlite","dbPath":"./new-archive.db"}'

Convert Archive Format

# Convert JSON file archive to Parquet format
./archive-manager -convert \
  -src-config '{"kind":"file","path":"./job-archive"}' \
  -dst-config '{"kind":"s3","endpoint":"http://minio:9000","bucket":"parquet-archive","access-key":"key","secret-key":"secret"}' \
  -format parquet

# Convert Parquet archive back to JSON file archive
./archive-manager -convert \
  -src-config '{"kind":"s3","endpoint":"http://minio:9000","bucket":"parquet-archive","access-key":"key","secret-key":"secret"}' \
  -dst-config '{"kind":"file","path":"./job-archive-restored"}' \
  -format json

Archive Information

# Display archive statistics
./archive-manager -s /data/job-archive

Features

  • Validation: Verify job archive integrity against JSON schemas
  • Cleaning: Remove jobs by date range or cluster
  • Import/Export: Transfer jobs between different archive backend types
  • Format Conversion: Convert archives between JSON and Parquet formats
  • Statistics: Display archive information and job counts
  • Progress Tracking: Real-time progress reporting for long operations