cc-backend version 1.5.3
Supports job archive version 3 and database version 11.
What you need to do when upgrading
Upgrading from v1.5.2
No database migration is required. Start the new binary directly.
Upgrading from v1.5.1
No database migration is required. Start the new binary directly.
Upgrading from v1.5.0
A database migration to version 11 is required:
./cc-backend -migrate-db
For optimal database performance after the migration it is recommended to run:
./cc-backend -optimize-db
Depending on your database size (more than 40 GB) the VACUUM step may take up
to 2 hours. You can also run ANALYZE manually instead.
Upgrading from before v1.5.0
- Update
config.json: All configuration attributes were renamed tokebab-casein v1.5.0 (e.g.,apiAllowedIPs→api-allowed-ips). See the configuration reference for the full list. - Remove the
clusterssection fromconfig.json. Cluster information is now derived from the job archive. - Migrate your job database to version 11 (see Database migration).
- Migrate your job archive to version 3 (see Job Archive migration).
Changes in 1.5.3
Bug fixes
- WAL not rotated on partial checkpoint failure: When binary checkpointing failed for some hosts, WAL files for successfully checkpointed hosts were not rotated and the checkpoint timestamp was not advanced. Partial successes now correctly advance the checkpoint and rotate WAL files for completed hosts.
- Unbounded WAL file growth: If binary checkpointing consistently failed for
a host, its
current.walfile grew without limit until disk exhaustion. A newmax-wal-sizeconfiguration option (in thecheckpointsblock) allows setting a per-host WAL size cap in bytes. When exceeded, the WAL is force-rotated. Defaults to0(unlimited) for backward compatibility. - Range filter fixes: Range filters now correctly handle zero as a boundary value. Improved validation and UI text for range selections.
- Line protocol body parsing interrupted: Switched from
ReadTimeouttoReadHeaderTimeoutso that long-running metric submissions are no longer cut off mid-stream. - Checkpoint archiving continues on error: A single cluster’s archiving failure no longer aborts the entire cleanup operation.
- Removed metrics excluded from subcluster config: Metrics removed from a
subcluster are no longer returned by
GetMetricConfigSubCluster.
New features
-cleanup-checkpointsCLI flag: Triggers checkpoint cleanup without starting the server, useful for maintenance windows or automated cleanup scripts.max-wal-sizeconfig option: New option in themetric-store.checkpointsblock to cap per-host WAL file size in bytes and prevent unbounded disk growth.- Explicit node state queries in node view: Node health and scheduler state are now fetched independently from metric data for fresher status information.
binaryCheckpointReadertool: New utility (tools/binaryCheckpointReader) that reads.walor.bincheckpoint files and dumps their contents to a human-readable.txtfile. See binaryCheckpointReader.
MetricStore performance
- WAL writer throughput: Decoupled WAL file flushing from message processing using a periodic 5-second batch flush, significantly increasing metric ingestion throughput.
- Improved shutdown time: MetricStore and archiver now shut down concurrently.
Changes in 1.5.2
Bug fixes
- Fixed memory spikes when using the metricstore move (archive) policy with the Parquet writer.
- Fixed top list queries in analysis and dashboard views.
- Down nodes are now excluded from health checks.
- Fixed a blocking NATS receive call in the metricstore.
Performance
- Bulk insert operations now use explicit transactions, reducing write contention on the SQLite database.
- The WAL consumer is now sharded for significantly higher write throughput.
- New
busy-timeout-msconfiguration option for the SQLite connection. - Optimized stats queries.
NATS API
- The NATS node state handler now performs the same metric health checks as the REST API handler.
Changes in 1.5.1
Database
- New migration (version 11): Optimized database index count and added covering indexes for stats queries for significantly improved query performance.
-optimize-dbflag: New CLI flag to runVACUUMandANALYZEon demand. AutomaticANALYZEon startup has been removed.- SQLite configuration hardening: New configurable options via
db-configsection (see configuration reference).
Bug fixes
- Fixed crash when
enable-job-taggersis set but tagger rule directories are missing. - Fixed GT and LT conditions in ranged filters.
- Fixed wrong metricstore schema (missing comma).
Breaking changes in v1.5.0
Configuration
- JSON attribute naming: All JSON configuration attributes now use
kebab-casestyle consistently (e.g.,api-allowed-ipsinstead ofapiAllowedIPs). Update yourconfig.jsonaccordingly. - Removed
disable-archiveoption: This obsolete configuration option has been removed. - Removed
clustersconfig section: Cluster information is now derived from the job archive.
Architecture
- MySQL/MariaDB support removed: Only SQLite is now supported as the database backend.
- Web framework replaced: Migrated from
gorilla/muxtochias the HTTP router. - MetricStore moved: The
metricstorepackage has been moved frominternal/topkg/. minRunningForfilter removed: Replaced by a Short jobs quick-filter button in the job list.
Major features in v1.5.0
- NATS API Integration: Subscribe to real-time job start/stop events and node
state changes via NATS. NATS subjects are configurable via
api-subjects. - Public Dashboard: New public-facing dashboard at
/public. - Enhanced Node Management: Node state tracking table with configurable retention and archiving to Parquet format.
- Health Monitoring: New Health tab in the status details view showing per-node metric health. Supports querying external cc-metric-store health status via the API.
- Web-based Log Viewer: Inspect backend log output from the admin interface without shell access.
- Job Tagging System: Automatic detection of applications (MATLAB, GROMACS, etc.) and pathological jobs. Taggers can be triggered on-demand from the admin interface.
- Parquet Archive Format: Columnar storage with zstd compression. Full S3 and SQLite blob backends also supported.
- Unified Archive Retention: Supports both JSON and Parquet as target formats.
New configuration options in v1.5.0
GitHub Repository with complete configuration examples. All configuration options are checked against a JSON schema.
{
"main": {
"enable-job-taggers": true,
"resampling": {
"minimum-points": 600,
"trigger": 180,
"resolutions": [240, 60]
},
"api-subjects": {
"subject-job-event": "cc.job.event",
"subject-node-state": "cc.node.state"
},
"nodestate-retention": {
"policy": "move",
"age": 720,
"target-path": "./var/nodestate-archive"
}
},
"nats": {
"address": "nats://0.0.0.0:4222",
"username": "root",
"password": "root"
},
"cron": {
"commit-job-worker": "1m",
"duration-worker": "5m",
"footprint-worker": "10m"
},
"metric-store": {
"cleanup": {
"mode": "archive",
"directory": "./var/archive"
}
},
"archive": {
"retention": {
"policy": "delete",
"age": 180,
"format": "parquet"
}
}
}
Transfer cc-metric-store checkpoints
When migrating from a standalone external cc-metric-store to the metric store
integrated in cc-backend, you can transfer existing checkpoints using the
following script:
#!/bin/bash
# Set these to the checkpoint directories from your CCMS and CCB config.json
CCMS_CHECKPOINTS_DIR="/home/dummy/cc-metric-store/var/checkpoints"
CCB_CHECKPOINTS_DIR="/home/dummy/cc-backend/var/checkpoints"
if [ -d "$CCMS_CHECKPOINTS_DIR" ]; then
if [ ! -d "$CCB_CHECKPOINTS_DIR" ]; then
mkdir "$CCB_CHECKPOINTS_DIR"
fi
mv -f "$CCMS_CHECKPOINTS_DIR" "$CCB_CHECKPOINTS_DIR"
echo "Success: checkpoints moved to $CCB_CHECKPOINTS_DIR"
else
echo "Error: Directory '$CCMS_CHECKPOINTS_DIR' does not exist."
fi
Known issues
- The dynamic memory management is not yet fully reliable across restarts. Buffers kept outside the retention period may be lost on restart.
- The web-based log viewer requires
cc-backendto be started via systemd. The user runningcc-backendmust have permission to executejournalctl. - Old user UI configuration persisted in the database is not used after an
upgrade. It is recommended to configure the metrics shown via
ui-configand clear the old records from the database after updating. - Energy footprint metrics of type
energyare currently ignored when calculating total energy. - For energy footprint metrics of type
power, the unit is ignored and the metric is assumed to be in Watts.