Function: Set up the var directory and initialize the SQLite database file, config.json, and .env.
-init-db
Function: Go through the job-archive and re-initialize the job, tag, and
jobtag tables.
Caution: All running jobs will be lost!
-jwt <username>
Function: Generate and print a JWT for the user specified by username.
Example: -jwt abcduser
-logdate
Function: Set this flag to add date and time to log messages.
-loglevel <level>
Function: Sets the logging level.
Arguments: debug | info | warn | err | crit
Default: warn
Example: -loglevel debug
-migrate-db
Function: Migrate database to supported version and exit.
-revert-db
Function: Migrate database to previous version and exit.
-server
Function: Start a server and continue listening on the configured port after
initialization and argument handling.
-sync-ldap
Function: Sync the hpc_user table with LDAP.
-version
Function: Show version information and exit.
2 - Configuration
ClusterCockpit Configuration Option References
cc-backend requires a JSON configuration file. The configuration file is
structured into components. Every component is configured either in a separate
JSON object or using a separate file. When a section is put in a separate file,
the section key has to have a -file suffix.
Example:
"auth-file":"./var/auth.json"
To override the default config file path, specify the location of a JSON
configuration file with the -config <file path> command line option.
Configuration Options
Section main
Section must exist.
addr: Type string (Optional). Address on which the HTTP (or HTTPS) server will
listen (for example: 0.0.0.0:80). Default: localhost:8080.
api-allowed-ips: Type array of strings (Optional). IPv4 addresses from
which the secured administrator API endpoint functions /api/* can be reached.
Default: No restriction. The previous * wildcard is still supported but
obsolete.
user: Type string (Optional). Drop root permissions once .env has been read and
the port is bound. Only applicable when using a privileged port.
group: Type string (Optional). Drop root permissions once .env has been read
and the port is bound. Only applicable when using a privileged port.
embed-static-files: Type bool (Optional). If all files in
web/frontend/public should be served from within the binary itself (they are
embedded) or not. Default true.
static-files: Type string (Optional). Folder where static assets can be
found, if embed-static-files is false. No default.
db: Type string (Optional). The db file path. Default: ./var/job.db.
enable-job-taggers: Type bool (Optional). Enable automatic job taggers for
application and job class detection. Requires tagger rules to be provided.
Default: false.
validate: Type bool (Optional). Validate all input JSON documents against
JSON schema. Default: false.
session-max-age: Type string (Optional). Specifies for how long a session
shall be valid, as a string parsable by time.ParseDuration(). If 0 or empty,
the session/token does not expire! Default: 168h.
https-cert-file and https-key-file: Type string (Optional). If both options
are set, serve HTTPS using those certificates. Default: No HTTPS.
redirect-http-to: Type string (Optional). If not the empty string and addr
does not end in ":80", redirect every request incoming at port 80 to that URL.
stop-jobs-exceeding-walltime: Type int (Optional). If not zero, automatically
mark jobs as stopped once they have been running X seconds longer than their
walltime. Only applies if a walltime is set for the job. Default: 0.
short-running-jobs-duration: Type int (Optional). Do not show running jobs
shorter than X seconds. Default 300.
emission-constant: Type integer (Optional). Energy mix CO2 emission constant
[g/kWh]. If set, the UI displays the estimated CO2 emission for a job based on
the job's total energy.
resampling: Type object (Optional). If configured, will enable
dynamic downsampling of metric data using the configured values.
minimum-points: Type integer. The minimum number of data points required for
resampling. Example: 600. With minimum-points: 600 and a frequency of 60
seconds per sample, resampling triggers only for jobs longer than 10 hours
(600 samples × 60 s = 36,000 s = 10 h).
resolutions: Type array [integer]. Array of resampling target resolutions,
in seconds; Example: [600,300,60].
trigger: Type integer. Trigger next zoom level at less than this many
visible datapoints.
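Putting these options together, a resampling object might look like this (a
sketch; the trigger value is an illustrative assumption):
"resampling": {
  "trigger": 30,
  "resolutions": [600, 300, 60],
  "minimum-points": 600
}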
machine-state-dir: Type string (Optional). Where to store MachineState
files. TODO: Explain in more detail!
api-subjects: Type object (Optional). NATS subjects configuration for
subscribing to job and node events. Default: No NATS API.
subject-job-event: Type string. NATS subject for job events (start_job, stop_job).
subject-node-state: Type string. NATS subject for node state updates.
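For example (the subject names are illustrative assumptions, not defaults):
"api-subjects": {
  "subject-job-event": "cc.events.job",
  "subject-node-state": "cc.events.node"
}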
Section nats
Section is optional.
address: Type string. Address of the NATS server (e.g., nats://localhost:4222).
username: Type string (Optional). Username for NATS authentication.
password: Type string (Optional). Password for NATS authentication.
creds-file-path: Type string (Optional). Path to a NATS credentials file used
for authentication.
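A minimal nats section might look like this (credentials are placeholders):
"nats": {
  "address": "nats://localhost:4222",
  "username": "ccuser",
  "password": "changeme"
}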
Section cron
Section must exist.
commit-job-worker: Type string. Frequency of commit job worker. Default: 2m
duration-worker: Type string. Frequency of duration worker. Default: 5m
footprint-worker: Type string. Frequency of the footprint worker. Default: 10m
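A cron section that spells out the documented defaults:
"cron": {
  "commit-job-worker": "2m",
  "duration-worker": "5m",
  "footprint-worker": "10m"
}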
Section archive
Section is optional. If section is not provided, the default is kind set to
file with path set to ./var/job-archive.
kind: Type string (required). Set archive backend. Supported values: file,
s3, sqlite.
path: Type string (Optional). Path to the job-archive. Default: ./var/job-archive.
compression: Type integer (Optional). Set up automatic compression for jobs
older than the given number of days.
retention: Type object (Optional). Enable retention policy for archive and
database.
policy: Type string (required). Retention policy. Possible values: none,
delete, move.
include-db: Type bool (Optional). Also remove jobs from database. Default:
true.
age: Type integer (Optional). Act on jobs with a startTime older than age
(in days). Default: 7 days.
location: Type string (Optional). The target directory for retention. Only
applicable for the move policy.
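An archive section with retention enabled might look like this (paths and
values are illustrative):
"archive": {
  "kind": "file",
  "path": "./var/job-archive",
  "compression": 7,
  "retention": {
    "policy": "move",
    "include-db": true,
    "age": 365,
    "location": "/data/retired-jobs"
  }
}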
Section auth
Section must exist.
jwts: Type object (required). For JWT Authentication.
max-age: Type string (required). Configure how long a token is valid. As
string parsable by time.ParseDuration().
cookie-name: Type string (Optional). Cookie that should be checked for a
JWT token.
validate-user: Type bool (Optional). Deny login for users not in the database
(even if defined in the JWT). Overwrites roles in the JWT with database roles.
trusted-issuer: Type string (Optional). Issuer that should be accepted when
validating external JWTs.
sync-user-on-login: Type bool (Optional). Add non-existent user to DB at
login attempt with values provided in JWT.
update-user-on-login: Type bool (Optional). Update existent user in DB at
login attempt with values provided in JWT. Currently only the person name is
updated.
ldap: Type object (Optional). For LDAP Authentication and user
synchronisation. Default nil.
url: Type string (required). URL of LDAP directory server.
user-base: Type string (required). Base DN of user tree root.
search-dn: Type string (required). DN for authenticating LDAP admin
account with general read rights.
user-bind: Type string (required). Expression used to authenticate users
via LDAP bind. Must contain uid={username}.
user-filter: Type string (required). Filter to extract users for syncing.
username-attr: Type string (Optional). Attribute with full user name.
Defaults to gecos if not provided.
sync-interval: Type string (Optional). Interval used for syncing local
user table with LDAP directory. Parsed using time.ParseDuration.
sync-del-old-users: Type bool (Optional). Delete obsolete users in database.
sync-user-on-login: Type bool (Optional). Add non-existent user to DB at
login attempt if user exists in LDAP directory.
oidc: Type object (Optional). For OpenID Connect Authentication. Default nil.
provider: Type string (required). OpenID Connect provider URL.
sync-user-on-login: Type bool. Add non-existent user to DB at login attempt
with values provided.
update-user-on-login: Type bool. Update existent user in DB at login attempt
with values provided. Currently only the person name is updated.
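A sketch of an auth section combining JWT and LDAP options from above (URLs
and DNs are placeholders):
"auth": {
  "jwts": {
    "max-age": "168h"
  },
  "ldap": {
    "url": "ldaps://ldap.example.com",
    "user-base": "ou=people,dc=example,dc=com",
    "search-dn": "cn=monitoring,ou=adm,dc=example,dc=com",
    "user-bind": "uid={username},ou=people,dc=example,dc=com",
    "user-filter": "(&(objectclass=posixAccount))",
    "sync-interval": "24h"
  }
}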
Section metric-store
Section must exist.
retention-in-memory: Type string (required). Keep the metrics in memory for
the given time interval; after that the metrics are freed. Buffers that are
still used by running jobs are kept.
memory-cap: Type integer (required). If the memory used exceeds this value
(in GB), buffers still used by long-running jobs will be freed.
num-workers: Type integer (Optional). Number of concurrent workers for
checkpoint and archive operations. Default: min(runtime.NumCPU()/2+1, 10).
checkpoints: Type object (required). Configuration for checkpointing the
metric buffers.
file-format: Type string (Optional). Format to use for checkpoint files.
Can be JSON or Avro. Default: Avro.
directory: Type string (Optional). Path in which the checkpoints should be
placed. Default: ./var/checkpoints.
cleanup: Type object (Optional). Configuration for the cleanup process. If
not set the mode is delete with interval set to the retention-in-memory
interval.
mode: Type string (Optional). The mode for cleanup. Can be delete or
archive. Default: delete.
interval: Type string (Optional). Interval at which the cleanup runs.
directory: Type string (required if mode is archive). Directory in which to
put the archive files.
nats-subscriptions: Type array (Optional). List of NATS subjects the metric
store should subscribe to. Items are of type object with the following
attributes:
subscribe-to: Type string (required). NATS subject to subscribe to.
cluster-tag: Type string (Optional). Allow lines without a cluster tag,
use this as default.
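A metric-store section illustrating these options (values are examples, not
recommendations):
"metric-store": {
  "retention-in-memory": "48h",
  "memory-cap": 32,
  "checkpoints": {
    "file-format": "Avro",
    "directory": "./var/checkpoints"
  },
  "cleanup": {
    "mode": "archive",
    "interval": "48h",
    "directory": "./var/metric-archive"
  }
}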
Section ui
The ui section specifies defaults for the web user interface. The default
metrics shown in the different views can be overwritten per cluster or
subcluster.
job-list: Type object (Optional). Job list defaults. Applies to user and
jobs views.
use-paging: Type bool (Optional). If classic paging is used instead of
continuous scrolling by default.
show-footprint: Type bool (Optional). If footprint bars are shown as first
column by default.
node-list: Type object (Optional). Node list defaults. Applies to node list
view.
use-paging: Type bool (Optional). If classic paging is used instead of
continuous scrolling by default.
job-view: Type object (Optional). Job view defaults.
show-polar-plot: Type bool (Optional). If the job metric footprints polar
plot is shown by default.
show-footprint: Type bool (Optional). If the annotated job metric
footprint bars are shown by default.
show-roofline: Type bool (Optional). If the job roofline plot is shown by
default.
show-stat-table: Type bool (Optional). If the job metric statistics table
is shown by default.
metric-config: Type object (Optional). Global initial metric selections for
primary views of all clusters.
job-list-metrics: Type array [string] (Optional). Initial metrics shown
for new users in job lists (User and jobs view).
job-view-plot-metrics: Type array [string] (Optional). Initial metrics
shown for new users as job view metric plots.
job-view-table-metrics: Type array [string] (Optional). Initial metrics
shown for new users in job view statistics table.
clusters: Type array of objects (Optional). Overrides for global defaults
by cluster and subcluster.
name: Type string (required). The name of the cluster.
job-list-metrics: Type array [string] (Optional). Initial metrics shown
for new users in job lists (User and jobs view) for this cluster.
job-view-plot-metrics: Type array [string] (Optional). Initial metrics
shown for new users as job view timeplots for this cluster.
job-view-table-metrics: Type array [string] (Optional). Initial metrics
shown for new users in job view statistics table for this cluster.
sub-clusters: Type array of objects (Optional). The array of overrides
per subcluster.
name: Type string (required). The name of the subcluster.
job-list-metrics: Type array [string] (Optional). Initial metrics
shown for new users in job lists (User and jobs view) for subcluster.
job-view-plot-metrics: Type array [string] (Optional). Initial metrics
shown for new users as job view timeplots for subcluster.
job-view-table-metrics: Type array [string] (Optional). Initial
metrics shown for new users in job view statistics table for subcluster.
plot-configuration: Type object (Optional). Initial settings for plot render
options.
color-background: Type bool (Optional). If the metric plot backgrounds are
initially colored by threshold limits.
plots-per-row: Type integer (Optional). How many plots are initially
rendered per row. Applies to job, single node, and analysis views.
line-width: Type integer (Optional). Initial thickness of rendered
plotlines. Applies to metric plot, job compare plot and roofline.
color-scheme: Type array [string] (Optional). Initial colorScheme to be
used for metric plots.
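A ui section sketch (metric names and values are illustrative; cluster and
subcluster overrides use the same keys below the clusters array as described
above):
"ui": {
  "job-list": {
    "use-paging": true,
    "show-footprint": true
  },
  "metric-config": {
    "job-list-metrics": ["flops_any", "mem_bw"],
    "job-view-plot-metrics": ["flops_any", "mem_bw", "cpu_load"]
  },
  "plot-configuration": {
    "plots-per-row": 3,
    "line-width": 3,
    "color-background": true
  }
}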
3 - Environment
ClusterCockpit Environment Variables
All security-related configuration, e.g. keys and passwords, is set using
environment variables. These can also be set by means of a .env file in the
project root.
Environment Variables
JWT_PUBLIC_KEY and JWT_PRIVATE_KEY: Base64 encoded Ed25519 keys used for
JSON Web Token (JWT) authentication. You can generate your own keypair using go run ./tools/gen-keypair/. The release binaries also include the
gen-keypair tool for x86-64. For more information, see the
JWT documentation.
SESSION_KEY: Some random bytes used as secret for cookie-based sessions.
LDAP_ADMIN_PASSWORD: The LDAP admin user password (optional).
CROSS_LOGIN_JWT_HS512_KEY: Used for token-based logins via another
authentication service (optional).
OID_CLIENT_ID: OpenID Connect client ID (optional).
Below is an example .env file.
Copy it as .env into the project root and adapt it for your needs.
# Base64 encoded Ed25519 keys (DO NOT USE THESE TWO IN PRODUCTION!)
# You can generate your own keypair using `go run tools/gen-keypair/main.go`
JWT_PUBLIC_KEY="kzfYrYy+TzpanWZHJ5qSdMj5uKUWgq74BWhQG6copP0="
JWT_PRIVATE_KEY="dtPC/6dWJFKZK7KZ78CvWuynylOmjBFyMsUWArwmodOTN9itjL5POlqdZkcnmpJ0yPm4pRaCrvgFaFAbpyik/Q=="
# Base64 encoded Ed25519 public key for accepting externally generated JWTs
# Keys in PEM format can be converted, see `tools/convert-pem-pubkey/Readme.md`
CROSS_LOGIN_JWT_PUBLIC_KEY=""
# Some random bytes used as secret for cookie-based sessions (DO NOT USE THIS ONE IN PRODUCTION)
SESSION_KEY="67d829bf61dc5f87a73fd814e2c9f629"
# Password for the ldap server (optional)
LDAP_ADMIN_PASSWORD="mashup"
4 - REST API
ClusterCockpit RESTful API Endpoint Reference
REST API Authorization
In ClusterCockpit, JWTs are signed with an Ed25519 public/private key pair.
Because tokens are signed with a key pair, the signature also certifies that
only the party holding the private key could have signed it.
JWT tokens in ClusterCockpit are not encrypted, which means all information is
clear text. Expiration of the generated tokens can be configured in config.json
using the max-age option in the jwts object. Example:
"jwts":{"max-age":"168h"},
The party that generates and signs JWT tokens has to be in possession of the
private key, and any party that accepts JWT tokens must possess the public key
to validate them. cc-backend therefore requires both keys: the private one to
sign generated tokens and the public key to validate tokens that are provided
by REST API clients.
Generate ED25519 key pairs
We provide a tool as part of cc-backend to generate an Ed25519 keypair.
The tool is called gen-keypair and provided as part of the release binaries.
You can easily build it yourself in the cc-backend source tree with:
go build ./tools/gen-keypair
To use it just call it without any arguments:
./gen-keypair
Usage of Swagger UI documentation
Swagger UI is a REST API documentation
and testing framework. To use the Swagger
UI for testing you have to run an
instance of cc-backend on localhost (and use the default port 8080):
./cc-backend -server
You may want to start the demo as described here.
This Swagger UI is also available as part of cc-backend if you start it with
the dev option:
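A sketch of the invocation, assuming the development option is the -dev flag:
./cc-backend -server -dev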
This reference is rendered using the swaggerui plugin based on the original
definition file found in the ClusterCockpit repository, but without a serving
backend. This means that all interactivity ("Try It Out") will not return
actual data. However, a Curl call and a compiled Request URL will still be
displayed if an API endpoint is executed.
Administrator API
Endpoints displayed here correspond to the administrator /api/ endpoints, but user-accessible /userapi/ endpoints are functionally identical. See these lists for information about accessibility.
5 - Authentication Handbook
How to configure and use the authentication backends
Introduction
cc-backend supports the following authentication methods:
Local login with credentials stored in SQL database
Login with authentication to a LDAP directory
Authentication via JSON Web Token (JWT):
With token provided in HTTP request header
With token provided in cookie
Login via OpenID Connect (against a KeyCloak instance)
All above methods create a session cookie that is then used for subsequent
authentication of requests. Multiple authentication methods can be configured at
the same time. If LDAP is enabled it takes precedence over local
authentication. The OpenID Connect method against a
KeyCloak instance enables many more authentication
methods using the ability of KeyCloak to act as an Identity Broker.
The REST API uses stateless authentication via a JWT token, which means that
every request must be authenticated.
General configuration options
All configuration is part of the cc-backend configuration file config.json.
All security-sensitive options, such as passwords and tokens, are passed via
environment variables. cc-backend can read a .env file upon startup and set the
environment variables contained there.
Duration of session
By default the maximum duration of a session is 7 days. To change this, set
the option session-max-age to a string that can be parsed by the Golang
time.ParseDuration() function.
For most use cases the largest unit, h, is the only relevant option.
Example:
"session-max-age":"24h",
To enable unlimited session duration, set session-max-age either to 0 or to
the empty string.
LDAP authentication
Configuration
To enable LDAP authentication, the following options are required as
attributes of the ldap JSON object:
url: URL of the LDAP directory server. This must be a complete URL including
the protocol and not only the host name. Example: ldaps://ldsrv.mydomain.com.
user_base: Base DN of user tree root. Example: ou=people,ou=users,dc=rz,dc=mydomain,dc=com.
search_dn: DN for authenticating an LDAP admin account with general read
rights. This is required for the sync on login and the sync options. Example:
cn=monitoring,ou=adm,ou=profile,ou=manager,dc=rz,dc=mydomain,dc=com
user_bind: Expression used to authenticate users via LDAP bind. Must contain
uid={username}. Example:
uid={username},ou=people,ou=users,dc=rz,dc=mydomain,dc=com.
user_filter: Filter to extract users for syncing. Example: (&(objectclass=posixAccount)).
Optional configuration options are:
username_attr: Attribute with full user name. Defaults to gecos if not provided.
sync_interval: Interval used for syncing SQL user table with LDAP
directory. Parsed using time.ParseDuration. The sync interval is always relative
to the time cc-backend was started. Example: 24h.
sync_del_old_users: Type boolean. Delete users in SQL database if not in
LDAP directory anymore. This of course only applies to users that were added
from LDAP.
syncUserOnLogin: Type boolean. Add non-existent user to DB at login attempt
if the user exists in the LDAP directory. This option enables users to log in
immediately after they are added to the LDAP directory.
The LDAP authentication method requires the environment variable
LDAP_ADMIN_PASSWORD for the search_dn account that is used to sync users.
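Assembled from the examples above, a complete ldap object might look like
this:
"ldap": {
  "url": "ldaps://ldsrv.mydomain.com",
  "user_base": "ou=people,ou=users,dc=rz,dc=mydomain,dc=com",
  "search_dn": "cn=monitoring,ou=adm,ou=profile,ou=manager,dc=rz,dc=mydomain,dc=com",
  "user_bind": "uid={username},ou=people,ou=users,dc=rz,dc=mydomain,dc=com",
  "user_filter": "(&(objectclass=posixAccount))",
  "sync_interval": "24h",
  "syncUserOnLogin": true
}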
Usage
If LDAP is configured it is the first authentication method that is tried if a
user logs in using the login form. A sync with the LDAP directory can also be
triggered from the command line using the flag -sync-ldap.
OpenID Connect authentication
Configuration
To enable OpenID Connect authentication, the following options are required
below the top-level oidc key:
provider: The base URL of your OpenID Connect provider. Example:
https://auth.example.com/realms/mycloud.
Furthermore the following environment variables have to be set (in the .env
file):
OID_CLIENT_ID: Set this to the Client ID you configured in Keycloak.
OID_CLIENT_SECRET: Set this to the client secret available in your Keycloak
OpenID client configuration.
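A minimal sketch combining the config key and the environment variables
(values are placeholders):
"oidc": {
  "provider": "https://auth.example.com/realms/mycloud"
}
And in the .env file:
OID_CLIENT_ID="cc-backend"
OID_CLIENT_SECRET="<secret from the Keycloak Credentials tab>"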
Required settings in KeyCloak
The OpenID Connect implementation was only tested against the KeyCloak
provider.
Steps to setup KeyCloak:
Create a new realm. This will determine the provider URL.
Create a new OpenID Connect client.
Set a Client ID; the client secret is automatically generated and available
on the Credentials tab.
For Access settings set:
Root URL: This is the base URL of your cc-backend instance.
Valid redirect URLs: Set this to oidc-callback. Wildcards did not work
for me.
Web origins: Set this also to the base URL of your cc-backend instance.
Keycloak client Access settings
Enable PKCE:
Click on the Advanced tab, then click on Advanced settings on the right side.
Set the option Proof Key for Code Exchange Code Challenge Method to
S256.
Keycloak advanced client settings for PKCE
Everything else can be left at the defaults. Do not forget to create users in
your realm before testing.
Usage
If the oidc config key is correctly set and the required environment variables
are available, an additional button for OpenID Connect login is shown below the
login mask. Pressing this button redirects to the OpenID Connect login.
Login mask with OpenID Connect enabled
Local authentication
No configuration is required for local authentication.
Usage
You can add a user on the command line using the flag -add-user, as sketched
below:
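A sketch of the invocation; the exact argument format
(<username>:<roles>:<password>) is an assumption:
./cc-backend -add-user fritz:admin:secretpass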
Roles can be admin, support, manager, api, and user.
Users can be deleted using the flag -del-user:
./cc-backend -del-user fritz
Warning
The option -del-user as currently implemented will delete ALL users that
match the username, independent of their origin. This means it will also delete
user records that were added via LDAP or JWT tokens.
JWT token authentication
JSON web tokens are a standardized method for representing encoded
claims securely between two parties. In ClusterCockpit they are used for
authorization to use REST APIs as well as a method to delegate authentication to
a third party. This section only describes JWT based authentication for
initiating a user session.
Two variants exist:
[1] Session Authenticator: Passes JWT token in the HTTP header Authorization
using the Bearer prefix or using the query key login-token.
Example for Authorization header:
Authorization: Bearer S0VLU0UhIExFQ0tFUiEK
Example for query key used as a form action in an external application: a form
submitting to /jwt-login?login-token=$TOKEN (see JWT Usage below).
[2] Cookie Session Authenticator: Reads the JWT token from a named cookie
provided by the request, which is deleted after the session has been
successfully initiated. This is a more secure alternative to the standard
header-based solution.
JWT Configuration
[0] Basic required configuration:
In order to enable JWT based transactions generally, the following has to be true:
The jwts JSON object has to exist within config.json, even if no other attribute is set within.
We recommend setting the max-age attribute: it specifies for how long a JWT token shall be valid, defined as a string parsable by time.ParseDuration().
This will only affect JWTs generated by ClusterCockpit, e.g. for use with REST API endpoints.
In addition, the following environment variables are used:
JWT_PRIVATE_KEY: The application's own private key to be used with JWT transactions. Required for cookie-based logins and REST API communication.
JWT_PUBLIC_KEY: The application's own public key to be used with JWT transactions. Required for cookie-based logins and REST API communication.
[1] Configuration for JWT Session Authenticator:
Compatible signing methods are: HS256, HS512
Only a shared (symmetric) key saved as environment variable CROSS_LOGIN_JWT_HS512_KEY is required.
[2] Configuration for JWT Cookie Session Authenticator:
Tokens are signed with: Ed25519/EdDSA
To enable JWT authentication via cookie the following set of options are required as attributes of the jwts JSON object:
cookieName (String): Specifies which cookie should be checked for a JWT token (if no authorization header is present)
trustedIssuer (String): Specifies which issuer should be accepted when validating external JWTs (iss-claim)
In addition, the Cookie Session Authenticator method requires the following environment variable:
CROSS_LOGIN_JWT_PUBLIC_KEY: Primary public key for this method, validates identity of tokens received from trustedIssuer and must therefore match accordingly.
[3] Optional configuration attributes of the jwts JSON object, valid for both [1] and [2], are:
validateUser (Bool): Load user by username encoded in sub-claim from database, including roles, denying login if not matched in database. Ignores all other claims. By design not combinable with the syncUserOnLogin and updateUserOnLogin options.
syncUserOnLogin (Bool): If user encoded in token does not exist in database, add a new user entry. Does not update user on recurring JWT logins.
updateUserOnLogin (Bool): If user encoded in token does exist in database, update the user entry with all encoded information. Does not add users on first-time JWT login.
JWT Usage
[1] Usage for JWT Session Authenticator:
The endpoint for initiating JWT logins in ClusterCockpit is /jwt-login
For login with JWT Header, the header has to include the Authorization: Bearer $TOKEN information when accessing this endpoint.
For login with JWT request parameter, the external website has to submit an action with the parameter ?login-token=$TOKEN (See example above).
In both cases, the JWT should contain the following parameters:
sub: The subject, in this case this is the username. Will be used for user matching if validateUser is set.
exp: Expiration in Unix epoch time. Can be small as the token is only used during login.
name: The full name of the person assigned to this account. Will be used to update user table.
roles: String array with roles of user.
projects: [Optional] String array with projects of user. Relevant if user has manager-role.
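A sketch of a token payload carrying these claims (all values are
placeholders):
{
  "sub": "fritz",
  "exp": 1735689600,
  "name": "Fritz Fritzson",
  "roles": ["user"],
  "projects": ["project42"]
}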
[2] Usage for JWT Cookie Session Authenticator:
The token must be set within a cookie with a name matching the configured cookieName.
The JWT should then contain the following parameters:
sub: The subject, in this case this is the username. Will be used for user matching if validateUser is set.
exp: Expiration in Unix epoch time. Can be small as the token is only used during login.
name: The full name of the person assigned to this account. Will be used to update user table.
roles: String array with roles of user.
Authorization control
cc-backend uses roles to decide if a user is authorized to access certain
information. The roles and their rights are described in more detail here.
6 - Job Archive Handbook
All you need to know about the ClusterCockpit Job Archive
The job archive specifies an exchange format for job meta and performance metric
data. It consists of two parts:
By using an open, portable, and simple specification based on JSON objects it
is possible to exchange job performance data for research and analysis purposes
as well as to use it as a robust way of archiving job performance data.
The current release adds SQLite and S3 object store based job archive
backends. These are still experimental, and for production we still recommend
using the proven file-based job archive. One major disadvantage of the
file-based backend is that for large job counts it consumes a lot of inodes.
Trying the new job-archive backends
We provide the tool archive-manager, which can convert between different
job-archive formats. This allows you to convert your existing file-based
job-archive into either the SQLite or S3 variant. Please be aware that for
large archives this may take a long time. You can find details about how to use
this tool in the archive-manager reference documentation.
Specification for file path / key
To limit the number of entries within a single directory, a tree approach is
used that splits the integer job ID into chunks of 1000 each. Usually two
layers of directories are sufficient, but the concept can be used for an
arbitrary number of layers.
For a two-layer schema this can be achieved as follows (a sketch in Perl;
variable names are illustrative):
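# Split the job ID into two directory levels of 1000 each
my $level1 = int($jobID / 1000);   # e.g. 1034871 -> 1034
my $level2 = $jobID % 1000;        # e.g. 1034871 -> 871
my $dstPath = sprintf("%s/%s/%d/%03d", $root, $cluster, $level1, $level2);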
While the layering scheme is not needed for the SQLite and S3 object store
backends, it is kept for consistent naming: what is the file path for the
file-based backend is used as the object key or column value there.
Example
For the job ID 1034871 on cluster large with start time 1768978339 the key
is ./large/1034/871/1768978339.
Create a Job archive from scratch
In case you place the job-archive in the ./var folder, create the directory
with:
mkdir -p ./var/job-archive
The job-archive is versioned; the current version is documented in the Release
Notes. Currently you have to create the version file manually when initializing
the job-archive:
echo 3 > ./var/job-archive/version.txt
Directory layout
ClusterCockpit supports multiple clusters. For each cluster you need to create
a directory named after the cluster and a cluster.json file specifying the
metric list and hardware partitions within the cluster. Hardware partitions are
subsets of a cluster with homogeneous hardware (CPU type, memory capacity,
GPUs); they are called subclusters in ClusterCockpit.
For such a configuration the job archive directory hierarchy looks like the
following:
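An illustrative layout for a cluster named large, using the job ID example
from below:
./var/job-archive/
  version.txt
  large/
    cluster.json
    1034/
      871/
        1768978339/
          meta.json
          data.json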
Every cluster must be configured in a cluster.json file.
The job data consists of two files:
meta.json: Contains job meta information and job statistics.
data.json: Contains complete job data with time series.
The description of the JSON format specification is available as a JSON Schema
(https://json-schema.org/) file. The latest version of the JSON schema is part
of the cc-backend source tree. For external reference it is also available in a
separate repository.
Specification cluster.json
The JSON schema specification in its raw format is available at the
cc-lib GitHub repository.
A variant rendered for better readability is found in the references.
Specification meta.json
The JSON schema specification in its raw format is available at the
cc-lib GitHub repository.
A variant rendered for better readability is found in the references.
Specification data.json
The JSON schema specification in its raw format is available at the
cc-lib GitHub repository.
A variant rendered for better readability is found in the references.
Metric time series data is stored with a fixed time step, set per metric. If
no value is available for a timestamp, null is entered.
Changes to the original JSON schemas found in the repository are not
automatically rendered in this reference documentation. The raw JSON schemas
are parsed and rendered for better readability using the json-schema-for-humans
utility. Last update: 04.12.2024
Excerpt from the rendered meta.json schema (Job meta data properties):
subCluster: Type string. The unique identifier of a sub cluster.
partition: Type string (Optional). The Slurm partition to which the job was submitted.
arrayJobId: Type integer (Optional). The unique identifier of an array job.
numNodes: Type integer (required). Number of nodes used. Restriction: > 0.
numHwthreads: Type integer (Optional). Number of HWThreads used. Restriction: > 0.
numAcc: Type integer (Optional). Number of accelerators used. Restriction: > 0.
exclusive: Type integer (required). Specifies how nodes are shared: 0 - shared among multiple jobs of multiple users, 1 - job exclusive, 2 - shared among multiple jobs of the same user. Restriction: 0 to 2.
monitoringStatus: Type integer (Optional). State of monitoring system during job run.
gen-keypair: Generate Ed25519 keypairs for JWT signing and validation
convert-pem-pubkey: Convert external Ed25519 PEM keys to ClusterCockpit format
Diagnostics
grepCCLog.pl: Analyze log files to identify non-archived jobs
Data Generation for cc-metric-store
dataGenerator.sh: Connect to cc-metric-store (external or internal) and push data at a 1-minute interval.
Building Tools
All Go-based tools follow the same build pattern:
cd tools/<tool-name>
go build
Common Features
Most tools support:
Configurable logging levels (-loglevel)
Timestamped log output (-logdate)
Configuration file specification (-config)
8.1 - archive-manager
Job Archive Management Tool
The archive-manager tool provides comprehensive management and maintenance capabilities for ClusterCockpit job archives. It supports validation, cleaning, importing between different archive backends, and general archive operations.
Build
cd tools/archive-manager
go build
Command-Line Options
-s <path>
Function: Specify the source job archive path.
Default: ./var/job-archive
Example: -s /data/job-archive
-config <path>
Function: Specify alternative path to config.json.
Default: ./config.json
Example: -config /etc/clustercockpit/config.json
-validate
Function: Validate a job archive against the JSON schema.
-remove-cluster <cluster>
Function: Remove specified cluster from archive and database.
Example: -remove-cluster oldcluster
-remove-before <date>
Function: Remove all jobs with start time before the specified date.
Format: 2006-Jan-04
Example: -remove-before 2023-Jan-01
-remove-after <date>
Function: Remove all jobs with start time after the specified date.
Format: 2006-Jan-04
Example: -remove-after 2024-Dec-31
-import
Function: Import jobs from source archive to destination archive.
Note: Requires -src-config and -dst-config options.
-src-config <json>
Function: Source archive backend configuration in JSON format.
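Example (hypothetical; this assumes the backend configuration uses the same
keys as the archive section of config.json):
-import -src-config '{"kind": "file", "path": "./var/job-archive"}' -dst-config '{"kind": "sqlite", "path": "./var/job-archive.db"}'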
Features
Validation: Verify job archive integrity against JSON schemas
Cleaning: Remove jobs by date range or cluster
Import/Export: Transfer jobs between different archive backend types
Statistics: Display archive information and job counts
Progress Tracking: Real-time progress reporting for long operations
8.2 - archive-migration
Job Archive Schema Migration Tool
The archive-migration tool migrates job archives from old schema versions to the current schema version. It handles schema changes such as the exclusive → shared field transformation and adds/removes fields as needed.
Features
Parallel Processing: Uses worker pool for fast migration
Dry-Run Mode: Preview changes without modifying files
Always back up your archive before running migration!
The tool modifies meta.json files in place. While transformations are designed to be safe, unexpected issues could occur. Follow these safety practices:
Always run with --dry-run first to preview changes
Back up your archive before migration
Test on a copy of your archive first
Verify results after migration
Verification
After migration, verify the archive:
# Use archive-manager to check the archive
cd ../archive-manager
./archive-manager -s /data/migrated-archive

# Or validate specific jobs
./archive-manager -s /data/migrated-archive --validate
Troubleshooting
Migration Failures
If individual jobs fail to migrate:
Check the error messages for specific files
Examine the failing meta.json files manually
Fix invalid JSON or unexpected field types
Re-run migration (already-migrated jobs will be processed again)
Performance
For large archives:
Increase --workers for more parallelism
Use --loglevel warn to reduce log output
Monitor disk I/O if migration is slow
Technical Details
The migration process:
Walks archive directory recursively
Finds all meta.json files
Distributes jobs to worker pool
For each job:
Reads JSON file
Applies transformations in order
Writes back migrated data (if not dry-run)
Reports statistics and errors
Transformations are idempotent - running migration multiple times is safe (though not recommended for performance).
8.3 - convert-pem-pubkey
Convert Ed25519 Public Key from PEM to ClusterCockpit Format
The convert-pem-pubkey tool converts an Ed25519 public key from PEM format to the base64 format used by ClusterCockpit for JWT validation.
Use Case
When you have externally generated JSON Web Tokens (JWT) that should be accepted by cc-backend, the external provider shares its public key (used for JWT signing) in PEM format. ClusterCockpit requires this key in a different format, which this tool provides.
Build
cd tools/convert-pem-pubkey
go build
Usage
Input Format (PEM)
-----BEGIN PUBLIC KEY-----
MCowBQYDK2VwAyEA+51iXX8BdLFocrppRxIw52xCOf8xFSH/eNilN5IHVGc=
-----END PUBLIC KEY-----
Convert Key
# Insert your public Ed25519 PEM key into dummy.pub
echo "-----BEGIN PUBLIC KEY-----
MCowBQYDK2VwAyEA+51iXX8BdLFocrppRxIw52xCOf8xFSH/eNilN5IHVGc=
-----END PUBLIC KEY-----" > dummy.pub

# Run conversion
go run . dummy.pub
ClusterCockpit can now validate JWTs from the external provider.
Command-Line Arguments
convert-pem-pubkey <pem-file>
Arguments: Path to PEM-encoded Ed25519 public key file
Example: go run . dummy.pub
Example Workflow
# 1. Navigate to tool directory
cd tools/convert-pem-pubkey

# 2. Save external provider's PEM key
cat > external-key.pub <<EOF
-----BEGIN PUBLIC KEY-----
MCowBQYDK2VwAyEA+51iXX8BdLFocrppRxIw52xCOf8xFSH/eNilN5IHVGc=
-----END PUBLIC KEY-----
EOF

# 3. Convert to ClusterCockpit format
go run . external-key.pub

# 4. Add output to .env file
# CROSS_LOGIN_JWT_PUBLIC_KEY="+51iXX8BdLFocrppRxIw52xCOf8xFSH/eNilN5IHVGc="

# 5. Restart cc-backend
Technical Details
The tool:
Reads Ed25519 public key in PEM format
Extracts the raw key bytes
Encodes to base64 string
Outputs in ClusterCockpit’s expected format
This enables ClusterCockpit to validate JWTs signed by external providers using their Ed25519 keys.
8.4 - gen-keypair
Generate Ed25519 Keypair for JWT Signing
The gen-keypair tool generates a new Ed25519 keypair for signing and validating JWT tokens in ClusterCockpit.
Purpose
Generates a cryptographically secure Ed25519 public/private keypair that can be used for:
JWT token signing (private key)
JWT token validation (public key)
Build
cd tools/gen-keypair
go build
Usage
go run .
Or after building:
./gen-keypair
Output
The tool outputs a keypair in base64-encoded format:
ED25519 PUBLIC_KEY="<base64-encoded-public-key>"
ED25519 PRIVATE_KEY="<base64-encoded-private-key>"
This is NOT a JWT token; you can generate JWT tokens with cc-backend. Use this
keypair for signing and validating JWT tokens in ClusterCockpit.
Configuration
Add the generated keys to ClusterCockpit’s configuration:
# 1. Generate keypair
cd tools/gen-keypair
go run . > keypair.txt

# 2. View generated keys
cat keypair.txt

# 3. Add to .env file (manual or scripted)
grep PUBLIC_KEY keypair.txt >> ../../.env
grep PRIVATE_KEY keypair.txt >> ../../.env

# 4. Restart cc-backend to use new keys
Security Notes
The private key must be kept secret
Store private keys securely (file permissions, encryption at rest)
Use environment variables or secure configuration management
Do not commit private keys to version control
Rotate keys periodically for enhanced security
Technical Details
The tool uses:
Go’s crypto/ed25519 package
/dev/urandom as entropy source on Linux
Base64 standard encoding for output format
Ed25519 provides:
Fast signature generation and verification
Small key and signature sizes
Strong security guarantees
8.5 - grepCCLog.pl
Analyze ClusterCockpit Log Files for Running Jobs
The grepCCLog.pl script analyzes ClusterCockpit log files to identify jobs that were started but not yet archived on a specific day. This is useful for troubleshooting and monitoring job lifecycle.
Purpose
Parses ClusterCockpit log files to:
Identify jobs that started on a specific day
Detect jobs that have not been archived
Generate statistics per user
Report jobs that may be stuck or still running
Usage
./grepCCLog.pl <logfile> <day>
Arguments
<logfile>
Function: Path to ClusterCockpit log file
Example: /var/log/clustercockpit/cc-backend.log
<day>
Function: Day of month to analyze (numeric)
Example: 15 (for October 15th)
Output
The script produces:
List of Non-Archived Jobs: Details for each job that started but hasn’t been archived
Per-User Summary: Count of non-archived jobs per user
Total Statistics: Overall count of started vs. non-archived jobs
Example Output
======
jobID: 12345 User: alice
======
======
jobID: 12346 User: bob
======
alice => 1
bob => 1
Not stopped: 2 of 10
Log Format Requirements
The script expects log entries in the following format:
Job Start Entry
Oct 15 ... new job (id: 123): cluster=woody, jobId=12345, user=alice, ...
To adapt the script to your site, edit the following lines:

# Line 19: Change cluster name
if ($cluster eq 'your-cluster-name' && $day eq $Tday) {

# Line 35: Change cluster name for archive matching
if ($cluster eq 'your-cluster-name') {

# Lines 12 & 28: Update month pattern; change 'Oct' to your desired month
if ( /Oct ([0-9]+) .../ ) {
Use Cases
Debugging: Identify jobs that failed to archive properly
Monitoring: Track running jobs for a specific day
Troubleshooting: Find stuck jobs in the system
Auditing: Verify job lifecycle completion
Example Workflow
# Analyze today's jobs (e.g., October 15)
./grepCCLog.pl /var/log/cc-backend.log 15

# Find jobs started on the 20th
./grepCCLog.pl /var/log/cc-backend.log 20

# Check specific log file
./grepCCLog.pl /path/to/old-logs/cc-backend-2024-10.log 15
Technical Details
The script:
Opens specified log file
Parses log entries with regex patterns
Tracks started jobs in hash table
Tracks archived jobs in separate hash table
Compares to find jobs without archive entry
Aggregates statistics per user
Outputs results
Jobs are matched by database ID (id: field) between start and archive entries.
8.6 - Metric Generator Script
Overview
The Metric Generator is a bash script designed to simulate high-frequency
metric data for the alex and fritz clusters. It is primarily used for testing
the connection to cc-metric-store and pushing dummy data into it. The target
can either be a separately hosted cc-metric-store (which is what we call
external mode) or the cc-metric-store integrated into cc-backend (which is
what we call internal mode).
The script supports two transport mechanisms:
REST API (via curl)
NATS Messaging (via nats-cli)
It also supports two deployment scopes to handle different URL structures and authentication methods:
Internal (Integrated cc-metric-store into cc-backend)
External (Self-hosted separate cc-metric-store)
Configuration
The script behavior is controlled by variables defined at the top of the file.
Main Operation Flags
TRANSPORT_MODE ("REST" / "NATS"): REST sends HTTP POST requests; NATS publishes to a NATS subject.
CONNECTION_SCOPE ("INTERNAL" / "EXTERNAL"): INTERNAL uses the integrated cc-metric-store; EXTERNAL uses a self-hosted, separate cc-metric-store.
API_USER (string, e.g. "demo"): The username used to generate the JWT when in INTERNAL mode.
Network Settings
SERVICE_ADDRESS: Base URL of the API (e.g., http://localhost:8080).