Reference

In-depth technical documentation

In-depth description of configuration options, file formats, and REST API interfaces.

1 - Backend

ClusterCockpit Backend References

Reference information regarding the primary ClusterCockpit component “cc-backend” (GitHub Repo).

1.1 - Command Line

ClusterCockpit Command Line Options

This page describes the command line options for the cc-backend executable.


-add-user <username>:[admin,support,manager,api,user]:<password>

Function: Adds a new user to the database. Only one role can be assigned.

Example: -add-user abcduser:manager:somepass


  -config <path>

Function: Specifies alternative path to application configuration file.

Default: ./config.json

Example: -config ./configfiles/configuration.json


  -del-user <username>

Function: Removes a user from the database by username.

Example: -del-user abcduser


  -dev

Function: Enables development components: GraphQL Playground and Swagger UI.


  -gops

Function: Go server listens via github.com/google/gops/agent (for debugging).


  -import-job <path-to-meta.json>:<path-to-data.json>, ...

Function: Imports one or more jobs from a comma-separated list of paths to meta.json and data.json.

Example: -import-job ./to-import/job1-meta.json:./to-import/job1-data.json,./to-import/job2-meta.json:./to-import/job2-data.json
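To make the argument format concrete, the following Python sketch splits such an argument into (meta.json, data.json) path pairs. It only illustrates the syntax shown in the example above; the function name parse_import_job is hypothetical and this is not how cc-backend itself parses the flag.

```python
def parse_import_job(argument):
    """Split '<meta>:<data>,<meta>:<data>,...' into a list of path pairs."""
    pairs = []
    for item in argument.split(","):
        # Each item holds two paths separated by the first colon.
        meta_path, data_path = item.strip().split(":", 1)
        pairs.append((meta_path, data_path))
    return pairs


import_arg = (
    "./to-import/job1-meta.json:./to-import/job1-data.json,"
    "./to-import/job2-meta.json:./to-import/job2-data.json"
)
import_pairs = parse_import_job(import_arg)
for meta_path, data_path in import_pairs:
    print(meta_path, data_path)
```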


  -init

Function: Sets up the var directory. Initializes the sqlite database file, config.json, and the .env environment variable file.


  -init-db

Function: Iterates the job-archive and re-initializes the ‘job’, ‘tag’, and ‘jobtag’ tables based on archived jobs.


  -jwt <username>

Function: Generates and prints a JWT for the user specified by its username.

Example: -jwt abcduser


  -logdate

Function: Set this flag to add date and time to log messages.


  -loglevel <level>

Function: Sets the loglevel of the running ClusterCockpit instance. “Debug” will print all levels, “Crit” will only log critical log messages.

Arguments: debug | info | warn | err | crit

Default: info

Example: -loglevel debug


  -migrate-db

Function: Migrate database to latest supported version and exit.


  -server

Function: Starts a server that continues listening on the configured port (default: :8080) after initialization and argument handling.


  -sync-ldap

Function: Synchronizes the ‘user’ table with LDAP.


  -version

Function: Shows version information and exits.

1.2 - Configuration

ClusterCockpit Configuration Option References

CC-Backend requires a JSON configuration file that specifies the cluster systems to be used. The schema of the configuration is described in the schema documentation.

To override the default, specify the location of a JSON configuration file with the -config <file path> command line option.

Configuration Options

  • addr: Type string. Address on which the http (or https) server listens (for example: ‘localhost:80’). Default :8080.
  • apiAllowedIPs: Type array [string]. Addresses from which the secured API endpoints (/users and other auth related endpoints) can be reached.
  • user: Type string. Drop root permissions once .env was read and the port was taken. Only applicable if using privileged port.
  • group: Type string. Drop root permissions once .env was read and the port was taken. Only applicable if using privileged port.
  • disable-authentication: Type bool. Disable authentication (for everything: API, Web-UI, …). Default false.
  • embed-static-files: Type bool. If all files in web/frontend/public should be served from within the binary itself (they are embedded) or not. Default true.
  • static-files: Type string. Folder where static assets can be found, if embed-static-files is false. No default.
  • db-driver: Type string. ‘sqlite3’ or ‘mysql’ (mysql will work for mariadb as well). Default sqlite3.
  • db: Type string. For sqlite3 a filename, for mysql a DSN in this format, without query parameters. Default: ./var/job.db.
  • job-archive: Type object.
    • kind: Type string. At the moment only file is supported as value.
    • path: Type string. Path to the job-archive. Default: ./var/job-archive.
    • compression: Type integer. Setup automatic compression for jobs older than number of days.
    • retention: Type object.
      • policy: Type string (required). Retention policy. Possible values none, delete, move.
      • includeDB: Type bool. Also remove jobs from database.
      • age: Type integer. Act on jobs with startTime older than age (in days).
      • location: Type string. The target directory for retention. Only applicable for retention policy move.
  • disable-archive: Type bool. Keep all metric data in the metric data repositories, do not write to the job-archive. Default false.
  • validate: Type bool. Validate all input json documents against json schema.
  • ldap: Type object. For LDAP Authentication and user synchronisation. Default nil.
    • url: Type string (required). URL of LDAP directory server.
    • user_base: Type string (required). Base DN of user tree root.
    • search_dn: Type string (required). DN for authenticating LDAP admin account with general read rights.
    • user_bind: Type string (required). Expression used to authenticate users via LDAP bind. Must contain uid={username}.
    • user_filter: Type string (required). Filter to extract users for syncing.
    • username_attr: Type string. Attribute with full user name. Defaults to gecos if not provided.
    • sync_interval: Type string. Interval used for syncing local user table with LDAP directory. Parsed using time.ParseDuration.
    • sync_del_old_users: Type bool. Delete obsolete users in database.
    • syncUserOnLogin: Type bool. Add non-existent user to DB at login attempt if user exists in LDAP directory.
  • jwts: Type object (required). For JWT Authentication.
    • max-age: Type string (required). Configure how long a token is valid. As string parsable by time.ParseDuration().
    • cookieName: Type string. Cookie that should be checked for a JWT token.
    • validateUser: Type bool. Deny login for users not in database (but defined in JWT). Overwrite roles in JWT with database roles.
    • trustedIssuer: Type string. Issuer that should be accepted when validating external JWTs.
    • syncUserOnLogin: Type bool. Add non-existent user to DB at login attempt with values provided in JWT.
    • updateUserOnLogin: Type bool. Update existent user in DB at login attempt with values provided in JWT. Currently only the person name is updated.
  • oidc: Type object. Default nil.
    • provider: Type string.
    • syncUserOnLogin: Type bool. Add non-existent user to DB at login attempt with values provided in JWT.
    • updateUserOnLogin: Type bool. Update existent user in DB at login attempt with values provided in JWT. Currently only the person name is updated.
  • session-max-age: Type string. Specifies for how long a session shall be valid as a string parsable by time.ParseDuration(). If 0 or empty, the session/token does not expire! Default 168h.
  • https-cert-file and https-key-file: Type string. If both those options are not empty, use HTTPS using those certificates.
  • redirect-http-to: Type string. If not the empty string and addr does not end in “:80”, redirect every request incoming at port 80 to that url.
  • ui-defaults: Type object. Default configuration for webinterface views. Most options can be overwritten by the user via the web interface. See below for details.
  • enable-resampling: Type object. If configured, will enable dynamic zoom in frontend metric plots using the configured values.
    • resolutions: Type array [integer]. Array of resampling target resolutions, in seconds; Example: [600,300,60].
    • trigger: Type integer. Trigger next zoom level at less than this many visible datapoints.
  • machine-state-dir: Type string. Where to store MachineState files. TODO: Explain in more detail!
  • stop-jobs-exceeding-walltime: Type int. If not zero, automatically mark jobs as stopped if they run X seconds longer than their walltime. Only applies if walltime is set for job. Default 0.
  • short-running-jobs-duration: Type int. Do not show running jobs shorter than X seconds. Default 300.
  • emission-constant: Type integer. Energy Mix CO2 Emission Constant [g/kWh]. If entered, displays estimated CO2 emission for job based on jobs’ totalEnergy.
  • cron-frequency: Type object. Defines frequency of cron job workers.
    • duration-worker: Type string. Default: 5m
    • footprint-worker: Type string. Default: 10m
  • clusters: Type array [object] (required). Array of clusters.
    • name: Type string. The name of the cluster.
    • metricDataRepository: Type object.
      • kind: Type string. Can be one of [cc-metric-store, influxdb].
      • url: Type string.
      • token: Type string.
    • filterRanges: Type object. This option controls the slider ranges for the UI controls of numNodes, duration, and startTime. Example:

"filterRanges": {
    "numNodes": { "from": 1, "to": 64 },
    "duration": { "from": 0, "to": 86400 },
    "startTime": { "from": "2022-01-01T00:00:00Z", "to": null }
}
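As a rough illustration of how the options marked as required (jwts with max-age, and clusters) fit together, the following Python sketch assembles a minimal configuration and prints it as JSON. The cluster name, metric store URL, and token are placeholder assumptions, and the helper minimal_config is hypothetical; a real deployment will usually need more options.

```python
import json


def minimal_config(cluster_name, metric_store_url, token):
    """Build a minimal cc-backend style configuration as a Python dict.

    Only keys described in the option list are used; every value passed in
    is a placeholder chosen for illustration.
    """
    return {
        "addr": "localhost:8080",
        "jwts": {"max-age": "168h"},  # required object with required max-age
        "clusters": [  # required array of cluster objects
            {
                "name": cluster_name,
                "metricDataRepository": {
                    "kind": "cc-metric-store",
                    "url": metric_store_url,
                    "token": token,
                },
                "filterRanges": {
                    "numNodes": {"from": 1, "to": 64},
                    "duration": {"from": 0, "to": 86400},
                    "startTime": {"from": "2022-01-01T00:00:00Z", "to": None},
                },
            }
        ],
    }


config = minimal_config("examplecluster", "http://localhost:8082", "<token>")
config_text = json.dumps(config, indent=2)
print(config_text)
```

The printed text could be saved as config.json and passed to cc-backend with the -config option.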

UI Default Object Fields

  • analysis_view_histogramMetrics: Type array [string]. Metrics to show as job count histograms in analysis view. Default ["flops_any", "mem_bw", "mem_used"].
  • analysis_view_scatterPlotMetrics: Type array of string array. Initial scatter plot configuration in analysis view. Default [["flops_any", "mem_bw"], ["flops_any", "cpu_load"], ["cpu_load", "mem_bw"]].
  • job_view_nodestats_selectedMetrics: Type array [string]. Initial metrics shown in node statistics table of single job view. Default ["flops_any", "mem_bw", "mem_used"].
  • job_view_selectedMetrics: Type array [string]. Default ["flops_any", "mem_bw", "mem_used"].
  • plot_general_colorBackground: Type bool. Color plot background according to job average threshold limits. Default true.
  • plot_general_colorscheme: Type array [string]. Initial color scheme. Default ["#00bfff", "#0000ff", "#ff00ff", "#ff0000", "#ff8000", "#ffff00", "#80ff00"].
  • plot_general_lineWidth: Type int. Initial linewidth. Default 3.
  • plot_list_jobsPerPage: Type int. Jobs shown per page in job lists. Default 50.
  • plot_list_selectedMetrics: Type array [string]. Initial metric plots shown in jobs lists. Default ["cpu_load", "ipc", "mem_used", "flops_any", "mem_bw"].
  • plot_view_plotsPerRow: Type int. Number of plots per row in single job view. Default 3.
  • plot_view_showPolarplot: Type bool. Option to toggle polar plot in single job view. Default true.
  • plot_view_showRoofline: Type bool. Option to toggle roofline plot in single job view. Default true.
  • plot_view_showStatTable: Type bool. Option to toggle the node statistic table in single job view. Default true.
  • system_view_selectedMetric: Type string. Initial metric shown in system view. Default cpu_load.

Some of the ui-defaults keys can be suffixed with :<clustername> to use different settings depending on the current cluster. These are notably job_view_nodestats_selectedMetrics, job_view_selectedMetrics, and plot_list_selectedMetrics.
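The effect of a cluster-specific key can be pictured with a small Python sketch. The lookup order (cluster-specific key first, general key as fallback) is an assumption about the intended semantics, the cluster name examplecluster is a placeholder, and resolve_ui_default is a hypothetical helper rather than part of cc-backend.

```python
def resolve_ui_default(ui_defaults, key, cluster=None):
    """Return the cluster-specific value for key if present, else the general one."""
    if cluster is not None:
        specific_key = f"{key}:{cluster}"
        if specific_key in ui_defaults:
            return ui_defaults[specific_key]
    return ui_defaults[key]


ui_defaults = {
    "plot_list_selectedMetrics": ["cpu_load", "ipc", "mem_used", "flops_any", "mem_bw"],
    # Assumed cluster-specific override for a cluster named "examplecluster".
    "plot_list_selectedMetrics:examplecluster": ["cpu_load", "mem_bw"],
}

print(resolve_ui_default(ui_defaults, "plot_list_selectedMetrics", "examplecluster"))
print(resolve_ui_default(ui_defaults, "plot_list_selectedMetrics", "othercluster"))
```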

1.3 - Environment

ClusterCockpit Environment Variables

All security-related configuration values, e.g. keys and passwords, are set using environment variables. These can also be set in a .env file in the project root.

Environment Variables

An example env file is found in this directory. Copy it as .env into the project root and adapt it for your needs.

  • JWT_PUBLIC_KEY and JWT_PRIVATE_KEY: Base64 encoded Ed25519 keys used for JSON Web Token (JWT) authentication. You can generate your own keypair using go run ./tools/gen-keypair/. The release binaries also include the gen-keypair tool for x86-64. For more information, see the JWT documentation.
  • SESSION_KEY: Some random bytes used as secret for cookie-based sessions.
  • LDAP_ADMIN_PASSWORD: The LDAP admin user password (optional).
  • CROSS_LOGIN_JWT_HS512_KEY: Used for token based logins via another authentication service.
  • LOGLEVEL: Can be crit, err, warn, info or debug. Can be used to reduce logging. Default is info.
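For SESSION_KEY, any source of cryptographically strong random bytes should do. The Python sketch below generates such a value and formats it as a line for the .env file. The hex encoding and the length of 16 bytes are assumptions made for illustration; check the example env file for the form your version expects.

```python
import secrets


def make_session_key(num_bytes=16):
    """Return num_bytes of cryptographically strong randomness as a hex string."""
    return secrets.token_hex(num_bytes)


session_key = make_session_key()
# Assumed .env syntax: KEY="value", one assignment per line.
env_line = f'SESSION_KEY="{session_key}"'
print(env_line)
```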

1.4 - REST API

ClusterCockpit RESTful API Endpoint Reference

REST API Authorization

In ClusterCockpit, JWTs are signed with an Ed25519 public/private key pair. Because only the holder of the private key can produce a valid signature, the signature certifies who issued the token. JWTs in ClusterCockpit are not encrypted, which means all information in them is clear text. Expiration of the generated tokens can be configured in config.json using the max-age option in the jwts object. Example:

"jwts": {
    "max-age": "168h"
},

The party that generates and signs JWTs has to be in possession of the private key, and any party that accepts JWTs must possess the public key to validate them. cc-backend therefore requires both keys: the private key to sign generated tokens and the public key to validate tokens that are provided by REST API clients.
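Since the tokens are signed but not encrypted, anyone holding a token can read its claims. The following Python sketch shows this with a made-up token: it decodes the payload segment without verifying the signature. The claim names (sub, roles) and the signature placeholder are assumptions for illustration, and read_claims is a hypothetical helper; real validation must check the signature against the public key.

```python
import base64
import json


def b64url_encode(data: bytes) -> str:
    """Base64url-encode without padding, as used for JWT segments."""
    return base64.urlsafe_b64encode(data).rstrip(b"=").decode("ascii")


def b64url_decode(text: str) -> bytes:
    """Base64url-decode, restoring the padding that JWTs strip."""
    padding = "=" * (-len(text) % 4)
    return base64.urlsafe_b64decode(text + padding)


def read_claims(token: str) -> dict:
    """Decode the payload segment of a JWT WITHOUT verifying the signature."""
    header_b64, payload_b64, signature_b64 = token.split(".")
    return json.loads(b64url_decode(payload_b64))


# Build a made-up token; the third segment is not a real signature.
header = b64url_encode(json.dumps({"alg": "EdDSA", "typ": "JWT"}).encode())
payload = b64url_encode(json.dumps({"sub": "abcduser", "roles": ["api"]}).encode())
sample_token = f"{header}.{payload}.{b64url_encode(b'not-a-real-signature')}"

claims = read_claims(sample_token)
print(claims)
```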

Generate ED25519 key pairs

Usage of Swagger UI

To use the Swagger UI for testing you have to run an instance of cc-backend on localhost (and use the default port 8080):

./cc-backend -server

You may want to start the demo as described here. This Swagger UI is also available as part of cc-backend if you start it with the -dev option:

./cc-backend -server -dev

You may access it at this URL.

Swagger API Reference

1.5 - Authentication Handbook

How to configure and use the authentication backends

Introduction

cc-backend supports the following authentication methods:

  • Local login with credentials stored in SQL database
  • Login with authentication to an LDAP directory
  • Authentication via JSON Web Token (JWT):
    • With token provided in HTTP request header
    • With token provided in cookie
  • Login via OpenID Connect (against a KeyCloak instance)

All of the above methods create a session cookie that is then used for subsequent authentication of requests. Multiple authentication methods can be configured at the same time. If LDAP is enabled, it takes precedence over local authentication. The OpenID Connect method against a KeyCloak instance enables many more authentication methods through KeyCloak's ability to act as an Identity Broker.

The REST API uses stateless authentication via a JWT, which means that every request must be authenticated.
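In practice this means a client attaches the token to each request it sends. The Python sketch below builds such a request with the standard library without sending it. The Authorization: Bearer header form and the /api/jobs/ path are assumptions used for illustration; the Swagger API reference is the place to confirm the actual endpoints and header. A token for testing can be produced with the -jwt flag.

```python
import urllib.request


def build_api_request(base_url, path, token):
    """Create a GET request carrying the JWT; nothing is sent here."""
    request = urllib.request.Request(base_url.rstrip("/") + path)
    request.add_header("Accept", "application/json")
    # Assumed header scheme: the token is passed as a bearer token.
    request.add_header("Authorization", f"Bearer {token}")
    return request


req = build_api_request("http://localhost:8080", "/api/jobs/", "<JWT>")
print(req.full_url)
print(req.get_header("Authorization"))
```

Passing req to urllib.request.urlopen would perform the call against a running instance.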

General configuration options

All configuration is part of the cc-backend configuration file config.json. All security-sensitive options, such as passwords and tokens, are passed as environment variables. cc-backend can read a .env file upon startup and set the environment variables contained there.

Duration of session

By default the maximum duration of a session is 7 days. To change this, set the option session-max-age to a string that can be parsed by the Golang time.ParseDuration() function. For most use cases the largest unit h is the only relevant option. Example:

"session-max-age": "24h",

To enable unlimited session duration, set session-max-age to either 0 or the empty string.
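To get a feel for which strings are accepted, the following Python sketch approximates a subset of the duration syntax (numbers followed by the units h, m, s, ms, us, or ns, optionally chained as in 1h30m). It is a simplified stand-in, not a port of Go's time.ParseDuration: signs and some edge cases are not handled, and treating the empty string as 0 follows the configuration semantics above rather than the Go function.

```python
import re

# Seconds per supported unit (subset of the units Go accepts).
UNIT_SECONDS = {"ns": 1e-9, "us": 1e-6, "ms": 1e-3, "s": 1.0, "m": 60.0, "h": 3600.0}
PART = re.compile(r"(\d+(?:\.\d*)?|\.\d+)(ns|us|ms|s|m|h)")


def parse_duration_seconds(text):
    """Convert a duration string such as '168h' or '1h30m' to seconds."""
    if text in ("", "0"):
        return 0.0  # interpreted by the config as "does not expire"
    position = 0
    total = 0.0
    while position < len(text):
        match = PART.match(text, position)
        if match is None:
            raise ValueError(f"invalid duration: {text!r}")
        total += float(match.group(1)) * UNIT_SECONDS[match.group(2)]
        position = match.end()
    return total


print(parse_duration_seconds("24h"))
print(parse_duration_seconds("168h") / 86400)  # length of the default in days
```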

LDAP authentication

Configuration

To enable LDAP authentication, the following options are required as attributes of the ldap JSON object:

  • url: URL of the LDAP directory server. This must be a complete URL including the protocol and not only the host name. Example: ldaps://ldsrv.mydomain.com.
  • user_base: Base DN of user tree root. Example: ou=people,ou=users,dc=rz,dc=mydomain,dc=com.
  • search_dn: DN for authenticating an LDAP admin account with general read rights. This is required for the sync on login and the sync options. Example: cn=monitoring,ou=adm,ou=profile,ou=manager,dc=rz,dc=mydomain,dc=com
  • user_bind: Expression used to authenticate users via LDAP bind. Must contain uid={username}. Example: uid={username},ou=people,ou=users,dc=rz,dc=mydomain,dc=com.
  • user_filter: Filter to extract users for syncing. Example: (&(objectclass=posixAccount)).

Optional configuration options are:

  • username_attr: Attribute with full user name. Defaults to gecos if not provided.
  • sync_interval: Interval used for syncing SQL user table with LDAP directory. Parsed using time.ParseDuration. The sync interval is always relative to the time cc-backend was started. Example: 24h.
  • sync_del_old_users: Type boolean. Delete users in SQL database if not in LDAP directory anymore. This of course only applies to users that were added from LDAP.
  • syncUserOnLogin: Type boolean. Add non-existent user to DB at login attempt if user exists in LDAP directory. This option lets users log in immediately after they are added to the LDAP directory.

The LDAP authentication method requires the environment variable LDAP_ADMIN_PASSWORD for the search_dn account that is used to sync users.
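The options above can be combined as in the following Python sketch, which collects the example values into an ldap object and shows how the {username} placeholder in user_bind could be filled in for a login attempt. The plain string substitution and the helper name build_bind_dn are assumptions for illustration and do not describe cc-backend's internal implementation.

```python
import json

# Example values taken from the option descriptions above.
ldap_config = {
    "url": "ldaps://ldsrv.mydomain.com",
    "user_base": "ou=people,ou=users,dc=rz,dc=mydomain,dc=com",
    "search_dn": "cn=monitoring,ou=adm,ou=profile,ou=manager,dc=rz,dc=mydomain,dc=com",
    "user_bind": "uid={username},ou=people,ou=users,dc=rz,dc=mydomain,dc=com",
    "user_filter": "(&(objectclass=posixAccount))",
    "sync_interval": "24h",
}


def build_bind_dn(user_bind, username):
    """Fill the {username} placeholder of a user_bind expression."""
    if "uid={username}" not in user_bind:
        raise ValueError("user_bind must contain uid={username}")
    return user_bind.replace("{username}", username)


bind_dn = build_bind_dn(ldap_config["user_bind"], "abcduser")
print(json.dumps({"ldap": ldap_config}, indent=2))
print(bind_dn)
```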

Usage

If LDAP is configured it is the first authentication method that is tried if a user logs in using the login form. A sync with the LDAP directory can also be triggered from the command line using the flag -sync-ldap.

Local authentication

No configuration is required for local authentication.

Usage

You can add a user on the command line using the flag -add-user:

./cc-backend -add-user <username>:<roles>:<password>

Example:

./cc-backend -add-user fritz:admin,api:myPass

Roles can be admin, support, manager, api, and user.
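The structure of the argument can be sketched in Python as follows. The function splits the value into user name, role list, and password, following the usage shown above. It is an illustration of the format only; parse_add_user is a hypothetical helper, and splitting on the first two colons (so that a password may itself contain a colon) is an assumption.

```python
VALID_ROLES = {"admin", "support", "manager", "api", "user"}


def parse_add_user(argument):
    """Split '<username>:<roles>:<password>' into its three parts."""
    # Split on the first two colons only, keeping the rest as the password.
    username, roles_text, password = argument.split(":", 2)
    roles = roles_text.split(",")
    unknown = [role for role in roles if role not in VALID_ROLES]
    if unknown:
        raise ValueError(f"unknown roles: {unknown}")
    return username, roles, password


print(parse_add_user("fritz:admin,api:myPass"))
```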

Users can be deleted using the flag -del-user:

./cc-backend -del-user fritz

JWT token authentication

JSON web tokens are a standardized method for representing claims securely between two parties. In ClusterCockpit they are used for authorization to use REST APIs as well as a method to delegate authentication to a third party.

Configuration

Authorization control

cc-backend uses roles to decide if a user is authorized to access certain information. The roles and their rights are described in more detail here.

1.6 - Job Archive Handbook

All you need to know about the ClusterCockpit Job Archive

1.7 - Schemas

ClusterCockpit Schema References

ClusterCockpit Schema References for

  • Application Configuration
  • Cluster Configuration
  • Job Data
  • Job Statistics
  • Units
  • Job Archive Job Metadata
  • Job Archive Job Metricdata

The schemas in their raw form can be found in the ClusterCockpit GitHub repository.

1.7.1 - Application Config Schema

ClusterCockpit Application Config Schema Reference

A detailed description of each of the application configuration options can be found in the config documentation.

The following schema in its raw form can be found in the ClusterCockpit GitHub repository.

cc-backend configuration file schema

Title: cc-backend configuration file schema

Type: object
Required: No
Additional properties: Any type allowed

Property | Pattern | Type | Deprecated | Definition | Title/Description
- addr | No | string | No | - | Address where the http (or https) server will listen on (for example: ‘localhost:80’).
- apiAllowedIPs | No | array of string | No | - | Addresses from which secured API endpoints can be reached
- user | No | string | No | - | Drop root permissions once .env was read and the port was taken. Only applicable if using privileged port.
- group | No | string | No | - | Drop root permissions once .env was read and the port was taken. Only applicable if using privileged port.
- disable-authentication | No | boolean | No | - | Disable authentication (for everything: API, Web-UI, …).
- embed-static-files | No | boolean | No | - | If all files in `web/frontend/public` should be served from within the binary itself (they are embedded) or not.
- static-files | No | string | No | - | Folder where static assets can be found, if embed-static-files is false.
- db-driver | No | enum (of string) | No | - | sqlite3 or mysql (mysql will work for mariadb as well).
- db | No | string | No | - | For sqlite3 a filename, for mysql a DSN in this format: https://github.com/go-sql-driver/mysql#dsn-data-source-name (Without query parameters!).
- archive | No | object | No | - | Configuration keys for job-archive
- disable-archive | No | boolean | No | - | Keep all metric data in the metric data repositories, do not write to the job-archive.
- validate | No | boolean | No | - | Validate all input json documents against json schema.
- session-max-age | No | string | No | - | Specifies for how long a session shall be valid as a string parsable by time.ParseDuration(). If 0 or empty, the session/token does not expire!
- https-cert-file | No | string | No | - | Filepath to SSL certificate. If also https-key-file is set use HTTPS using those certificates.
- https-key-file | No | string | No | - | Filepath to SSL key file. If also https-cert-file is set use HTTPS using those certificates.
- redirect-http-to | No | string | No | - | If not the empty string and addr does not end in :80, redirect every request incoming at port 80 to that url.
- stop-jobs-exceeding-walltime | No | integer | No | - | If not zero, automatically mark jobs as stopped running X seconds longer than their walltime. Only applies if walltime is set for job.
- short-running-jobs-duration | No | integer | No | - | Do not show running jobs shorter than X seconds.
- emission-constant | No | integer | No | - | .
- cron-frequency | No | object | No | - | Frequency of cron job workers.
- enable-resampling | No | object | No | - | Enable dynamic zoom in frontend metric plots.
+ jwts | No | object | No | - | For JWT token authentication.
- oidc | No | object | No | - | -
- ldap | No | object | No | - | For LDAP Authentication and user synchronisation.
+ clusters | No | array of object | No | - | Configuration for the clusters to be displayed.
- ui-defaults | No | object | No | - | Default configuration for web UI

1. Property cc-backend configuration file schema > addr

Type: string
Required: No

Description: Address where the http (or https) server will listen on (for example: ‘localhost:80’).

2. Property cc-backend configuration file schema > apiAllowedIPs

Type: array of string
Required: No

Description: Addresses from which secured API endpoints can be reached

Array restrictions
Min items: N/A
Max items: N/A
Items unicity: False
Additional items: False
Tuple validation: See below

Each item of this array must be | Description
apiAllowedIPs items | -

2.1. cc-backend configuration file schema > apiAllowedIPs > apiAllowedIPs items

Type: string
Required: No

3. Property cc-backend configuration file schema > user

Type: string
Required: No

Description: Drop root permissions once .env was read and the port was taken. Only applicable if using privileged port.

4. Property cc-backend configuration file schema > group

Type: string
Required: No

Description: Drop root permissions once .env was read and the port was taken. Only applicable if using privileged port.

5. Property cc-backend configuration file schema > disable-authentication

Type: boolean
Required: No

Description: Disable authentication (for everything: API, Web-UI, …).

6. Property cc-backend configuration file schema > embed-static-files

Type: boolean
Required: No

Description: If all files in web/frontend/public should be served from within the binary itself (they are embedded) or not.

7. Property cc-backend configuration file schema > static-files

Type: string
Required: No

Description: Folder where static assets can be found, if embed-static-files is false.

8. Property cc-backend configuration file schema > db-driver

Type: enum (of string)
Required: No

Description: sqlite3 or mysql (mysql will work for mariadb as well).

Must be one of:

  • “sqlite3”
  • “mysql”

9. Property cc-backend configuration file schema > db

Type: string
Required: No

Description: For sqlite3 a filename, for mysql a DSN in this format: https://github.com/go-sql-driver/mysql#dsn-data-source-name (Without query parameters!).

10. Property cc-backend configuration file schema > archive

Type: object
Required: No
Additional properties: Any type allowed

Description: Configuration keys for job-archive

Property | Pattern | Type | Deprecated | Definition | Title/Description
+ kind | No | enum (of string) | No | - | Backend type for job-archive
- path | No | string | No | - | Path to job archive for file backend
- compression | No | integer | No | - | Setup automatic compression for jobs older than number of days
- retention | No | object | No | - | Configuration keys for retention

10.1. Property cc-backend configuration file schema > archive > kind

Type: enum (of string)
Required: Yes

Description: Backend type for job-archive

Must be one of:

  • “file”
  • “s3”

10.2. Property cc-backend configuration file schema > archive > path

Type: string
Required: No

Description: Path to job archive for file backend

10.3. Property cc-backend configuration file schema > archive > compression

Type: integer
Required: No

Description: Setup automatic compression for jobs older than number of days

10.4. Property cc-backend configuration file schema > archive > retention

Type: object
Required: No
Additional properties: Any type allowed

Description: Configuration keys for retention

Property | Pattern | Type | Deprecated | Definition | Title/Description
+ policy | No | enum (of string) | No | - | Retention policy
- includeDB | No | boolean | No | - | Also remove jobs from database
- age | No | integer | No | - | Act on jobs with startTime older than age (in days)
- location | No | string | No | - | The target directory for retention. Only applicable for retention move.

10.4.1. Property cc-backend configuration file schema > archive > retention > policy

Type: enum (of string)
Required: Yes

Description: Retention policy

Must be one of:

  • “none”
  • “delete”
  • “move”

10.4.2. Property cc-backend configuration file schema > archive > retention > includeDB

Type: boolean
Required: No

Description: Also remove jobs from database

10.4.3. Property cc-backend configuration file schema > archive > retention > age

Type: integer
Required: No

Description: Act on jobs with startTime older than age (in days)

10.4.4. Property cc-backend configuration file schema > archive > retention > location

Type: string
Required: No

Description: The target directory for retention. Only applicable for retention move.

11. Property cc-backend configuration file schema > disable-archive

Type: boolean
Required: No

Description: Keep all metric data in the metric data repositories, do not write to the job-archive.

12. Property cc-backend configuration file schema > validate

Type: boolean
Required: No

Description: Validate all input json documents against json schema.

13. Property cc-backend configuration file schema > session-max-age

Type: string
Required: No

Description: Specifies for how long a session shall be valid as a string parsable by time.ParseDuration(). If 0 or empty, the session/token does not expire!

14. Property cc-backend configuration file schema > https-cert-file

Type: string
Required: No

Description: Filepath to SSL certificate. If also https-key-file is set use HTTPS using those certificates.

15. Property cc-backend configuration file schema > https-key-file

Type: string
Required: No

Description: Filepath to SSL key file. If also https-cert-file is set use HTTPS using those certificates.

16. Property cc-backend configuration file schema > redirect-http-to

Type: string
Required: No

Description: If not the empty string and addr does not end in :80, redirect every request incoming at port 80 to that url.

17. Property cc-backend configuration file schema > stop-jobs-exceeding-walltime

Type: integer
Required: No

Description: If not zero, automatically mark jobs as stopped running X seconds longer than their walltime. Only applies if walltime is set for job.

18. Property cc-backend configuration file schema > short-running-jobs-duration

Type: integer
Required: No

Description: Do not show running jobs shorter than X seconds.

19. Property cc-backend configuration file schema > emission-constant

Type: integer
Required: No

Description: .

20. Property cc-backend configuration file schema > cron-frequency

Type: object
Required: No
Additional properties: Any type allowed

Description: Frequency of cron job workers.

Property | Pattern | Type | Deprecated | Definition | Title/Description
- duration-worker | No | string | No | - | Duration Update Worker [Defaults to ‘5m’]
- footprint-worker | No | string | No | - | Metric-Footprint Update Worker [Defaults to ‘10m’]

20.1. Property cc-backend configuration file schema > cron-frequency > duration-worker

Type: string
Required: No

Description: Duration Update Worker [Defaults to ‘5m’]

20.2. Property cc-backend configuration file schema > cron-frequency > footprint-worker

Type: string
Required: No

Description: Metric-Footprint Update Worker [Defaults to ‘10m’]

21. Property cc-backend configuration file schema > enable-resampling

Type: object
Required: No
Additional properties: Any type allowed

Description: Enable dynamic zoom in frontend metric plots.

Property | Pattern | Type | Deprecated | Definition | Title/Description
+ trigger | No | integer | No | - | Trigger next zoom level at less than this many visible datapoints.
+ resolutions | No | array of integer | No | - | Array of resampling target resolutions, in seconds.

21.1. Property cc-backend configuration file schema > enable-resampling > trigger

Type: integer
Required: Yes

Description: Trigger next zoom level at less than this many visible datapoints.

21.2. Property cc-backend configuration file schema > enable-resampling > resolutions

Type: array of integer
Required: Yes

Description: Array of resampling target resolutions, in seconds.

Array restrictions
Min items: N/A
Max items: N/A
Items unicity: False
Additional items: False
Tuple validation: See below

Each item of this array must be | Description
resolutions items | -

21.2.1. cc-backend configuration file schema > enable-resampling > resolutions > resolutions items

Type: integer
Required: No

22. Property cc-backend configuration file schema > jwts

Type: object
Required: Yes
Additional properties: Any type allowed

Description: For JWT token authentication.

Property | Pattern | Type | Deprecated | Definition | Title/Description
+ max-age | No | string | No | - | Configure how long a token is valid. As string parsable by time.ParseDuration()
- cookieName | No | string | No | - | Cookie that should be checked for a JWT token.
- validateUser | No | boolean | No | - | Deny login for users not in database (but defined in JWT). Overwrite roles in JWT with database roles.
- trustedIssuer | No | string | No | - | Issuer that should be accepted when validating external JWTs
- syncUserOnLogin | No | boolean | No | - | Add non-existent user to DB at login attempt with values provided in JWT.

22.1. Property cc-backend configuration file schema > jwts > max-age

Type: string
Required: Yes

Description: Configure how long a token is valid. As string parsable by time.ParseDuration()

22.2. Property cc-backend configuration file schema > jwts > cookieName

Type: string
Required: No

Description: Cookie that should be checked for a JWT token.

22.3. Property cc-backend configuration file schema > jwts > validateUser

Type: boolean
Required: No

Description: Deny login for users not in database (but defined in JWT). Overwrite roles in JWT with database roles.

22.4. Property cc-backend configuration file schema > jwts > trustedIssuer

Type: string
Required: No

Description: Issuer that should be accepted when validating external JWTs

22.5. Property cc-backend configuration file schema > jwts > syncUserOnLogin

Type: boolean
Required: No

Description: Add non-existent user to DB at login attempt with values provided in JWT.

23. Property cc-backend configuration file schema > oidc

Type: object
Required: No
Additional properties: Any type allowed

23.1. The following properties are required

  • provider

24. Property cc-backend configuration file schema > ldap

Type: object
Required: No
Additional properties: Any type allowed

Description: For LDAP Authentication and user synchronisation.

Property | Pattern | Type | Deprecated | Definition | Title/Description
+ url | No | string | No | - | URL of LDAP directory server.
+ user_base | No | string | No | - | Base DN of user tree root.
+ search_dn | No | string | No | - | DN for authenticating LDAP admin account with general read rights.
+ user_bind | No | string | No | - | Expression used to authenticate users via LDAP bind. Must contain uid={username}.
+ user_filter | No | string | No | - | Filter to extract users for syncing.
- username_attr | No | string | No | - | Attribute with full username. Default: gecos
- sync_interval | No | string | No | - | Interval used for syncing local user table with LDAP directory. Parsed using time.ParseDuration.
- sync_del_old_users | No | boolean | No | - | Delete obsolete users in database.
- syncUserOnLogin | No | boolean | No | - | Add non-existent user to DB at login attempt if user exists in Ldap directory

24.1. Property cc-backend configuration file schema > ldap > url

Typestring
RequiredYes

Description: URL of LDAP directory server.

24.2. Property cc-backend configuration file schema > ldap > user_base

Typestring
RequiredYes

Description: Base DN of user tree root.

24.3. Property cc-backend configuration file schema > ldap > search_dn

Typestring
RequiredYes

Description: DN for authenticating LDAP admin account with general read rights.

24.4. Property cc-backend configuration file schema > ldap > user_bind

Typestring
RequiredYes

Description: Expression used to authenticate users via LDAP bind. Must contain uid={username}.

24.5. Property cc-backend configuration file schema > ldap > user_filter

Typestring
RequiredYes

Description: Filter to extract users for syncing.

24.6. Property cc-backend configuration file schema > ldap > username_attr

Typestring
RequiredNo

Description: Attribute with full username. Default: gecos

24.7. Property cc-backend configuration file schema > ldap > sync_interval

Typestring
RequiredNo

Description: Interval used for syncing local user table with LDAP directory. Parsed using time.ParseDuration.

24.8. Property cc-backend configuration file schema > ldap > sync_del_old_users

Typeboolean
RequiredNo

Description: Delete obsolete users in database.

24.9. Property cc-backend configuration file schema > ldap > syncUserOnLogin

Typeboolean
RequiredNo

Description: Add non-existent user to DB at login attempt if user exists in LDAP directory.
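
The required keys and the uid={username} constraint described above can be checked with a few lines of Python. This is a minimal sketch, not the validation cc-backend performs itself; the helper name check_ldap_config and all example values (host, DNs, filter) are hypothetical.

```python
# Keys marked as required in the ldap section of the schema.
REQUIRED_LDAP_KEYS = ("url", "user_base", "search_dn", "user_bind", "user_filter")


def check_ldap_config(ldap):
    """Return a list of human-readable problems; empty if none were found."""
    problems = [
        f"missing required key: {key}"
        for key in REQUIRED_LDAP_KEYS
        if key not in ldap
    ]
    user_bind = ldap.get("user_bind", "")
    if user_bind and "uid={username}" not in user_bind:
        problems.append("user_bind must contain uid={username}")
    return problems


# Hypothetical values for illustration only.
ldap = {
    "url": "ldaps://ldap.example.org",
    "user_base": "ou=people,dc=example,dc=org",
    "search_dn": "cn=reader,dc=example,dc=org",
    "user_bind": "uid={username},ou=people,dc=example,dc=org",
    "user_filter": "(objectClass=posixAccount)",
    "sync_interval": "24h",
    "sync_del_old_users": False,
}

print(check_ldap_config(ldap))  # []
```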

25. Property cc-backend configuration file schema > clusters

Typearray of object
RequiredYes

Description: Configuration for the clusters to be displayed.

Array restrictions
Min itemsN/A
Max itemsN/A
Items unicityFalse
Additional itemsFalse
Tuple validationSee below
Each item of this array must beDescription
clusters items-

25.1. cc-backend configuration file schema > clusters > clusters items

Typeobject
RequiredNo
Additional propertiesAny type allowed
Property | Pattern | Type | Deprecated | Definition | Title/Description
+ name | No | string | No | - | The name of the cluster.
+ metricDataRepository | No | object | No | - | Type of the metric data repository for this cluster
+ filterRanges | No | object | No | - | This option controls the slider ranges for the UI controls of numNodes, duration, and startTime.

25.1.1. Property cc-backend configuration file schema > clusters > clusters items > name

Typestring
RequiredYes

Description: The name of the cluster.

25.1.2. Property cc-backend configuration file schema > clusters > clusters items > metricDataRepository

Typeobject
RequiredYes
Additional propertiesAny type allowed

Description: Type of the metric data repository for this cluster

Property | Pattern | Type | Deprecated | Definition | Title/Description
+ kind | No | enum (of string) | No | - | -
+ url | No | string | No | - | -
- token | No | string | No | - | -
25.1.2.1. Property cc-backend configuration file schema > clusters > clusters items > metricDataRepository > kind
Typeenum (of string)
RequiredYes

Must be one of:

  • “influxdb”
  • “prometheus”
  • “cc-metric-store”
  • “test”
25.1.2.2. Property cc-backend configuration file schema > clusters > clusters items > metricDataRepository > url
Typestring
RequiredYes
25.1.2.3. Property cc-backend configuration file schema > clusters > clusters items > metricDataRepository > token
Typestring
RequiredNo

25.1.3. Property cc-backend configuration file schema > clusters > clusters items > filterRanges

Typeobject
RequiredYes
Additional propertiesAny type allowed

Description: This option controls the slider ranges for the UI controls of numNodes, duration, and startTime.

Property | Pattern | Type | Deprecated | Definition | Title/Description
+ numNodes | No | object | No | - | UI slider range for number of nodes
+ duration | No | object | No | - | UI slider range for duration
+ startTime | No | object | No | - | UI slider range for start time
25.1.3.1. Property cc-backend configuration file schema > clusters > clusters items > filterRanges > numNodes
Typeobject
RequiredYes
Additional propertiesAny type allowed

Description: UI slider range for number of nodes

Property | Pattern | Type | Deprecated | Definition | Title/Description
+ from | No | integer | No | - | -
+ to | No | integer | No | - | -
25.1.3.1.1. Property cc-backend configuration file schema > clusters > clusters items > filterRanges > numNodes > from
Typeinteger
RequiredYes
25.1.3.1.2. Property cc-backend configuration file schema > clusters > clusters items > filterRanges > numNodes > to
Typeinteger
RequiredYes
25.1.3.2. Property cc-backend configuration file schema > clusters > clusters items > filterRanges > duration
Typeobject
RequiredYes
Additional propertiesAny type allowed

Description: UI slider range for duration

Property | Pattern | Type | Deprecated | Definition | Title/Description
+ from | No | integer | No | - | -
+ to | No | integer | No | - | -
25.1.3.2.1. Property cc-backend configuration file schema > clusters > clusters items > filterRanges > duration > from
Typeinteger
RequiredYes
25.1.3.2.2. Property cc-backend configuration file schema > clusters > clusters items > filterRanges > duration > to
Typeinteger
RequiredYes
25.1.3.3. Property cc-backend configuration file schema > clusters > clusters items > filterRanges > startTime
Typeobject
RequiredYes
Additional propertiesAny type allowed

Description: UI slider range for start time

Property | Pattern | Type | Deprecated | Definition | Title/Description
+ from | No | string | No | - | -
+ to | No | null | No | - | -
25.1.3.3.1. Property cc-backend configuration file schema > clusters > clusters items > filterRanges > startTime > from
Typestring
RequiredYes
Formatdate-time
25.1.3.3.2. Property cc-backend configuration file schema > clusters > clusters items > filterRanges > startTime > to
Typenull
RequiredYes
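
A single entry of the clusters array, with the nesting described above, could look like the following Python sketch. The helper check_cluster_entry and all values (cluster name, URL, token placeholder, range limits) are hypothetical, and the unit of the duration range is an assumption, since the schema does not state one. The check covers only the required keys and the kind enumeration, not everything cc-backend may verify.

```python
import json

# Allowed values of metricDataRepository.kind, taken from the schema.
ALLOWED_KINDS = {"influxdb", "prometheus", "cc-metric-store", "test"}


def check_cluster_entry(entry):
    """Return a list of problems found in one clusters item."""
    problems = []
    for key in ("name", "metricDataRepository", "filterRanges"):
        if key not in entry:
            problems.append(f"missing required key: {key}")

    repo = entry.get("metricDataRepository", {})
    if repo.get("kind") not in ALLOWED_KINDS:
        problems.append("metricDataRepository.kind is not an allowed value")
    if "url" not in repo:
        problems.append("metricDataRepository.url is required")

    ranges = entry.get("filterRanges", {})
    for key in ("numNodes", "duration", "startTime"):
        sub = ranges.get(key)
        if not isinstance(sub, dict) or "from" not in sub or "to" not in sub:
            problems.append(f"filterRanges.{key} needs both 'from' and 'to'")
    return problems


# Hypothetical example entry.
cluster = {
    "name": "examplecluster",
    "metricDataRepository": {
        "kind": "cc-metric-store",
        "url": "http://localhost:8082",
        "token": "<token>",  # optional
    },
    "filterRanges": {
        "numNodes": {"from": 1, "to": 64},
        "duration": {"from": 0, "to": 86400},  # assumed to be seconds
        # startTime.to has type null in the schema; None serializes to null.
        "startTime": {"from": "2023-01-01T00:00:00Z", "to": None},
    },
}

print(check_cluster_entry(cluster))  # []
print(json.dumps({"clusters": [cluster]}, indent=2))
```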

26. Property cc-backend configuration file schema > ui-defaults

Typeobject
RequiredNo
Additional propertiesAny type allowed

Description: Default configuration for web UI

Property | Pattern | Type | Deprecated | Definition | Title/Description
+ plot_general_colorBackground | No | boolean | No | - | Color plot background according to job average threshold limits
+ plot_general_lineWidth | No | integer | No | - | Initial linewidth
+ plot_list_jobsPerPage | No | integer | No | - | Jobs shown per page in job lists
+ plot_view_plotsPerRow | No | integer | No | - | Number of plots per row in single job view
+ plot_view_showPolarplot | No | boolean | No | - | Option to toggle polar plot in single job view
+ plot_view_showRoofline | No | boolean | No | - | Option to toggle roofline plot in single job view
+ plot_view_showStatTable | No | boolean | No | - | Option to toggle the node statistic table in single job view
+ system_view_selectedMetric | No | string | No | - | Initial metric shown in system view
+ job_view_showFootprint | No | boolean | No | - | Option to toggle footprint UI in single job view
+ job_list_usePaging | No | boolean | No | - | Option to switch from continuous scroll to paging
+ analysis_view_histogramMetrics | No | array of string | No | - | Metrics to show as job count histograms in analysis view
+ analysis_view_scatterPlotMetrics | No | array of array | No | - | Initial scatter plot configuration in analysis view
+ job_view_nodestats_selectedMetrics | No | array of string | No | - | Initial metrics shown in node statistics table of single job view
+ job_view_selectedMetrics | No | array of string | No | - | -
+ plot_general_colorscheme | No | array of string | No | - | Initial color scheme
+ plot_list_selectedMetrics | No | array of string | No | - | Initial metric plots shown in jobs lists

26.1. Property cc-backend configuration file schema > ui-defaults > plot_general_colorBackground

Typeboolean
RequiredYes

Description: Color plot background according to job average threshold limits

26.2. Property cc-backend configuration file schema > ui-defaults > plot_general_lineWidth

Typeinteger
RequiredYes

Description: Initial linewidth

26.3. Property cc-backend configuration file schema > ui-defaults > plot_list_jobsPerPage

Typeinteger
RequiredYes

Description: Jobs shown per page in job lists

26.4. Property cc-backend configuration file schema > ui-defaults > plot_view_plotsPerRow

Typeinteger
RequiredYes

Description: Number of plots per row in single job view

26.5. Property cc-backend configuration file schema > ui-defaults > plot_view_showPolarplot

Typeboolean
RequiredYes

Description: Option to toggle polar plot in single job view

26.6. Property cc-backend configuration file schema > ui-defaults > plot_view_showRoofline

Typeboolean
RequiredYes

Description: Option to toggle roofline plot in single job view

26.7. Property cc-backend configuration file schema > ui-defaults > plot_view_showStatTable

Typeboolean
RequiredYes

Description: Option to toggle the node statistic table in single job view

26.8. Property cc-backend configuration file schema > ui-defaults > system_view_selectedMetric

Typestring
RequiredYes

Description: Initial metric shown in system view

26.9. Property cc-backend configuration file schema > ui-defaults > job_view_showFootprint

Typeboolean
RequiredYes

Description: Option to toggle footprint UI in single job view

26.10. Property cc-backend configuration file schema > ui-defaults > job_list_usePaging

Typeboolean
RequiredYes

Description: Option to switch from continuous scroll to paging

26.11. Property cc-backend configuration file schema > ui-defaults > analysis_view_histogramMetrics

Typearray of string
RequiredYes

Description: Metrics to show as job count histograms in analysis view

Array restrictions
Min itemsN/A
Max itemsN/A
Items unicityFalse
Additional itemsFalse
Tuple validationSee below
Each item of this array must beDescription
analysis_view_histogramMetrics items-

26.11.1. cc-backend configuration file schema > ui-defaults > analysis_view_histogramMetrics > analysis_view_histogramMetrics items

Typestring
RequiredNo

26.12. Property cc-backend configuration file schema > ui-defaults > analysis_view_scatterPlotMetrics

Typearray of array
RequiredYes

Description: Initial scatter plot configuration in analysis view

Array restrictions
Min itemsN/A
Max itemsN/A
Items unicityFalse
Additional itemsFalse
Tuple validationSee below
Each item of this array must beDescription
analysis_view_scatterPlotMetrics items-

26.12.1. cc-backend configuration file schema > ui-defaults > analysis_view_scatterPlotMetrics > analysis_view_scatterPlotMetrics items

Typearray of string
RequiredNo
Array restrictions
Min items1
Max itemsN/A
Items unicityFalse
Additional itemsFalse
Tuple validationSee below
Each item of this array must beDescription
analysis_view_scatterPlotMetrics items items-
26.12.1.1. cc-backend configuration file schema > ui-defaults > analysis_view_scatterPlotMetrics > analysis_view_scatterPlotMetrics items > analysis_view_scatterPlotMetrics items items
Typestring
RequiredNo
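
The nested shape of analysis_view_scatterPlotMetrics (an array whose items are arrays of at least one string) can be sketched in Python as follows. The helper name is hypothetical, and grouping two metric names per inner array is an assumption for illustration; the schema itself only requires a minimum of one item.

```python
def is_valid_scatter_config(value):
    """Check the shape: a list of non-empty lists of strings."""
    if not isinstance(value, list):
        return False
    for group in value:
        # Each inner array must hold at least one item (Min items: 1).
        if not isinstance(group, list) or len(group) < 1:
            return False
        if not all(isinstance(metric, str) for metric in group):
            return False
    return True


# Hypothetical example value using metric names as placeholders.
scatter = [["flops_any", "mem_bw"], ["flops_any", "cpu_load"]]

print(is_valid_scatter_config(scatter))  # True
print(is_valid_scatter_config([[]]))     # False
```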

26.13. Property cc-backend configuration file schema > ui-defaults > job_view_nodestats_selectedMetrics

Typearray of string
RequiredYes

Description: Initial metrics shown in node statistics table of single job view

Array restrictions
Min itemsN/A
Max itemsN/A
Items unicityFalse
Additional itemsFalse
Tuple validationSee below
Each item of this array must beDescription
job_view_nodestats_selectedMetrics items-

26.13.1. cc-backend configuration file schema > ui-defaults > job_view_nodestats_selectedMetrics > job_view_nodestats_selectedMetrics items

Typestring
RequiredNo

26.14. Property cc-backend configuration file schema > ui-defaults > job_view_selectedMetrics

Typearray of string
RequiredYes
Array restrictions
Min itemsN/A
Max itemsN/A
Items unicityFalse
Additional itemsFalse
Tuple validationSee below
Each item of this array must beDescription
job_view_selectedMetrics items-

26.14.1. cc-backend configuration file schema > ui-defaults > job_view_selectedMetrics > job_view_selectedMetrics items

Typestring
RequiredNo

26.15. Property cc-backend configuration file schema > ui-defaults > plot_general_colorscheme

Typearray of string
RequiredYes

Description: Initial color scheme

Array restrictions
Min itemsN/A
Max itemsN/A
Items unicityFalse
Additional itemsFalse
Tuple validationSee below
Each item of this array must beDescription
plot_general_colorscheme items-

26.15.1. cc-backend configuration file schema > ui-defaults > plot_general_colorscheme > plot_general_colorscheme items

Typestring
RequiredNo

26.16. Property cc-backend configuration file schema > ui-defaults > plot_list_selectedMetrics

Typearray of string
RequiredYes

Description: Initial metric plots shown in jobs lists

Array restrictions
Min itemsN/A
Max itemsN/A
Items unicityFalse
Additional itemsFalse
Tuple validationSee below
Each item of this array must beDescription
plot_list_selectedMetrics items-

26.16.1. cc-backend configuration file schema > ui-defaults > plot_list_selectedMetrics > plot_list_selectedMetrics items

Typestring
RequiredNo

Generated using json-schema-for-humans on 2024-12-04 at 16:45:59 +0100

1.7.2 - Cluster Schema

ClusterCockpit Cluster Schema Reference

The following schema in its raw form can be found in the ClusterCockpit GitHub repository.

HPC cluster description

Title: HPC cluster description

Typeobject
RequiredNo
Additional propertiesAny type allowed

Description: Metadata of an HPC cluster

Property | Pattern | Type | Deprecated | Definition | Title/Description
+ name | No | string | No | - | The unique identifier of a cluster
+ metricConfig | No | array of object | No | - | Metric specifications
+ subClusters | No | array of object | No | - | Array of cluster hardware partitions

1. Property HPC cluster description > name

Typestring
RequiredYes

Description: The unique identifier of a cluster

2. Property HPC cluster description > metricConfig

Typearray of object
RequiredYes

Description: Metric specifications

Array restrictions
Min items1
Max itemsN/A
Items unicityFalse
Additional itemsFalse
Tuple validationSee below
Each item of this array must beDescription
metricConfig items-

2.1. HPC cluster description > metricConfig > metricConfig items

Typeobject
RequiredNo
Additional propertiesAny type allowed
Property | Pattern | Type | Deprecated | Definition | Title/Description
+ name | No | string | No | - | Metric name
+ unit | No | object | No | In embedfs://unit.schema.json | Metric unit
+ scope | No | string | No | - | Native measurement resolution
+ timestep | No | integer | No | - | Frequency of timeseries points
+ aggregation | No | enum (of string) | No | - | How the metric is aggregated
- footprint | No | enum (of string) | No | - | Is it a footprint metric and what type
- energy | No | enum (of string) | No | - | Is it used to calculate job energy
- lowerIsBetter | No | boolean | No | - | Is lower better.
+ peak | No | number | No | - | Metric peak threshold (Upper metric limit)
+ normal | No | number | No | - | Metric normal threshold
+ caution | No | number | No | - | Metric caution threshold (Suspicious but does not require immediate action)
+ alert | No | number | No | - | Metric alert threshold (Requires immediate action)
- subClusters | No | array of object | No | - | Array of cluster hardware partition metric thresholds

2.1.1. Property HPC cluster description > metricConfig > metricConfig items > name

Typestring
RequiredYes

Description: Metric name

2.1.2. Property HPC cluster description > metricConfig > metricConfig items > unit

Typeobject
RequiredYes
Additional propertiesAny type allowed
Defined inembedfs://unit.schema.json

Description: Metric unit

2.1.3. Property HPC cluster description > metricConfig > metricConfig items > scope

Typestring
RequiredYes

Description: Native measurement resolution

2.1.4. Property HPC cluster description > metricConfig > metricConfig items > timestep

Typeinteger
RequiredYes

Description: Frequency of timeseries points

2.1.5. Property HPC cluster description > metricConfig > metricConfig items > aggregation

Typeenum (of string)
RequiredYes

Description: How the metric is aggregated

Must be one of:

  • “sum”
  • “avg”

2.1.6. Property HPC cluster description > metricConfig > metricConfig items > footprint

Typeenum (of string)
RequiredNo

Description: Is it a footprint metric and what type

Must be one of:

  • “avg”
  • “max”
  • “min”

2.1.7. Property HPC cluster description > metricConfig > metricConfig items > energy

Typeenum (of string)
RequiredNo

Description: Is it used to calculate job energy

Must be one of:

  • “power”
  • “energy”

2.1.8. Property HPC cluster description > metricConfig > metricConfig items > lowerIsBetter

Typeboolean
RequiredNo

Description: Is lower better.

2.1.9. Property HPC cluster description > metricConfig > metricConfig items > peak

Typenumber
RequiredYes

Description: Metric peak threshold (Upper metric limit)

2.1.10. Property HPC cluster description > metricConfig > metricConfig items > normal

Typenumber
RequiredYes

Description: Metric normal threshold

2.1.11. Property HPC cluster description > metricConfig > metricConfig items > caution

Typenumber
RequiredYes

Description: Metric caution threshold (Suspicious but does not require immediate action)

2.1.12. Property HPC cluster description > metricConfig > metricConfig items > alert

Typenumber
RequiredYes

Description: Metric alert threshold (Requires immediate action)

2.1.13. Property HPC cluster description > metricConfig > metricConfig items > subClusters

Typearray of object
RequiredNo

Description: Array of cluster hardware partition metric thresholds

Array restrictions
Min itemsN/A
Max itemsN/A
Items unicityFalse
Additional itemsFalse
Tuple validationSee below
Each item of this array must beDescription
subClusters items-
2.1.13.1. HPC cluster description > metricConfig > metricConfig items > subClusters > subClusters items
Typeobject
RequiredNo
Additional propertiesAny type allowed
Property | Pattern | Type | Deprecated | Definition | Title/Description
+ name | No | string | No | - | Hardware partition name
- footprint | No | enum (of string) | No | - | Is it a footprint metric and what type. Overwrite global setting
- energy | No | enum (of string) | No | - | Is it used to calculate job energy. Overwrite global
- lowerIsBetter | No | boolean | No | - | Is lower better. Overwrite global
- peak | No | number | No | - | -
- normal | No | number | No | - | -
- caution | No | number | No | - | -
- alert | No | number | No | - | -
- remove | No | boolean | No | - | Remove this metric for this subcluster
2.1.13.1.1. Property HPC cluster description > metricConfig > metricConfig items > subClusters > subClusters items > name
Typestring
RequiredYes

Description: Hardware partition name

2.1.13.1.2. Property HPC cluster description > metricConfig > metricConfig items > subClusters > subClusters items > footprint
Typeenum (of string)
RequiredNo

Description: Is it a footprint metric and what type. Overwrite global setting

Must be one of:

  • “avg”
  • “max”
  • “min”
2.1.13.1.3. Property HPC cluster description > metricConfig > metricConfig items > subClusters > subClusters items > energy
Typeenum (of string)
RequiredNo

Description: Is it used to calculate job energy. Overwrite global

Must be one of:

  • “power”
  • “energy”
2.1.13.1.4. Property HPC cluster description > metricConfig > metricConfig items > subClusters > subClusters items > lowerIsBetter
Typeboolean
RequiredNo

Description: Is lower better. Overwrite global

2.1.13.1.5. Property HPC cluster description > metricConfig > metricConfig items > subClusters > subClusters items > peak
Typenumber
RequiredNo
2.1.13.1.6. Property HPC cluster description > metricConfig > metricConfig items > subClusters > subClusters items > normal
Typenumber
RequiredNo
2.1.13.1.7. Property HPC cluster description > metricConfig > metricConfig items > subClusters > subClusters items > caution
Typenumber
RequiredNo
2.1.13.1.8. Property HPC cluster description > metricConfig > metricConfig items > subClusters > subClusters items > alert
Typenumber
RequiredNo
2.1.13.1.9. Property HPC cluster description > metricConfig > metricConfig items > subClusters > subClusters items > remove
Typeboolean
RequiredNo

Description: Remove this metric for this subcluster
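
The per-subcluster entries described above overwrite the global metric settings or remove the metric for a hardware partition. The following Python sketch shows one way to resolve the effective settings for a given subcluster. It is an illustration of the schema semantics, not cc-backend's actual implementation; the function name, the partition names, the threshold values, and the fields inside unit are all assumptions.

```python
# Keys that a per-subcluster entry may override, per the table above.
OVERRIDABLE = ("footprint", "energy", "lowerIsBetter",
               "peak", "normal", "caution", "alert")


def effective_metric_config(metric, subcluster):
    """Return the settings of `metric` for one subcluster, or None if removed."""
    effective = {k: v for k, v in metric.items() if k != "subClusters"}
    for override in metric.get("subClusters", []):
        if override.get("name") != subcluster:
            continue
        if override.get("remove", False):
            return None
        for key in OVERRIDABLE:
            if key in override:
                effective[key] = override[key]
    return effective


# Hypothetical metricConfig item.
metric = {
    "name": "mem_bw",
    "unit": {"base": "B/s", "prefix": "G"},  # assumed shape of unit.schema.json
    "scope": "socket",
    "timestep": 60,
    "aggregation": "sum",
    "peak": 350, "normal": 100, "caution": 50, "alert": 10,
    "subClusters": [
        {"name": "partition_a", "peak": 200, "normal": 80},
        {"name": "partition_b", "remove": True},
    ],
}

print(effective_metric_config(metric, "partition_a")["peak"])  # 200
print(effective_metric_config(metric, "partition_b"))          # None
```

A subcluster without an entry in subClusters simply keeps the global values.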

3. Property HPC cluster description > subClusters

Typearray of object
RequiredYes

Description: Array of cluster hardware partitions

Array restrictions
Min items1
Max itemsN/A
Items unicityFalse
Additional itemsFalse
Tuple validationSee below
Each item of this array must beDescription
subClusters items-

3.1. HPC cluster description > subClusters > subClusters items

Typeobject
RequiredNo
Additional propertiesAny type allowed
Property | Pattern | Type | Deprecated | Definition | Title/Description
+ name | No | string | No | - | Hardware partition name
+ processorType | No | string | No | - | Processor type
+ socketsPerNode | No | integer | No | - | Number of sockets per node
+ coresPerSocket | No | integer | No | - | Number of cores per socket
+ threadsPerCore | No | integer | No | - | Number of SMT threads per core
+ flopRateScalar | No | object | No | - | Theoretical node peak flop rate for scalar code in GFlops/s
+ flopRateSimd | No | object | No | - | Theoretical node peak flop rate for SIMD code in GFlops/s
+ memoryBandwidth | No | object | No | - | Theoretical node peak memory bandwidth in GB/s
+ nodes | No | string | No | - | Node list expression
+ topology | No | object | No | - | Node topology

3.1.1. Property HPC cluster description > subClusters > subClusters items > name

Typestring
RequiredYes

Description: Hardware partition name

3.1.2. Property HPC cluster description > subClusters > subClusters items > processorType

Typestring
RequiredYes

Description: Processor type

3.1.3. Property HPC cluster description > subClusters > subClusters items > socketsPerNode

Typeinteger
RequiredYes

Description: Number of sockets per node

3.1.4. Property HPC cluster description > subClusters > subClusters items > coresPerSocket

Typeinteger
RequiredYes

Description: Number of cores per socket

3.1.5. Property HPC cluster description > subClusters > subClusters items > threadsPerCore

Typeinteger
RequiredYes

Description: Number of SMT threads per core

3.1.6. Property HPC cluster description > subClusters > subClusters items > flopRateScalar

Typeobject
RequiredYes
Additional propertiesAny type allowed

Description: Theoretical node peak flop rate for scalar code in GFlops/s

Property | Pattern | Type | Deprecated | Definition | Title/Description
- unit | No | object | No | In embedfs://unit.schema.json | Metric unit
- value | No | number | No | - | -
3.1.6.1. Property HPC cluster description > subClusters > subClusters items > flopRateScalar > unit
Typeobject
RequiredNo
Additional propertiesAny type allowed
Defined inembedfs://unit.schema.json

Description: Metric unit

3.1.6.2. Property HPC cluster description > subClusters > subClusters items > flopRateScalar > value
Typenumber
RequiredNo

3.1.7. Property HPC cluster description > subClusters > subClusters items > flopRateSimd

Typeobject
RequiredYes
Additional propertiesAny type allowed

Description: Theoretical node peak flop rate for SIMD code in GFlops/s

Property | Pattern | Type | Deprecated | Definition | Title/Description
- unit | No | object | No | In embedfs://unit.schema.json | Metric unit
- value | No | number | No | - | -
3.1.7.1. Property HPC cluster description > subClusters > subClusters items > flopRateSimd > unit
Typeobject
RequiredNo
Additional propertiesAny type allowed
Defined inembedfs://unit.schema.json

Description: Metric unit

3.1.7.2. Property HPC cluster description > subClusters > subClusters items > flopRateSimd > value
Typenumber
RequiredNo

3.1.8. Property HPC cluster description > subClusters > subClusters items > memoryBandwidth

Typeobject
RequiredYes
Additional propertiesAny type allowed

Description: Theoretical node peak memory bandwidth in GB/s

Property | Pattern | Type | Deprecated | Definition | Title/Description
- unit | No | object | No | In embedfs://unit.schema.json | Metric unit
- value | No | number | No | - | -
3.1.8.1. Property HPC cluster description > subClusters > subClusters items > memoryBandwidth > unit
Typeobject
RequiredNo
Additional propertiesAny type allowed
Defined inembedfs://unit.schema.json

Description: Metric unit

3.1.8.2. Property HPC cluster description > subClusters > subClusters items > memoryBandwidth > value
Typenumber
RequiredNo

3.1.9. Property HPC cluster description > subClusters > subClusters items > nodes

Typestring
RequiredYes

Description: Node list expression

3.1.10. Property HPC cluster description > subClusters > subClusters items > topology

Typeobject
RequiredYes
Additional propertiesAny type allowed

Description: Node topology

Property | Pattern | Type | Deprecated | Definition | Title/Description
+ node | No | array of integer | No | - | HwThread list of the node
+ socket | No | array of array | No | - | HwThread lists of sockets
+ memoryDomain | No | array of array | No | - | HwThread lists of memory domains
- die | No | array of array | No | - | HwThread lists of dies
- core | No | array of array | No | - | HwThread lists of cores
- accelerators | No | array of object | No | - | List of accelerator devices
3.1.10.1. Property HPC cluster description > subClusters > subClusters items > topology > node
Typearray of integer
RequiredYes

Description: HwThread list of the node

Array restrictions
Min itemsN/A
Max itemsN/A
Items unicityFalse
Additional itemsFalse
Tuple validationSee below
Each item of this array must beDescription
node items-
3.1.10.1.1. HPC cluster description > subClusters > subClusters items > topology > node > node items
Typeinteger
RequiredNo
3.1.10.2. Property HPC cluster description > subClusters > subClusters items > topology > socket
Typearray of array
RequiredYes

Description: HwThread lists of sockets

Array restrictions
Min itemsN/A
Max itemsN/A
Items unicityFalse
Additional itemsFalse
Tuple validationSee below
Each item of this array must beDescription
socket items-
3.1.10.2.1. HPC cluster description > subClusters > subClusters items > topology > socket > socket items
Typearray of integer
RequiredNo
Array restrictions
Min itemsN/A
Max itemsN/A
Items unicityFalse
Additional itemsFalse
Tuple validationSee below
Each item of this array must beDescription
socket items items-
3.1.10.2.1.1. HPC cluster description > subClusters > subClusters items > topology > socket > socket items > socket items items
Typeinteger
RequiredNo
3.1.10.3. Property HPC cluster description > subClusters > subClusters items > topology > memoryDomain
Typearray of array
RequiredYes

Description: HwThread lists of memory domains

Array restrictions
Min itemsN/A
Max itemsN/A
Items unicityFalse
Additional itemsFalse
Tuple validationSee below
Each item of this array must beDescription
memoryDomain items-
3.1.10.3.1. HPC cluster description > subClusters > subClusters items > topology > memoryDomain > memoryDomain items
Typearray of integer
RequiredNo
Array restrictions
Min itemsN/A
Max itemsN/A
Items unicityFalse
Additional itemsFalse
Tuple validationSee below
Each item of this array must beDescription
memoryDomain items items-
3.1.10.3.1.1. HPC cluster description > subClusters > subClusters items > topology > memoryDomain > memoryDomain items > memoryDomain items items
Typeinteger
RequiredNo
3.1.10.4. Property HPC cluster description > subClusters > subClusters items > topology > die
Typearray of array
RequiredNo

Description: HwThread lists of dies

Array restrictions
Min itemsN/A
Max itemsN/A
Items unicityFalse
Additional itemsFalse
Tuple validationSee below
Each item of this array must beDescription
die items-
3.1.10.4.1. HPC cluster description > subClusters > subClusters items > topology > die > die items
Typearray of integer
RequiredNo
Array restrictions
Min itemsN/A
Max itemsN/A
Items unicityFalse
Additional itemsFalse
Tuple validationSee below
Each item of this array must beDescription
die items items-
3.1.10.4.1.1. HPC cluster description > subClusters > subClusters items > topology > die > die items > die items items
Typeinteger
RequiredNo
3.1.10.5. Property HPC cluster description > subClusters > subClusters items > topology > core
Typearray of array
RequiredNo

Description: HwThread lists of cores

Array restrictions
Min itemsN/A
Max itemsN/A
Items unicityFalse
Additional itemsFalse
Tuple validationSee below
Each item of this array must beDescription
core items-
3.1.10.5.1. HPC cluster description > subClusters > subClusters items > topology > core > core items
Typearray of integer
RequiredNo
Array restrictions
Min itemsN/A
Max itemsN/A
Items unicityFalse
Additional itemsFalse
Tuple validationSee below
Each item of this array must beDescription
core items items-
3.1.10.5.1.1. HPC cluster description > subClusters > subClusters items > topology > core > core items > core items items
Typeinteger
RequiredNo
3.1.10.6. Property HPC cluster description > subClusters > subClusters items > topology > accelerators
Typearray of object
RequiredNo

Description: List of accelerator devices

Array restrictions
Min itemsN/A
Max itemsN/A
Items unicityFalse
Additional itemsFalse
Tuple validationSee below
Each item of this array must beDescription
accelerators items-
3.1.10.6.1. HPC cluster description > subClusters > subClusters items > topology > accelerators > accelerators items
Typeobject
RequiredNo
Additional propertiesAny type allowed
Property | Pattern | Type | Deprecated | Definition | Title/Description
+ id | No | string | No | - | The unique device id
+ type | No | enum (of string) | No | - | The accelerator type
+ model | No | string | No | - | The accelerator model
3.1.10.6.1.1. Property HPC cluster description > subClusters > subClusters items > topology > accelerators > accelerators items > id
Typestring
RequiredYes

Description: The unique device id

3.1.10.6.1.2. Property HPC cluster description > subClusters > subClusters items > topology > accelerators > accelerators items > type
Typeenum (of string)
RequiredYes

Description: The accelerator type

Must be one of:

  • “Nvidia GPU”
  • “AMD GPU”
  • “Intel GPU”
3.1.10.6.1.3. Property HPC cluster description > subClusters > subClusters items > topology > accelerators > accelerators items > model
Typestring
RequiredYes

Description: The accelerator model



1.7.3 - Job Data Schema

ClusterCockpit Job Data Schema Reference

The following schema in its raw form can be found in the ClusterCockpit GitHub repository.

Job metric data list

Title: Job metric data list

Typeobject
RequiredNo
Additional propertiesAny type allowed

Description: Collection of metric data of an HPC job

Property | Pattern | Type | Deprecated | Definition | Title/Description
+ mem_used | No | object | No | - | Memory capacity used
+ flops_any | No | object | No | - | Total flop rate with DP flops scaled up
+ mem_bw | No | object | No | - | Main memory bandwidth
+ net_bw | No | object | No | - | Total fast interconnect network bandwidth
- ipc | No | object | No | - | Instructions executed per cycle
+ cpu_user | No | object | No | - | CPU user active core utilization
+ cpu_load | No | object | No | - | CPU requested core utilization (load 1m)
- flops_dp | No | object | No | - | Double precision flop rate
- flops_sp | No | object | No | - | Single precision flop rate
- vectorization_ratio | No | object | No | - | Fraction of arithmetic instructions using SIMD instructions
- cpu_power | No | object | No | - | CPU power consumption
- mem_power | No | object | No | - | Memory power consumption
- acc_utilization | No | object | No | - | GPU utilization
- acc_mem_used | No | object | No | - | GPU memory capacity used
- acc_power | No | object | No | - | GPU power consumption
- clock | No | object | No | - | Average core frequency
- eth_read_bw | No | object | No | - | Ethernet read bandwidth
- eth_write_bw | No | object | No | - | Ethernet write bandwidth
+ filesystems | No | array of object | No | - | Array of filesystems
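
The entries marked with "+" in the table above are required, the ones marked with "-" are optional. As a minimal Python sketch, a job data document can be checked for missing required metrics as follows. The function name is hypothetical, and the per-scope objects are left empty on purpose, because their schema is not documented on this page.

```python
# Required top-level properties of the job data schema (marked "+" above).
REQUIRED_METRICS = ("mem_used", "flops_any", "mem_bw", "net_bw",
                    "cpu_user", "cpu_load", "filesystems")


def missing_required_metrics(job_data):
    """Return the required top-level keys that are absent from job_data."""
    return [name for name in REQUIRED_METRICS if name not in job_data]


# Hypothetical, incomplete document; scope objects are empty placeholders.
job_data = {
    "mem_used": {"node": {}},
    "flops_any": {"node": {}},
    "mem_bw": {"socket": {}},
    "cpu_load": {"node": {}},
}

print(missing_required_metrics(job_data))
# ['net_bw', 'cpu_user', 'filesystems']
```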

1. Property Job metric data list > mem_used

Typeobject
RequiredYes
Additional propertiesAny type allowed

Description: Memory capacity used

Property | Pattern | Type | Deprecated | Definition | Title/Description
+ node | No | object | No | In embedfs://job-metric-data.schema.json | No description available (the referenced schema could not be loaded during schema generation)

1.1. Property Job metric data list > mem_used > node

Typeobject
RequiredYes
Additional propertiesAny type allowed
Defined inembedfs://job-metric-data.schema.json

Description: No description available (the referenced schema could not be loaded during schema generation).

2. Property Job metric data list > flops_any

Typeobject
RequiredYes
Additional propertiesAny type allowed

Description: Total flop rate with DP flops scaled up

Property | Pattern | Type | Deprecated | Definition | Title/Description
- node | No | object | No | In embedfs://job-metric-data.schema.json | No description available (the referenced schema could not be loaded during schema generation)
- socket | No | object | No | In embedfs://job-metric-data.schema.json | No description available (the referenced schema could not be loaded during schema generation)
- memoryDomain | No | object | No | In embedfs://job-metric-data.schema.json | No description available (the referenced schema could not be loaded during schema generation)
- core | No | object | No | In embedfs://job-metric-data.schema.json | No description available (the referenced schema could not be loaded during schema generation)
- hwthread | No | object | No | In embedfs://job-metric-data.schema.json | No description available (the referenced schema could not be loaded during schema generation)

2.1. Property Job metric data list > flops_any > node

Typeobject
RequiredNo
Additional propertiesAny type allowed
Defined inembedfs://job-metric-data.schema.json

Description: 😅 ERROR in schema generation, a referenced schema could not be loaded, no documentation here unfortunately 🏜️

2.2. Property Job metric data list > flops_any > socket

Typeobject
RequiredNo
Additional propertiesAny type allowed
Defined inembedfs://job-metric-data.schema.json

Description: 😅 ERROR in schema generation, a referenced schema could not be loaded, no documentation here unfortunately 🏜️

2.3. Property Job metric data list > flops_any > memoryDomain

Typeobject
RequiredNo
Additional propertiesAny type allowed
Defined inembedfs://job-metric-data.schema.json

Description: 😅 ERROR in schema generation, a referenced schema could not be loaded, no documentation here unfortunately 🏜️

2.4. Property Job metric data list > flops_any > core

Typeobject
RequiredNo
Additional propertiesAny type allowed
Defined inembedfs://job-metric-data.schema.json

Description: 😅 ERROR in schema generation, a referenced schema could not be loaded, no documentation here unfortunately 🏜️

2.5. Property Job metric data list > flops_any > hwthread

Typeobject
RequiredNo
Additional propertiesAny type allowed
Defined inembedfs://job-metric-data.schema.json

Description: 😅 ERROR in schema generation, a referenced schema could not be loaded, no documentation here unfortunately 🏜️

3. Property Job metric data list > mem_bw

Typeobject
RequiredYes
Additional propertiesAny type allowed

Description: Main memory bandwidth

PropertyPatternTypeDeprecatedDefinitionTitle/Description
- nodeNoobjectNoIn embedfs://job-metric-data.schema.json😅 ERROR in schema generation, a referenced schema could not be loaded, no documentation here unfortunately 🏜️
- socketNoobjectNoIn embedfs://job-metric-data.schema.json😅 ERROR in schema generation, a referenced schema could not be loaded, no documentation here unfortunately 🏜️
- memoryDomainNoobjectNoIn embedfs://job-metric-data.schema.json😅 ERROR in schema generation, a referenced schema could not be loaded, no documentation here unfortunately 🏜️

3.1. Property Job metric data list > mem_bw > node

Typeobject
RequiredNo
Additional propertiesAny type allowed
Defined inembedfs://job-metric-data.schema.json

Description: 😅 ERROR in schema generation, a referenced schema could not be loaded, no documentation here unfortunately 🏜️

3.2. Property Job metric data list > mem_bw > socket

Typeobject
RequiredNo
Additional propertiesAny type allowed
Defined inembedfs://job-metric-data.schema.json

Description: 😅 ERROR in schema generation, a referenced schema could not be loaded, no documentation here unfortunately 🏜️

3.3. Property Job metric data list > mem_bw > memoryDomain

Typeobject
RequiredNo
Additional propertiesAny type allowed
Defined inembedfs://job-metric-data.schema.json

Description: 😅 ERROR in schema generation, a referenced schema could not be loaded, no documentation here unfortunately 🏜️

4. Property Job metric data list > net_bw

Typeobject
RequiredYes
Additional propertiesAny type allowed

Description: Total fast interconnect network bandwidth

PropertyPatternTypeDeprecatedDefinitionTitle/Description
+ nodeNoobjectNoIn embedfs://job-metric-data.schema.json😅 ERROR in schema generation, a referenced schema could not be loaded, no documentation here unfortunately 🏜️

4.1. Property Job metric data list > net_bw > node

Typeobject
RequiredYes
Additional propertiesAny type allowed
Defined inembedfs://job-metric-data.schema.json

Description: 😅 ERROR in schema generation, a referenced schema could not be loaded, no documentation here unfortunately 🏜️

5. Property Job metric data list > ipc

Typeobject
RequiredNo
Additional propertiesAny type allowed

Description: Instructions executed per cycle

PropertyPatternTypeDeprecatedDefinitionTitle/Description
- nodeNoobjectNoIn embedfs://job-metric-data.schema.json😅 ERROR in schema generation, a referenced schema could not be loaded, no documentation here unfortunately 🏜️
- socketNoobjectNoIn embedfs://job-metric-data.schema.json😅 ERROR in schema generation, a referenced schema could not be loaded, no documentation here unfortunately 🏜️
- memoryDomainNoobjectNoIn embedfs://job-metric-data.schema.json😅 ERROR in schema generation, a referenced schema could not be loaded, no documentation here unfortunately 🏜️
- coreNoobjectNoIn embedfs://job-metric-data.schema.json😅 ERROR in schema generation, a referenced schema could not be loaded, no documentation here unfortunately 🏜️
- hwthreadNoobjectNoIn embedfs://job-metric-data.schema.json😅 ERROR in schema generation, a referenced schema could not be loaded, no documentation here unfortunately 🏜️

5.1. Property Job metric data list > ipc > node

Typeobject
RequiredNo
Additional propertiesAny type allowed
Defined inembedfs://job-metric-data.schema.json

Description: 😅 ERROR in schema generation, a referenced schema could not be loaded, no documentation here unfortunately 🏜️

5.2. Property Job metric data list > ipc > socket

Typeobject
RequiredNo
Additional propertiesAny type allowed
Defined inembedfs://job-metric-data.schema.json

Description: 😅 ERROR in schema generation, a referenced schema could not be loaded, no documentation here unfortunately 🏜️

5.3. Property Job metric data list > ipc > memoryDomain

Typeobject
RequiredNo
Additional propertiesAny type allowed
Defined inembedfs://job-metric-data.schema.json

Description: 😅 ERROR in schema generation, a referenced schema could not be loaded, no documentation here unfortunately 🏜️

5.4. Property Job metric data list > ipc > core

Typeobject
RequiredNo
Additional propertiesAny type allowed
Defined inembedfs://job-metric-data.schema.json

Description: 😅 ERROR in schema generation, a referenced schema could not be loaded, no documentation here unfortunately 🏜️

5.5. Property Job metric data list > ipc > hwthread

Typeobject
RequiredNo
Additional propertiesAny type allowed
Defined inembedfs://job-metric-data.schema.json

Description: 😅 ERROR in schema generation, a referenced schema could not be loaded, no documentation here unfortunately 🏜️

6. Property Job metric data list > cpu_user

Typeobject
RequiredYes
Additional propertiesAny type allowed

Description: CPU user active core utilization

PropertyPatternTypeDeprecatedDefinitionTitle/Description
- nodeNoobjectNoIn embedfs://job-metric-data.schema.json😅 ERROR in schema generation, a referenced schema could not be loaded, no documentation here unfortunately 🏜️
- socketNoobjectNoIn embedfs://job-metric-data.schema.json😅 ERROR in schema generation, a referenced schema could not be loaded, no documentation here unfortunately 🏜️
- memoryDomainNoobjectNoIn embedfs://job-metric-data.schema.json😅 ERROR in schema generation, a referenced schema could not be loaded, no documentation here unfortunately 🏜️
- coreNoobjectNoIn embedfs://job-metric-data.schema.json😅 ERROR in schema generation, a referenced schema could not be loaded, no documentation here unfortunately 🏜️
- hwthreadNoobjectNoIn embedfs://job-metric-data.schema.json😅 ERROR in schema generation, a referenced schema could not be loaded, no documentation here unfortunately 🏜️

6.1. Property Job metric data list > cpu_user > node

Typeobject
RequiredNo
Additional propertiesAny type allowed
Defined inembedfs://job-metric-data.schema.json

Description: 😅 ERROR in schema generation, a referenced schema could not be loaded, no documentation here unfortunately 🏜️

6.2. Property Job metric data list > cpu_user > socket

Typeobject
RequiredNo
Additional propertiesAny type allowed
Defined inembedfs://job-metric-data.schema.json

Description: 😅 ERROR in schema generation, a referenced schema could not be loaded, no documentation here unfortunately 🏜️

6.3. Property Job metric data list > cpu_user > memoryDomain

Typeobject
RequiredNo
Additional propertiesAny type allowed
Defined inembedfs://job-metric-data.schema.json

Description: 😅 ERROR in schema generation, a referenced schema could not be loaded, no documentation here unfortunately 🏜️

6.4. Property Job metric data list > cpu_user > core

Typeobject
RequiredNo
Additional propertiesAny type allowed
Defined inembedfs://job-metric-data.schema.json

Description: 😅 ERROR in schema generation, a referenced schema could not be loaded, no documentation here unfortunately 🏜️

6.5. Property Job metric data list > cpu_user > hwthread

Typeobject
RequiredNo
Additional propertiesAny type allowed
Defined inembedfs://job-metric-data.schema.json

Description: 😅 ERROR in schema generation, a referenced schema could not be loaded, no documentation here unfortunately 🏜️

7. Property Job metric data list > cpu_load

Typeobject
RequiredYes
Additional propertiesAny type allowed

Description: CPU requested core utilization (load 1m)

PropertyPatternTypeDeprecatedDefinitionTitle/Description
+ nodeNoobjectNoIn embedfs://job-metric-data.schema.json😅 ERROR in schema generation, a referenced schema could not be loaded, no documentation here unfortunately 🏜️

7.1. Property Job metric data list > cpu_load > node

Typeobject
RequiredYes
Additional propertiesAny type allowed
Defined inembedfs://job-metric-data.schema.json

Description: 😅 ERROR in schema generation, a referenced schema could not be loaded, no documentation here unfortunately 🏜️

8. Property Job metric data list > flops_dp

Typeobject
RequiredNo
Additional propertiesAny type allowed

Description: Double precision flop rate

PropertyPatternTypeDeprecatedDefinitionTitle/Description
- nodeNoobjectNoIn embedfs://job-metric-data.schema.json😅 ERROR in schema generation, a referenced schema could not be loaded, no documentation here unfortunately 🏜️
- socketNoobjectNoIn embedfs://job-metric-data.schema.json😅 ERROR in schema generation, a referenced schema could not be loaded, no documentation here unfortunately 🏜️
- memoryDomainNoobjectNoIn embedfs://job-metric-data.schema.json😅 ERROR in schema generation, a referenced schema could not be loaded, no documentation here unfortunately 🏜️
- coreNoobjectNoIn embedfs://job-metric-data.schema.json😅 ERROR in schema generation, a referenced schema could not be loaded, no documentation here unfortunately 🏜️
- hwthreadNoobjectNoIn embedfs://job-metric-data.schema.json😅 ERROR in schema generation, a referenced schema could not be loaded, no documentation here unfortunately 🏜️

8.1. Property Job metric data list > flops_dp > node

Typeobject
RequiredNo
Additional propertiesAny type allowed
Defined inembedfs://job-metric-data.schema.json

Description: 😅 ERROR in schema generation, a referenced schema could not be loaded, no documentation here unfortunately 🏜️

8.2. Property Job metric data list > flops_dp > socket

Typeobject
RequiredNo
Additional propertiesAny type allowed
Defined inembedfs://job-metric-data.schema.json

Description: 😅 ERROR in schema generation, a referenced schema could not be loaded, no documentation here unfortunately 🏜️

8.3. Property Job metric data list > flops_dp > memoryDomain

Typeobject
RequiredNo
Additional propertiesAny type allowed
Defined inembedfs://job-metric-data.schema.json

Description: 😅 ERROR in schema generation, a referenced schema could not be loaded, no documentation here unfortunately 🏜️

8.4. Property Job metric data list > flops_dp > core

Typeobject
RequiredNo
Additional propertiesAny type allowed
Defined inembedfs://job-metric-data.schema.json

Description: 😅 ERROR in schema generation, a referenced schema could not be loaded, no documentation here unfortunately 🏜️

8.5. Property Job metric data list > flops_dp > hwthread

Typeobject
RequiredNo
Additional propertiesAny type allowed
Defined inembedfs://job-metric-data.schema.json

Description: 😅 ERROR in schema generation, a referenced schema could not be loaded, no documentation here unfortunately 🏜️

9. Property Job metric data list > flops_sp

Typeobject
RequiredNo
Additional propertiesAny type allowed

Description: Single precision flops rate

PropertyPatternTypeDeprecatedDefinitionTitle/Description
- nodeNoobjectNoIn embedfs://job-metric-data.schema.json😅 ERROR in schema generation, a referenced schema could not be loaded, no documentation here unfortunately 🏜️
- socketNoobjectNoIn embedfs://job-metric-data.schema.json😅 ERROR in schema generation, a referenced schema could not be loaded, no documentation here unfortunately 🏜️
- memoryDomainNoobjectNoIn embedfs://job-metric-data.schema.json😅 ERROR in schema generation, a referenced schema could not be loaded, no documentation here unfortunately 🏜️
- coreNoobjectNoIn embedfs://job-metric-data.schema.json😅 ERROR in schema generation, a referenced schema could not be loaded, no documentation here unfortunately 🏜️
- hwthreadNoobjectNoIn embedfs://job-metric-data.schema.json😅 ERROR in schema generation, a referenced schema could not be loaded, no documentation here unfortunately 🏜️

9.1. Property Job metric data list > flops_sp > node

Typeobject
RequiredNo
Additional propertiesAny type allowed
Defined inembedfs://job-metric-data.schema.json

Description: 😅 ERROR in schema generation, a referenced schema could not be loaded, no documentation here unfortunately 🏜️

9.2. Property Job metric data list > flops_sp > socket

Typeobject
RequiredNo
Additional propertiesAny type allowed
Defined inembedfs://job-metric-data.schema.json

Description: 😅 ERROR in schema generation, a referenced schema could not be loaded, no documentation here unfortunately 🏜️

9.3. Property Job metric data list > flops_sp > memoryDomain

Typeobject
RequiredNo
Additional propertiesAny type allowed
Defined inembedfs://job-metric-data.schema.json

Description: 😅 ERROR in schema generation, a referenced schema could not be loaded, no documentation here unfortunately 🏜️

9.4. Property Job metric data list > flops_sp > core

Typeobject
RequiredNo
Additional propertiesAny type allowed
Defined inembedfs://job-metric-data.schema.json

Description: 😅 ERROR in schema generation, a referenced schema could not be loaded, no documentation here unfortunately 🏜️

9.5. Property Job metric data list > flops_sp > hwthread

Typeobject
RequiredNo
Additional propertiesAny type allowed
Defined inembedfs://job-metric-data.schema.json

Description: 😅 ERROR in schema generation, a referenced schema could not be loaded, no documentation here unfortunately 🏜️

10. Property Job metric data list > vectorization_ratio

Typeobject
RequiredNo
Additional propertiesAny type allowed

Description: Fraction of arithmetic instructions using SIMD instructions

PropertyPatternTypeDeprecatedDefinitionTitle/Description
- nodeNoobjectNoIn embedfs://job-metric-data.schema.json😅 ERROR in schema generation, a referenced schema could not be loaded, no documentation here unfortunately 🏜️
- socketNoobjectNoIn embedfs://job-metric-data.schema.json😅 ERROR in schema generation, a referenced schema could not be loaded, no documentation here unfortunately 🏜️
- memoryDomainNoobjectNoIn embedfs://job-metric-data.schema.json😅 ERROR in schema generation, a referenced schema could not be loaded, no documentation here unfortunately 🏜️
- coreNoobjectNoIn embedfs://job-metric-data.schema.json😅 ERROR in schema generation, a referenced schema could not be loaded, no documentation here unfortunately 🏜️
- hwthreadNoobjectNoIn embedfs://job-metric-data.schema.json😅 ERROR in schema generation, a referenced schema could not be loaded, no documentation here unfortunately 🏜️

10.1. Property Job metric data list > vectorization_ratio > node

Typeobject
RequiredNo
Additional propertiesAny type allowed
Defined inembedfs://job-metric-data.schema.json

Description: 😅 ERROR in schema generation, a referenced schema could not be loaded, no documentation here unfortunately 🏜️

10.2. Property Job metric data list > vectorization_ratio > socket

Typeobject
RequiredNo
Additional propertiesAny type allowed
Defined inembedfs://job-metric-data.schema.json

Description: 😅 ERROR in schema generation, a referenced schema could not be loaded, no documentation here unfortunately 🏜️

10.3. Property Job metric data list > vectorization_ratio > memoryDomain

Typeobject
RequiredNo
Additional propertiesAny type allowed
Defined inembedfs://job-metric-data.schema.json

Description: 😅 ERROR in schema generation, a referenced schema could not be loaded, no documentation here unfortunately 🏜️

10.4. Property Job metric data list > vectorization_ratio > core

Typeobject
RequiredNo
Additional propertiesAny type allowed
Defined inembedfs://job-metric-data.schema.json

Description: 😅 ERROR in schema generation, a referenced schema could not be loaded, no documentation here unfortunately 🏜️

10.5. Property Job metric data list > vectorization_ratio > hwthread

Typeobject
RequiredNo
Additional propertiesAny type allowed
Defined inembedfs://job-metric-data.schema.json

Description: 😅 ERROR in schema generation, a referenced schema could not be loaded, no documentation here unfortunately 🏜️

11. Property Job metric data list > cpu_power

Typeobject
RequiredNo
Additional propertiesAny type allowed

Description: CPU power consumption

PropertyPatternTypeDeprecatedDefinitionTitle/Description
- nodeNoobjectNoIn embedfs://job-metric-data.schema.json😅 ERROR in schema generation, a referenced schema could not be loaded, no documentation here unfortunately 🏜️
- socketNoobjectNoIn embedfs://job-metric-data.schema.json😅 ERROR in schema generation, a referenced schema could not be loaded, no documentation here unfortunately 🏜️

11.1. Property Job metric data list > cpu_power > node

Typeobject
RequiredNo
Additional propertiesAny type allowed
Defined inembedfs://job-metric-data.schema.json

Description: 😅 ERROR in schema generation, a referenced schema could not be loaded, no documentation here unfortunately 🏜️

11.2. Property Job metric data list > cpu_power > socket

Typeobject
RequiredNo
Additional propertiesAny type allowed
Defined inembedfs://job-metric-data.schema.json

Description: 😅 ERROR in schema generation, a referenced schema could not be loaded, no documentation here unfortunately 🏜️

12. Property Job metric data list > mem_power

Typeobject
RequiredNo
Additional propertiesAny type allowed

Description: Memory power consumption

PropertyPatternTypeDeprecatedDefinitionTitle/Description
- nodeNoobjectNoIn embedfs://job-metric-data.schema.json😅 ERROR in schema generation, a referenced schema could not be loaded, no documentation here unfortunately 🏜️
- socketNoobjectNoIn embedfs://job-metric-data.schema.json😅 ERROR in schema generation, a referenced schema could not be loaded, no documentation here unfortunately 🏜️

12.1. Property Job metric data list > mem_power > node

Typeobject
RequiredNo
Additional propertiesAny type allowed
Defined inembedfs://job-metric-data.schema.json

Description: 😅 ERROR in schema generation, a referenced schema could not be loaded, no documentation here unfortunately 🏜️

12.2. Property Job metric data list > mem_power > socket

Typeobject
RequiredNo
Additional propertiesAny type allowed
Defined inembedfs://job-metric-data.schema.json

Description: 😅 ERROR in schema generation, a referenced schema could not be loaded, no documentation here unfortunately 🏜️

13. Property Job metric data list > acc_utilization

Typeobject
RequiredNo
Additional propertiesAny type allowed

Description: GPU utilization

PropertyPatternTypeDeprecatedDefinitionTitle/Description
+ acceleratorNoobjectNoIn embedfs://job-metric-data.schema.json😅 ERROR in schema generation, a referenced schema could not be loaded, no documentation here unfortunately 🏜️

13.1. Property Job metric data list > acc_utilization > accelerator

Typeobject
RequiredYes
Additional propertiesAny type allowed
Defined inembedfs://job-metric-data.schema.json

Description: 😅 ERROR in schema generation, a referenced schema could not be loaded, no documentation here unfortunately 🏜️

14. Property Job metric data list > acc_mem_used

Typeobject
RequiredNo
Additional propertiesAny type allowed

Description: GPU memory capacity used

PropertyPatternTypeDeprecatedDefinitionTitle/Description
+ acceleratorNoobjectNoIn embedfs://job-metric-data.schema.json😅 ERROR in schema generation, a referenced schema could not be loaded, no documentation here unfortunately 🏜️

14.1. Property Job metric data list > acc_mem_used > accelerator

Typeobject
RequiredYes
Additional propertiesAny type allowed
Defined inembedfs://job-metric-data.schema.json

Description: 😅 ERROR in schema generation, a referenced schema could not be loaded, no documentation here unfortunately 🏜️

15. Property Job metric data list > acc_power

Typeobject
RequiredNo
Additional propertiesAny type allowed

Description: GPU power consumption

PropertyPatternTypeDeprecatedDefinitionTitle/Description
+ acceleratorNoobjectNoIn embedfs://job-metric-data.schema.json😅 ERROR in schema generation, a referenced schema could not be loaded, no documentation here unfortunately 🏜️

15.1. Property Job metric data list > acc_power > accelerator

Typeobject
RequiredYes
Additional propertiesAny type allowed
Defined inembedfs://job-metric-data.schema.json

Description: 😅 ERROR in schema generation, a referenced schema could not be loaded, no documentation here unfortunately 🏜️

16. Property Job metric data list > clock

Typeobject
RequiredNo
Additional propertiesAny type allowed

Description: Average core frequency

PropertyPatternTypeDeprecatedDefinitionTitle/Description
- nodeNoobjectNoIn embedfs://job-metric-data.schema.json😅 ERROR in schema generation, a referenced schema could not be loaded, no documentation here unfortunately 🏜️
- socketNoobjectNoIn embedfs://job-metric-data.schema.json😅 ERROR in schema generation, a referenced schema could not be loaded, no documentation here unfortunately 🏜️
- memoryDomainNoobjectNoIn embedfs://job-metric-data.schema.json😅 ERROR in schema generation, a referenced schema could not be loaded, no documentation here unfortunately 🏜️
- coreNoobjectNoIn embedfs://job-metric-data.schema.json😅 ERROR in schema generation, a referenced schema could not be loaded, no documentation here unfortunately 🏜️
- hwthreadNoobjectNoIn embedfs://job-metric-data.schema.json😅 ERROR in schema generation, a referenced schema could not be loaded, no documentation here unfortunately 🏜️

16.1. Property Job metric data list > clock > node

Typeobject
RequiredNo
Additional propertiesAny type allowed
Defined inembedfs://job-metric-data.schema.json

Description: 😅 ERROR in schema generation, a referenced schema could not be loaded, no documentation here unfortunately 🏜️

16.2. Property Job metric data list > clock > socket

Typeobject
RequiredNo
Additional propertiesAny type allowed
Defined inembedfs://job-metric-data.schema.json

Description: 😅 ERROR in schema generation, a referenced schema could not be loaded, no documentation here unfortunately 🏜️

16.3. Property Job metric data list > clock > memoryDomain

Typeobject
RequiredNo
Additional propertiesAny type allowed
Defined inembedfs://job-metric-data.schema.json

Description: 😅 ERROR in schema generation, a referenced schema could not be loaded, no documentation here unfortunately 🏜️

16.4. Property Job metric data list > clock > core

Typeobject
RequiredNo
Additional propertiesAny type allowed
Defined inembedfs://job-metric-data.schema.json

Description: 😅 ERROR in schema generation, a referenced schema could not be loaded, no documentation here unfortunately 🏜️

16.5. Property Job metric data list > clock > hwthread

Typeobject
RequiredNo
Additional propertiesAny type allowed
Defined inembedfs://job-metric-data.schema.json

Description: 😅 ERROR in schema generation, a referenced schema could not be loaded, no documentation here unfortunately 🏜️

17. Property Job metric data list > eth_read_bw

Typeobject
RequiredNo
Additional propertiesAny type allowed

Description: Ethernet read bandwidth

PropertyPatternTypeDeprecatedDefinitionTitle/Description
+ nodeNoobjectNoIn embedfs://job-metric-data.schema.json😅 ERROR in schema generation, a referenced schema could not be loaded, no documentation here unfortunately 🏜️

17.1. Property Job metric data list > eth_read_bw > node

Typeobject
RequiredYes
Additional propertiesAny type allowed
Defined inembedfs://job-metric-data.schema.json

Description: 😅 ERROR in schema generation, a referenced schema could not be loaded, no documentation here unfortunately 🏜️

18. Property Job metric data list > eth_write_bw

Typeobject
RequiredNo
Additional propertiesAny type allowed

Description: Ethernet write bandwidth

PropertyPatternTypeDeprecatedDefinitionTitle/Description
+ nodeNoobjectNoIn embedfs://job-metric-data.schema.json😅 ERROR in schema generation, a referenced schema could not be loaded, no documentation here unfortunately 🏜️

18.1. Property Job metric data list > eth_write_bw > node

Typeobject
RequiredYes
Additional propertiesAny type allowed
Defined inembedfs://job-metric-data.schema.json

Description: 😅 ERROR in schema generation, a referenced schema could not be loaded, no documentation here unfortunately 🏜️
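
Several of the metrics above require a specific scope key when the metric is present: node for mem_used, net_bw, cpu_load, eth_read_bw and eth_write_bw, and accelerator for acc_utilization, acc_mem_used and acc_power. All other listed scopes are optional. The sketch below, in Python, checks only those required scopes. The helper name `missing_required_scopes` and the sample data are assumptions made for illustration, and the scope values are left as empty placeholders because the referenced schema for them could not be loaded.

```python
# Scope keys marked "Required: Yes" in the per-metric sections above.
REQUIRED_SCOPES = {
    "mem_used": ("node",),
    "net_bw": ("node",),
    "cpu_load": ("node",),
    "eth_read_bw": ("node",),
    "eth_write_bw": ("node",),
    "acc_utilization": ("accelerator",),
    "acc_mem_used": ("accelerator",),
    "acc_power": ("accelerator",),
}


def missing_required_scopes(job_data):
    """Map each present metric to the required scope keys it lacks.

    Metrics that are absent are skipped; whether a metric itself is
    required is a separate check. Each metric value is assumed to be a dict.
    """
    problems = {}
    for metric, scopes in REQUIRED_SCOPES.items():
        if metric not in job_data:
            continue
        missing = [scope for scope in scopes if scope not in job_data[metric]]
        if missing:
            problems[metric] = missing
    return problems


# Hypothetical input: acc_power is present but lacks its accelerator scope.
example = {
    "mem_used": {"node": {}},
    "acc_power": {"socket": {}},
    "clock": {"core": {}},
}
print(missing_required_scopes(example))
```

Because every metric object allows additional properties, the sketch does not reject unknown scope keys.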

19. Property Job metric data list > filesystems

Typearray of object
RequiredYes

Description: Array of filesystems

Array restrictions
Min items | 1
Max items | N/A
Items unicity | False
Additional items | False
Tuple validation | See below

Each item of this array must be | Description
filesystems items | -

19.1. Job metric data list > filesystems > filesystems items

Typeobject
RequiredNo
Additional propertiesAny type allowed

Property | Pattern | Type | Deprecated | Definition | Title/Description
+ name | No | string | No | - | -
+ type | No | enum (of string) | No | - | -
+ read_bw | No | object | No | - | File system read bandwidth
+ write_bw | No | object | No | - | File system write bandwidth
- read_req | No | object | No | - | File system read requests
- write_req | No | object | No | - | File system write requests
- inodes | No | object | No | - | File system write requests
- accesses | No | object | No | - | File system open and close
- fsync | No | object | No | - | File system fsync
- create | No | object | No | - | File system create
- open | No | object | No | - | File system open
- close | No | object | No | - | File system close
- seek | No | object | No | - | File system seek

19.1.1. Property Job metric data list > filesystems > filesystems items > name

Typestring
RequiredYes

19.1.2. Property Job metric data list > filesystems > filesystems items > type

Typeenum (of string)
RequiredYes

Must be one of:

  • “nfs”
  • “lustre”
  • “gpfs”
  • “nvme”
  • “ssd”
  • “hdd”
  • “beegfs”
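
The constraints stated for the filesystems array (at least one item, required name, type, read_bw and write_bw keys, and a type taken from the list above) can be sketched as a small Python check. The function name `check_filesystems`, the error wording and the sample entries are hypothetical; the bandwidth objects are left as empty placeholders since their schema is not documented on this page.

```python
from __future__ import annotations

# Allowed values for the "type" property, as listed above.
FILESYSTEM_TYPES = {"nfs", "lustre", "gpfs", "nvme", "ssd", "hdd", "beegfs"}
# Item properties marked "Required: Yes" on this page.
REQUIRED_FS_KEYS = ("name", "type", "read_bw", "write_bw")


def check_filesystems(filesystems: list) -> list[str]:
    """Return human-readable problems found in a filesystems array."""
    errors = []
    if len(filesystems) < 1:
        errors.append("filesystems must contain at least one item")
    for index, item in enumerate(filesystems):
        for key in REQUIRED_FS_KEYS:
            if key not in item:
                errors.append(f"item {index}: missing required key '{key}'")
        if "type" in item and item["type"] not in FILESYSTEM_TYPES:
            errors.append(f"item {index}: unknown type '{item['type']}'")
    return errors


# Hypothetical entries: the second one has an unlisted type and no write_bw.
example = [
    {"name": "work", "type": "lustre", "read_bw": {}, "write_bw": {}},
    {"name": "scratch", "type": "tmpfs", "read_bw": {}},
]
for problem in check_filesystems(example):
    print(problem)
```

A full validator would also check the nested metric objects; this sketch stops at the properties documented here.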

19.1.3. Property Job metric data list > filesystems > filesystems items > read_bw

Typeobject
RequiredYes
Additional propertiesAny type allowed

Description: File system read bandwidth

PropertyPatternTypeDeprecatedDefinitionTitle/Description
+ nodeNoobjectNoIn embedfs://job-metric-data.schema.json😅 ERROR in schema generation, a referenced schema could not be loaded, no documentation here unfortunately 🏜️

19.1.3.1. Property Job metric data list > filesystems > filesystems items > read_bw > node

Typeobject
RequiredYes
Additional propertiesAny type allowed
Defined inembedfs://job-metric-data.schema.json

Description: 😅 ERROR in schema generation, a referenced schema could not be loaded, no documentation here unfortunately 🏜️

19.1.4. Property Job metric data list > filesystems > filesystems items > write_bw

Typeobject
RequiredYes
Additional propertiesAny type allowed

Description: File system write bandwidth

PropertyPatternTypeDeprecatedDefinitionTitle/Description
+ nodeNoobjectNoIn embedfs://job-metric-data.schema.json😅 ERROR in schema generation, a referenced schema could not be loaded, no documentation here unfortunately 🏜️

19.1.4.1. Property Job metric data list > filesystems > filesystems items > write_bw > node

Typeobject
RequiredYes
Additional propertiesAny type allowed
Defined inembedfs://job-metric-data.schema.json

Description: 😅 ERROR in schema generation, a referenced schema could not be loaded, no documentation here unfortunately 🏜️

19.1.5. Property Job metric data list > filesystems > filesystems items > read_req

Typeobject
RequiredNo
Additional propertiesAny type allowed

Description: File system read requests

PropertyPatternTypeDeprecatedDefinitionTitle/Description
+ nodeNoobjectNoIn embedfs://job-metric-data.schema.json😅 ERROR in schema generation, a referenced schema could not be loaded, no documentation here unfortunately 🏜️

19.1.5.1. Property Job metric data list > filesystems > filesystems items > read_req > node

Typeobject
RequiredYes
Additional propertiesAny type allowed
Defined inembedfs://job-metric-data.schema.json

Description: 😅 ERROR in schema generation, a referenced schema could not be loaded, no documentation here unfortunately 🏜️

19.1.6. Property Job metric data list > filesystems > filesystems items > write_req

Typeobject
RequiredNo
Additional propertiesAny type allowed

Description: File system write requests

PropertyPatternTypeDeprecatedDefinitionTitle/Description
+ nodeNoobjectNoIn embedfs://job-metric-data.schema.json😅 ERROR in schema generation, a referenced schema could not be loaded, no documentation here unfortunately 🏜️

19.1.6.1. Property Job metric data list > filesystems > filesystems items > write_req > node

Typeobject
RequiredYes
Additional propertiesAny type allowed
Defined inembedfs://job-metric-data.schema.json

Description: 😅 ERROR in schema generation, a referenced schema could not be loaded, no documentation here unfortunately 🏜️

19.1.7. Property Job metric data list > filesystems > filesystems items > inodes

Typeobject
RequiredNo
Additional propertiesAny type allowed

Description: File system write requests

PropertyPatternTypeDeprecatedDefinitionTitle/Description
+ nodeNoobjectNoIn embedfs://job-metric-data.schema.json😅 ERROR in schema generation, a referenced schema could not be loaded, no documentation here unfortunately 🏜️

19.1.7.1. Property Job metric data list > filesystems > filesystems items > inodes > node

Typeobject
RequiredYes
Additional propertiesAny type allowed
Defined inembedfs://job-metric-data.schema.json

Description: 😅 ERROR in schema generation, a referenced schema could not be loaded, no documentation here unfortunately 🏜️

19.1.8. Property Job metric data list > filesystems > filesystems items > accesses

Typeobject
RequiredNo
Additional propertiesAny type allowed

Description: File system open and close

PropertyPatternTypeDeprecatedDefinitionTitle/Description
+ nodeNoobjectNoIn embedfs://job-metric-data.schema.json😅 ERROR in schema generation, a referenced schema could not be loaded, no documentation here unfortunately 🏜️

19.1.8.1. Property Job metric data list > filesystems > filesystems items > accesses > node

Typeobject
RequiredYes
Additional propertiesAny type allowed
Defined inembedfs://job-metric-data.schema.json

Description: 😅 ERROR in schema generation, a referenced schema could not be loaded, no documentation here unfortunately 🏜️

19.1.9. Property Job metric data list > filesystems > filesystems items > fsync

Typeobject
RequiredNo
Additional propertiesAny type allowed

Description: File system fsync

PropertyPatternTypeDeprecatedDefinitionTitle/Description
+ nodeNoobjectNoIn embedfs://job-metric-data.schema.json😅 ERROR in schema generation, a referenced schema could not be loaded, no documentation here unfortunately 🏜️

19.1.9.1. Property Job metric data list > filesystems > filesystems items > fsync > node

Typeobject
RequiredYes
Additional propertiesAny type allowed
Defined inembedfs://job-metric-data.schema.json

Description: 😅 ERROR in schema generation, a referenced schema could not be loaded, no documentation here unfortunately 🏜️

19.1.10. Property Job metric data list > filesystems > filesystems items > create

Typeobject
RequiredNo
Additional propertiesAny type allowed

Description: File system create

PropertyPatternTypeDeprecatedDefinitionTitle/Description
+ nodeNoobjectNoIn embedfs://job-metric-data.schema.json😅 ERROR in schema generation, a referenced schema could not be loaded, no documentation here unfortunately 🏜️

19.1.10.1. Property Job metric data list > filesystems > filesystems items > create > node

Typeobject
RequiredYes
Additional propertiesAny type allowed
Defined inembedfs://job-metric-data.schema.json

Description: Not available; the referenced schema could not be loaded when this documentation was generated.

19.1.11. Property Job metric data list > filesystems > filesystems items > open

Typeobject
RequiredNo
Additional propertiesAny type allowed

Description: File system open

Property | Pattern | Type | Deprecated | Definition | Title/Description
+ node | No | object | No | In embedfs://job-metric-data.schema.json | Not available (referenced schema could not be loaded during generation)
19.1.11.1. Property Job metric data list > filesystems > filesystems items > open > node
Typeobject
RequiredYes
Additional propertiesAny type allowed
Defined inembedfs://job-metric-data.schema.json

Description: Not available; the referenced schema could not be loaded when this documentation was generated.

19.1.12. Property Job metric data list > filesystems > filesystems items > close

Typeobject
RequiredNo
Additional propertiesAny type allowed

Description: File system close

Property | Pattern | Type | Deprecated | Definition | Title/Description
+ node | No | object | No | In embedfs://job-metric-data.schema.json | Not available (referenced schema could not be loaded during generation)
19.1.12.1. Property Job metric data list > filesystems > filesystems items > close > node
Typeobject
RequiredYes
Additional propertiesAny type allowed
Defined inembedfs://job-metric-data.schema.json

Description: Not available; the referenced schema could not be loaded when this documentation was generated.

19.1.13. Property Job metric data list > filesystems > filesystems items > seek

Typeobject
RequiredNo
Additional propertiesAny type allowed

Description: File system seek

Property | Pattern | Type | Deprecated | Definition | Title/Description
+ node | No | object | No | In embedfs://job-metric-data.schema.json | Not available (referenced schema could not be loaded during generation)
19.1.13.1. Property Job metric data list > filesystems > filesystems items > seek > node
Typeobject
RequiredYes
Additional propertiesAny type allowed
Defined inembedfs://job-metric-data.schema.json

Description: Not available; the referenced schema could not be loaded when this documentation was generated.


Generated using json-schema-for-humans on 2024-12-04 at 16:45:59 +0100

1.7.4 - Job Statistics Schema

ClusterCockpit Job Statistics Schema Reference

The following schema in its raw form can be found in the ClusterCockpit GitHub repository.

Job statistics

Title: Job statistics

Typeobject
RequiredNo
Additional propertiesAny type allowed

Description: Format specification for job metric statistics

Property | Pattern | Type | Deprecated | Definition | Title/Description
+ unit | No | object | No | In embedfs://unit.schema.json | Metric unit
+ avg | No | number | No | - | Job metric average
+ min | No | number | No | - | Job metric minimum
+ max | No | number | No | - | Job metric maximum

1. Property Job statistics > unit

Typeobject
RequiredYes
Additional propertiesAny type allowed
Defined inembedfs://unit.schema.json

Description: Metric unit

2. Property Job statistics > avg

Typenumber
RequiredYes

Description: Job metric average

Restrictions
Minimum≥ 0

3. Property Job statistics > min

Typenumber
RequiredYes

Description: Job metric minimum

Restrictions
Minimum≥ 0

4. Property Job statistics > max

Typenumber
RequiredYes

Description: Job metric maximum

Restrictions
Minimum≥ 0
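
As an illustration, an object that satisfies the constraints listed above could be checked as sketched below. The helper `check_job_statistics` and the sample values are hypothetical and written only for this example; the function mirrors the required properties and minimum values documented here and is not part of ClusterCockpit.

```python
import json

REQUIRED_KEYS = ("unit", "avg", "min", "max")


def check_job_statistics(stats: dict) -> list[str]:
    """Return violations of the documented rules (hypothetical helper)."""
    problems = [
        f"missing required property: {key}"
        for key in REQUIRED_KEYS
        if key not in stats
    ]
    for key in ("avg", "min", "max"):
        if key not in stats:
            continue
        value = stats[key]
        # bool is a subclass of int in Python, so reject it explicitly
        if isinstance(value, bool) or not isinstance(value, (int, float)):
            problems.append(f"{key} must be a number")
        elif value < 0:
            problems.append(f"{key} must be >= 0")
    if "unit" in stats and not isinstance(stats["unit"], dict):
        problems.append("unit must be an object")
    return problems


# Made-up sample values for illustration only
example = json.loads(
    '{"unit": {"base": "F/s", "prefix": "G"}, "avg": 12.5, "min": 0.0, "max": 48.2}'
)
print(check_job_statistics(example))
print(check_job_statistics({"avg": -1, "min": 0}))
```

For production use, a full JSON Schema validator run against the raw schema file is likely the better choice; the sketch only makes the documented rules concrete.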

1.7.5 - Unit Schema

ClusterCockpit Unit Schema Reference

The following schema in its raw form can be found in the ClusterCockpit GitHub repository.

Metric unit

Title: Metric unit

Typeobject
RequiredNo
Additional propertiesAny type allowed

Description: Format specification for job metric units

Property | Pattern | Type | Deprecated | Definition | Title/Description
+ base | No | enum (of string) | No | - | Metric base unit
- prefix | No | enum (of string) | No | - | Unit prefix

1. Property Metric unit > base

Typeenum (of string)
RequiredYes

Description: Metric base unit

Must be one of:

  • “B”
  • “F”
  • “B/s”
  • “F/s”
  • “CPI”
  • “IPC”
  • “Hz”
  • “W”
  • “°C”
  • ""

2. Property Metric unit > prefix

Typeenum (of string)
RequiredNo

Description: Unit prefix

Must be one of:

  • “K”
  • “M”
  • “G”
  • “T”
  • “P”
  • “E”
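
The two enumerations above can be combined into a display string. The following is a minimal sketch, assuming that a unit is rendered by simply concatenating prefix and base; the function name `format_unit` is hypothetical and not taken from ClusterCockpit.

```python
BASE_UNITS = {"B", "F", "B/s", "F/s", "CPI", "IPC", "Hz", "W", "°C", ""}
PREFIXES = {"K", "M", "G", "T", "P", "E"}


def format_unit(unit: dict) -> str:
    """Render e.g. {"base": "B/s", "prefix": "G"} as "GB/s" (assumed convention)."""
    if "base" not in unit:
        raise ValueError("base is required")
    base = unit["base"]
    prefix = unit.get("prefix")  # optional property
    if base not in BASE_UNITS:
        raise ValueError(f"unknown base unit: {base!r}")
    if prefix is not None and prefix not in PREFIXES:
        raise ValueError(f"unknown prefix: {prefix!r}")
    return (prefix or "") + base


print(format_unit({"base": "B/s", "prefix": "G"}))
print(format_unit({"base": "IPC"}))
```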

1.7.6 - Job Archive Metadata Schema

ClusterCockpit Job Archive Metadata Schema Reference

The following schema in its raw form can be found in the ClusterCockpit GitHub repository.

Job meta data

Title: Job meta data

Typeobject
RequiredNo
Additional propertiesAny type allowed

Description: Metadata of an HPC job

Property | Pattern | Type | Deprecated | Definition | Title/Description
+ jobId | No | integer | No | - | The unique identifier of a job
+ user | No | string | No | - | The unique identifier of a user
+ project | No | string | No | - | The unique identifier of a project
+ cluster | No | string | No | - | The unique identifier of a cluster
+ subCluster | No | string | No | - | The unique identifier of a sub cluster
- partition | No | string | No | - | The Slurm partition to which the job was submitted
- arrayJobId | No | integer | No | - | The unique identifier of an array job
+ numNodes | No | integer | No | - | Number of nodes used
- numHwthreads | No | integer | No | - | Number of HWThreads used
- numAcc | No | integer | No | - | Number of accelerators used
+ exclusive | No | integer | No | - | Specifies how nodes are shared. 0 - Shared among multiple jobs of multiple users, 1 - Job exclusive, 2 - Shared among multiple jobs of same user
- monitoringStatus | No | integer | No | - | State of monitoring system during job run
- smt | No | integer | No | - | SMT threads used by job
- walltime | No | integer | No | - | Requested walltime of job in seconds
+ jobState | No | enum (of string) | No | - | Final state of job
+ startTime | No | integer | No | - | Start epoch time stamp in seconds
+ duration | No | integer | No | - | Duration of job in seconds
+ resources | No | array of object | No | - | Resources used by job
- metaData | No | object | No | - | Additional information about the job
- tags | No | array of object | No | - | List of tags
+ statistics | No | object | No | - | Job statistic data

1. Property Job meta data > jobId

Typeinteger
RequiredYes

Description: The unique identifier of a job

2. Property Job meta data > user

Typestring
RequiredYes

Description: The unique identifier of a user

3. Property Job meta data > project

Typestring
RequiredYes

Description: The unique identifier of a project

4. Property Job meta data > cluster

Typestring
RequiredYes

Description: The unique identifier of a cluster

5. Property Job meta data > subCluster

Typestring
RequiredYes

Description: The unique identifier of a sub cluster

6. Property Job meta data > partition

Typestring
RequiredNo

Description: The Slurm partition to which the job was submitted

7. Property Job meta data > arrayJobId

Typeinteger
RequiredNo

Description: The unique identifier of an array job

8. Property Job meta data > numNodes

Typeinteger
RequiredYes

Description: Number of nodes used

Restrictions
Minimum> 0

9. Property Job meta data > numHwthreads

Typeinteger
RequiredNo

Description: Number of HWThreads used

Restrictions
Minimum> 0

10. Property Job meta data > numAcc

Typeinteger
RequiredNo

Description: Number of accelerators used

Restrictions
Minimum> 0

11. Property Job meta data > exclusive

Typeinteger
RequiredYes

Description: Specifies how nodes are shared. 0 - Shared among multiple jobs of multiple users, 1 - Job exclusive, 2 - Shared among multiple jobs of same user

Restrictions
Minimum≥ 0
Maximum≤ 2
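
The three permitted values of exclusive can be given readable names in client code. This is only a sketch: the enum and its member names are hypothetical and chosen for this example, while the numeric values and their meaning come from the description above.

```python
from enum import IntEnum


class NodeSharing(IntEnum):
    """Hypothetical names for the documented values of `exclusive`."""

    SHARED_MULTI_USER = 0   # shared among multiple jobs of multiple users
    JOB_EXCLUSIVE = 1       # job exclusive
    SHARED_SINGLE_USER = 2  # shared among multiple jobs of the same user


def parse_exclusive(value: int) -> NodeSharing:
    # IntEnum raises ValueError for anything outside the range 0..2
    return NodeSharing(value)


print(parse_exclusive(1).name)
```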

12. Property Job meta data > monitoringStatus

Typeinteger
RequiredNo

Description: State of monitoring system during job run

13. Property Job meta data > smt

Typeinteger
RequiredNo

Description: SMT threads used by job

14. Property Job meta data > walltime

Typeinteger
RequiredNo

Description: Requested walltime of job in seconds

Restrictions
Minimum> 0

15. Property Job meta data > jobState

Typeenum (of string)
RequiredYes

Description: Final state of job

Must be one of:

  • “completed”
  • “failed”
  • “cancelled”
  • “stopped”
  • “out_of_memory”
  • “timeout”

16. Property Job meta data > startTime

Typeinteger
RequiredYes

Description: Start epoch time stamp in seconds

Restrictions
Minimum> 0

17. Property Job meta data > duration

Typeinteger
RequiredYes

Description: Duration of job in seconds

Restrictions
Minimum> 0
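
Because startTime is an epoch time stamp in seconds and duration is given in seconds, the time window of a job can be derived from the two values. A minimal sketch follows; the function name `job_time_window` and the sample numbers are made up for illustration.

```python
from datetime import datetime, timedelta, timezone


def job_time_window(start_time: int, duration: int) -> tuple[datetime, datetime]:
    """Return (start, end) as timezone-aware UTC datetimes."""
    if start_time <= 0 or duration <= 0:
        # both properties are documented with a minimum of > 0
        raise ValueError("startTime and duration must be > 0")
    start = datetime.fromtimestamp(start_time, tz=timezone.utc)
    return start, start + timedelta(seconds=duration)


start, end = job_time_window(1700000000, 3600)
print(start.isoformat(), end.isoformat())
```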

18. Property Job meta data > resources

Typearray of object
RequiredYes

Description: Resources used by job

Array restrictions
Min itemsN/A
Max itemsN/A
Items unicityFalse
Additional itemsFalse
Tuple validationSee below
Each item of this array must beDescription
resources items-

18.1. Job meta data > resources > resources items

Typeobject
RequiredNo
Additional propertiesAny type allowed
Property | Pattern | Type | Deprecated | Definition | Title/Description
+ hostname | No | string | No | - | -
- hwthreads | No | array of integer | No | - | List of OS processor ids
- accelerators | No | array of string | No | - | List of accelerator device ids
- configuration | No | string | No | - | The configuration options of the node

18.1.1. Property Job meta data > resources > resources items > hostname

Typestring
RequiredYes

18.1.2. Property Job meta data > resources > resources items > hwthreads

Typearray of integer
RequiredNo

Description: List of OS processor ids

Array restrictions
Min itemsN/A
Max itemsN/A
Items unicityFalse
Additional itemsFalse
Tuple validationSee below
Each item of this array must beDescription
hwthreads items-
18.1.2.1. Job meta data > resources > resources items > hwthreads > hwthreads items
Typeinteger
RequiredNo

18.1.3. Property Job meta data > resources > resources items > accelerators

Typearray of string
RequiredNo

Description: List of accelerator device ids

Array restrictions
Min itemsN/A
Max itemsN/A
Items unicityFalse
Additional itemsFalse
Tuple validationSee below
Each item of this array must beDescription
accelerators items-
18.1.3.1. Job meta data > resources > resources items > accelerators > accelerators items
Typestring
RequiredNo

18.1.4. Property Job meta data > resources > resources items > configuration

Typestring
RequiredNo

Description: The configuration options of the node

19. Property Job meta data > metaData

Typeobject
RequiredNo
Additional propertiesAny type allowed

Description: Additional information about the job

Property | Pattern | Type | Deprecated | Definition | Title/Description
- jobScript | No | string | No | - | The batch script of the job
- jobName | No | string | No | - | Slurm job name
- slurmInfo | No | string | No | - | Additional Slurm information as shown by scontrol show job

19.1. Property Job meta data > metaData > jobScript

Typestring
RequiredNo

Description: The batch script of the job

19.2. Property Job meta data > metaData > jobName

Typestring
RequiredNo

Description: Slurm Job name

19.3. Property Job meta data > metaData > slurmInfo

Typestring
RequiredNo

Description: Additional Slurm information as shown by scontrol show job

20. Property Job meta data > tags

Typearray of object
RequiredNo

Description: List of tags

Array restrictions
Min itemsN/A
Max itemsN/A
Items unicityTrue
Additional itemsFalse
Tuple validationSee below
Each item of this array must beDescription
tags items-

20.1. Job meta data > tags > tags items

Typeobject
RequiredNo
Additional propertiesAny type allowed
Property | Pattern | Type | Deprecated | Definition | Title/Description
+ name | No | string | No | - | -
+ type | No | string | No | - | -

20.1.1. Property Job meta data > tags > tags items > name

Typestring
RequiredYes

20.1.2. Property Job meta data > tags > tags items > type

Typestring
RequiredYes

21. Property Job meta data > statistics

Typeobject
RequiredYes
Additional propertiesAny type allowed

Description: Job statistic data

Property | Pattern | Type | Deprecated | Definition | Title/Description
+ mem_used | No | object | No | In embedfs://job-metric-statistics.schema.json | Memory capacity used (required)
+ cpu_load | No | object | No | In embedfs://job-metric-statistics.schema.json | CPU requested core utilization (load 1m) (required)
+ flops_any | No | object | No | In embedfs://job-metric-statistics.schema.json | Total flop rate with DP flops scaled up (required)
+ mem_bw | No | object | No | In embedfs://job-metric-statistics.schema.json | Main memory bandwidth (required)
- net_bw | No | object | No | In embedfs://job-metric-statistics.schema.json | Total fast interconnect network bandwidth (required)
- file_bw | No | object | No | In embedfs://job-metric-statistics.schema.json | Total file IO bandwidth (required)
- ipc | No | object | No | In embedfs://job-metric-statistics.schema.json | Instructions executed per cycle
+ cpu_user | No | object | No | In embedfs://job-metric-statistics.schema.json | CPU user active core utilization
- flops_dp | No | object | No | In embedfs://job-metric-statistics.schema.json | Double precision flop rate
- flops_sp | No | object | No | In embedfs://job-metric-statistics.schema.json | Single precision flop rate
- rapl_power | No | object | No | In embedfs://job-metric-statistics.schema.json | CPU power consumption
- acc_used | No | object | No | In embedfs://job-metric-statistics.schema.json | GPU utilization
- acc_mem_used | No | object | No | In embedfs://job-metric-statistics.schema.json | GPU memory capacity used
- acc_power | No | object | No | In embedfs://job-metric-statistics.schema.json | GPU power consumption
- clock | No | object | No | In embedfs://job-metric-statistics.schema.json | Average core frequency
- eth_read_bw | No | object | No | In embedfs://job-metric-statistics.schema.json | Ethernet read bandwidth
- eth_write_bw | No | object | No | In embedfs://job-metric-statistics.schema.json | Ethernet write bandwidth
- ic_rcv_packets | No | object | No | In embedfs://job-metric-statistics.schema.json | Network interconnect read packets
- ic_send_packets | No | object | No | In embedfs://job-metric-statistics.schema.json | Network interconnect send packets
- ic_read_bw | No | object | No | In embedfs://job-metric-statistics.schema.json | Network interconnect read bandwidth
- ic_write_bw | No | object | No | In embedfs://job-metric-statistics.schema.json | Network interconnect write bandwidth
- filesystems | No | array of object | No | - | Array of filesystems

21.1. Property Job meta data > statistics > mem_used

Typeobject
RequiredYes
Additional propertiesAny type allowed
Defined inembedfs://job-metric-statistics.schema.json

Description: Memory capacity used (required)

21.2. Property Job meta data > statistics > cpu_load

Typeobject
RequiredYes
Additional propertiesAny type allowed
Defined inembedfs://job-metric-statistics.schema.json

Description: CPU requested core utilization (load 1m) (required)

21.3. Property Job meta data > statistics > flops_any

Typeobject
RequiredYes
Additional propertiesAny type allowed
Defined inembedfs://job-metric-statistics.schema.json

Description: Total flop rate with DP flops scaled up (required)

21.4. Property Job meta data > statistics > mem_bw

Typeobject
RequiredYes
Additional propertiesAny type allowed
Defined inembedfs://job-metric-statistics.schema.json

Description: Main memory bandwidth (required)

21.5. Property Job meta data > statistics > net_bw

Typeobject
RequiredNo
Additional propertiesAny type allowed
Defined inembedfs://job-metric-statistics.schema.json

Description: Total fast interconnect network bandwidth (required)

21.6. Property Job meta data > statistics > file_bw

Typeobject
RequiredNo
Additional propertiesAny type allowed
Defined inembedfs://job-metric-statistics.schema.json

Description: Total file IO bandwidth (required)

21.7. Property Job meta data > statistics > ipc

Typeobject
RequiredNo
Additional propertiesAny type allowed
Defined inembedfs://job-metric-statistics.schema.json

Description: Instructions executed per cycle

21.8. Property Job meta data > statistics > cpu_user

Typeobject
RequiredYes
Additional propertiesAny type allowed
Defined inembedfs://job-metric-statistics.schema.json

Description: CPU user active core utilization

21.9. Property Job meta data > statistics > flops_dp

Typeobject
RequiredNo
Additional propertiesAny type allowed
Defined inembedfs://job-metric-statistics.schema.json

Description: Double precision flop rate

21.10. Property Job meta data > statistics > flops_sp

Typeobject
RequiredNo
Additional propertiesAny type allowed
Defined inembedfs://job-metric-statistics.schema.json

Description: Single precision flop rate

21.11. Property Job meta data > statistics > rapl_power

Typeobject
RequiredNo
Additional propertiesAny type allowed
Defined inembedfs://job-metric-statistics.schema.json

Description: CPU power consumption

21.12. Property Job meta data > statistics > acc_used

Typeobject
RequiredNo
Additional propertiesAny type allowed
Defined inembedfs://job-metric-statistics.schema.json

Description: GPU utilization

21.13. Property Job meta data > statistics > acc_mem_used

Typeobject
RequiredNo
Additional propertiesAny type allowed
Defined inembedfs://job-metric-statistics.schema.json

Description: GPU memory capacity used

21.14. Property Job meta data > statistics > acc_power

Typeobject
RequiredNo
Additional propertiesAny type allowed
Defined inembedfs://job-metric-statistics.schema.json

Description: GPU power consumption

21.15. Property Job meta data > statistics > clock

Typeobject
RequiredNo
Additional propertiesAny type allowed
Defined inembedfs://job-metric-statistics.schema.json

Description: Average core frequency

21.16. Property Job meta data > statistics > eth_read_bw

Typeobject
RequiredNo
Additional propertiesAny type allowed
Defined inembedfs://job-metric-statistics.schema.json

Description: Ethernet read bandwidth

21.17. Property Job meta data > statistics > eth_write_bw

Typeobject
RequiredNo
Additional propertiesAny type allowed
Defined inembedfs://job-metric-statistics.schema.json

Description: Ethernet write bandwidth

21.18. Property Job meta data > statistics > ic_rcv_packets

Typeobject
RequiredNo
Additional propertiesAny type allowed
Defined inembedfs://job-metric-statistics.schema.json

Description: Network interconnect read packets

21.19. Property Job meta data > statistics > ic_send_packets

Typeobject
RequiredNo
Additional propertiesAny type allowed
Defined inembedfs://job-metric-statistics.schema.json

Description: Network interconnect send packets

21.20. Property Job meta data > statistics > ic_read_bw

Typeobject
RequiredNo
Additional propertiesAny type allowed
Defined inembedfs://job-metric-statistics.schema.json

Description: Network interconnect read bandwidth

21.21. Property Job meta data > statistics > ic_write_bw

Typeobject
RequiredNo
Additional propertiesAny type allowed
Defined inembedfs://job-metric-statistics.schema.json

Description: Network interconnect write bandwidth

21.22. Property Job meta data > statistics > filesystems

Typearray of object
RequiredNo

Description: Array of filesystems

Array restrictions
Min items1
Max itemsN/A
Items unicityFalse
Additional itemsFalse
Tuple validationSee below
Each item of this array must beDescription
filesystems items-

21.22.1. Job meta data > statistics > filesystems > filesystems items

Typeobject
RequiredNo
Additional propertiesAny type allowed
Property | Pattern | Type | Deprecated | Definition | Title/Description
+ name | No | string | No | - | -
+ type | No | enum (of string) | No | - | -
+ read_bw | No | object | No | In embedfs://job-metric-statistics.schema.json | File system read bandwidth
+ write_bw | No | object | No | In embedfs://job-metric-statistics.schema.json | File system write bandwidth
- read_req | No | object | No | In embedfs://job-metric-statistics.schema.json | File system read requests
- write_req | No | object | No | In embedfs://job-metric-statistics.schema.json | File system write requests
- inodes | No | object | No | In embedfs://job-metric-statistics.schema.json | File system write requests
- accesses | No | object | No | In embedfs://job-metric-statistics.schema.json | File system open and close
- fsync | No | object | No | In embedfs://job-metric-statistics.schema.json | File system fsync
- create | No | object | No | In embedfs://job-metric-statistics.schema.json | File system create
- open | No | object | No | In embedfs://job-metric-statistics.schema.json | File system open
- close | No | object | No | In embedfs://job-metric-statistics.schema.json | File system close
- seek | No | object | No | In embedfs://job-metric-statistics.schema.json | File system seek
21.22.1.1. Property Job meta data > statistics > filesystems > filesystems items > name
Typestring
RequiredYes
21.22.1.2. Property Job meta data > statistics > filesystems > filesystems items > type
Typeenum (of string)
RequiredYes

Must be one of:

  • “nfs”
  • “lustre”
  • “gpfs”
  • “nvme”
  • “ssd”
  • “hdd”
  • “beegfs”
21.22.1.3. Property Job meta data > statistics > filesystems > filesystems items > read_bw
Typeobject
RequiredYes
Additional propertiesAny type allowed
Defined inembedfs://job-metric-statistics.schema.json

Description: File system read bandwidth

21.22.1.4. Property Job meta data > statistics > filesystems > filesystems items > write_bw
Typeobject
RequiredYes
Additional propertiesAny type allowed
Defined inembedfs://job-metric-statistics.schema.json

Description: File system write bandwidth

21.22.1.5. Property Job meta data > statistics > filesystems > filesystems items > read_req
Typeobject
RequiredNo
Additional propertiesAny type allowed
Defined inembedfs://job-metric-statistics.schema.json

Description: File system read requests

21.22.1.6. Property Job meta data > statistics > filesystems > filesystems items > write_req
Typeobject
RequiredNo
Additional propertiesAny type allowed
Defined inembedfs://job-metric-statistics.schema.json

Description: File system write requests

21.22.1.7. Property Job meta data > statistics > filesystems > filesystems items > inodes
Typeobject
RequiredNo
Additional propertiesAny type allowed
Defined inembedfs://job-metric-statistics.schema.json

Description: File system write requests

21.22.1.8. Property Job meta data > statistics > filesystems > filesystems items > accesses
Typeobject
RequiredNo
Additional propertiesAny type allowed
Defined inembedfs://job-metric-statistics.schema.json

Description: File system open and close

21.22.1.9. Property Job meta data > statistics > filesystems > filesystems items > fsync
Typeobject
RequiredNo
Additional propertiesAny type allowed
Defined inembedfs://job-metric-statistics.schema.json

Description: File system fsync

21.22.1.10. Property Job meta data > statistics > filesystems > filesystems items > create
Typeobject
RequiredNo
Additional propertiesAny type allowed
Defined inembedfs://job-metric-statistics.schema.json

Description: File system create

21.22.1.11. Property Job meta data > statistics > filesystems > filesystems items > open
Typeobject
RequiredNo
Additional propertiesAny type allowed
Defined inembedfs://job-metric-statistics.schema.json

Description: File system open

21.22.1.12. Property Job meta data > statistics > filesystems > filesystems items > close
Typeobject
RequiredNo
Additional propertiesAny type allowed
Defined inembedfs://job-metric-statistics.schema.json

Description: File system close

21.22.1.13. Property Job meta data > statistics > filesystems > filesystems items > seek
Typeobject
RequiredNo
Additional propertiesAny type allowed
Defined inembedfs://job-metric-statistics.schema.json

Description: File system seek
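
To tie the properties of this schema together, the sketch below builds a minimal job metadata object and checks it against the required properties and the jobState enumeration documented above. All names (`check_meta`, `stat`, `example_meta`) and all values are hypothetical and exist only for this example; nothing here is taken from the cc-backend code base.

```python
REQUIRED_META = (
    "jobId", "user", "project", "cluster", "subCluster", "numNodes",
    "exclusive", "jobState", "startTime", "duration", "resources", "statistics",
)
REQUIRED_STATS = ("mem_used", "cpu_load", "flops_any", "mem_bw", "cpu_user")
JOB_STATES = {"completed", "failed", "cancelled", "stopped", "out_of_memory", "timeout"}


def check_meta(meta: dict) -> list[str]:
    """Collect violations of the required-property and enum rules listed above."""
    problems = [f"missing {key}" for key in REQUIRED_META if key not in meta]
    stats = meta.get("statistics", {})
    problems += [f"missing statistics.{key}" for key in REQUIRED_STATS if key not in stats]
    for index, resource in enumerate(meta.get("resources", [])):
        if "hostname" not in resource:
            problems.append(f"missing resources[{index}].hostname")
    if "jobState" in meta and meta["jobState"] not in JOB_STATES:
        problems.append(f"invalid jobState {meta['jobState']!r}")
    return problems


def stat(avg: float, low: float, high: float, base: str, prefix: str = "") -> dict:
    """Build one job statistics object; all numbers used below are made up."""
    unit = {"base": base}
    if prefix:
        unit["prefix"] = prefix
    return {"unit": unit, "avg": avg, "min": low, "max": high}


example_meta = {
    "jobId": 123456,
    "user": "abcduser",
    "project": "abcdproject",
    "cluster": "examplecluster",
    "subCluster": "main",
    "numNodes": 1,
    "exclusive": 1,
    "jobState": "completed",
    "startTime": 1700000000,
    "duration": 3600,
    "resources": [{"hostname": "node01", "hwthreads": [0, 1, 2, 3]}],
    "statistics": {
        "mem_used": stat(20.5, 1.2, 32.0, "B", "G"),
        "cpu_load": stat(3.9, 0.1, 4.0, ""),
        "flops_any": stat(12.5, 0.0, 48.2, "F/s", "G"),
        "mem_bw": stat(40.0, 0.5, 85.0, "B/s", "G"),
        "cpu_user": stat(95.0, 2.0, 100.0, ""),
    },
}

print(check_meta(example_meta))
```

The check covers only presence and the jobState enumeration; type and range restrictions would still have to be validated against the raw schema.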


1.7.7 - Job Archive Metrics Data Schema

ClusterCockpit Job Archive Metrics Data Schema Reference

The following schema in its raw form can be found in the ClusterCockpit GitHub repository.

Job metric data

Title: Job metric data

Typeobject
RequiredNo
Additional propertiesAny type allowed

Description: Metric data of an HPC job

Property | Pattern | Type | Deprecated | Definition | Title/Description
+ unit | No | object | No | In embedfs://unit.schema.json | Metric unit
+ timestep | No | integer | No | - | Measurement interval in seconds
- thresholds | No | object | No | - | Metric thresholds for specific system
- statisticsSeries | No | object | No | - | Statistics series across topology
+ series | No | array of object | No | - | -

1. Property Job metric data > unit

Typeobject
RequiredYes
Additional propertiesAny type allowed
Defined inembedfs://unit.schema.json

Description: Metric unit

2. Property Job metric data > timestep

Typeinteger
RequiredYes

Description: Measurement interval in seconds
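
Since timestep is the measurement interval in seconds, the time stamp of each sample can be reconstructed from the job start time. The sketch below assumes that the first sample is taken exactly at the start time, which this schema does not state; the function name `sample_timestamps` is hypothetical.

```python
def sample_timestamps(start_time: int, timestep: int, num_samples: int) -> list[int]:
    """Epoch seconds per sample, ASSUMING the first sample is taken at start_time."""
    if timestep <= 0:
        # a guard chosen for this sketch; the schema lists no restriction here
        raise ValueError("timestep must be positive")
    return [start_time + i * timestep for i in range(num_samples)]


print(sample_timestamps(1700000000, 60, 3))
```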

3. Property Job metric data > thresholds

Typeobject
RequiredNo
Additional propertiesAny type allowed

Description: Metric thresholds for specific system

Property | Pattern | Type | Deprecated | Definition | Title/Description
- peak | No | number | No | - | -
- normal | No | number | No | - | -
- caution | No | number | No | - | -
- alert | No | number | No | - | -

3.1. Property Job metric data > thresholds > peak

Typenumber
RequiredNo

3.2. Property Job metric data > thresholds > normal

Typenumber
RequiredNo

3.3. Property Job metric data > thresholds > caution

Typenumber
RequiredNo

3.4. Property Job metric data > thresholds > alert

Typenumber
RequiredNo

4. Property Job metric data > statisticsSeries

Typeobject
RequiredNo
Additional propertiesAny type allowed

Description: Statistics series across topology

Property | Pattern | Type | Deprecated | Definition | Title/Description
- min | No | array of number | No | - | -
- max | No | array of number | No | - | -
- mean | No | array of number | No | - | -
- percentiles | No | object | No | - | -

4.1. Property Job metric data > statisticsSeries > min

Typearray of number
RequiredNo
Array restrictions
Min items3
Max itemsN/A
Items unicityFalse
Additional itemsFalse
Tuple validationSee below
Each item of this array must beDescription
min items-

4.1.1. Job metric data > statisticsSeries > min > min items

Typenumber
RequiredNo
Restrictions
Minimum≥ 0

4.2. Property Job metric data > statisticsSeries > max

Typearray of number
RequiredNo
Array restrictions
Min items3
Max itemsN/A
Items unicityFalse
Additional itemsFalse
Tuple validationSee below
Each item of this array must beDescription
max items-

4.2.1. Job metric data > statisticsSeries > max > max items

Typenumber
RequiredNo
Restrictions
Minimum≥ 0

4.3. Property Job metric data > statisticsSeries > mean

Typearray of number
RequiredNo
Array restrictions
Min items3
Max itemsN/A
Items unicityFalse
Additional itemsFalse
Tuple validationSee below
Each item of this array must beDescription
mean items-

4.3.1. Job metric data > statisticsSeries > mean > mean items

Typenumber
RequiredNo
Restrictions
Minimum≥ 0

4.4. Property Job metric data > statisticsSeries > percentiles

Typeobject
RequiredNo
Additional propertiesAny type allowed
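Property | Pattern | Type | Deprecated | Definition | Title/Description
- 10 | No | array of number | No | - | -
- 20 | No | array of number | No | - | -
- 30 | No | array of number | No | - | -
- 40 | No | array of number | No | - | -
- 50 | No | array of number | No | - | -
- 60 | No | array of number | No | - | -
- 70 | No | array of number | No | - | -
- 80 | No | array of number | No | - | -
- 90 | No | array of number | No | - | -
- 25 | No | array of number | No | - | -
- 75 | No | array of number | No | - | -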

4.4.1. Property Job metric data > statisticsSeries > percentiles > 10

Typearray of number
RequiredNo
Array restrictions
Min items3
Max itemsN/A
Items unicityFalse
Additional itemsFalse
Tuple validationSee below
Each item of this array must beDescription
10 items-
4.4.1.1. Job metric data > statisticsSeries > percentiles > 10 > 10 items
Typenumber
RequiredNo
Restrictions
Minimum≥ 0

4.4.2. Property Job metric data > statisticsSeries > percentiles > 20

Typearray of number
RequiredNo
Array restrictions
Min items3
Max itemsN/A
Items unicityFalse
Additional itemsFalse
Tuple validationSee below
Each item of this array must beDescription
20 items-
4.4.2.1. Job metric data > statisticsSeries > percentiles > 20 > 20 items
Typenumber
RequiredNo
Restrictions
Minimum≥ 0

4.4.3. Property Job metric data > statisticsSeries > percentiles > 30

Typearray of number
RequiredNo
Array restrictions
Min items3
Max itemsN/A
Items unicityFalse
Additional itemsFalse
Tuple validationSee below
Each item of this array must beDescription
30 items-
4.4.3.1. Job metric data > statisticsSeries > percentiles > 30 > 30 items
Typenumber
RequiredNo
Restrictions
Minimum≥ 0

4.4.4. Property Job metric data > statisticsSeries > percentiles > 40

Typearray of number
RequiredNo
Array restrictions
Min items3
Max itemsN/A
Items unicityFalse
Additional itemsFalse
Tuple validationSee below
Each item of this array must beDescription
40 items-
4.4.4.1. Job metric data > statisticsSeries > percentiles > 40 > 40 items
Typenumber
RequiredNo
Restrictions
Minimum≥ 0

4.4.5. Property Job metric data > statisticsSeries > percentiles > 50

Typearray of number
RequiredNo
Array restrictions
Min items3
Max itemsN/A
Items unicityFalse
Additional itemsFalse
Tuple validationSee below
Each item of this array must beDescription
50 items-
4.4.5.1. Job metric data > statisticsSeries > percentiles > 50 > 50 items
Typenumber
RequiredNo
Restrictions
Minimum≥ 0

4.4.6. Property Job metric data > statisticsSeries > percentiles > 60

Typearray of number
RequiredNo
Array restrictions
Min items3
Max itemsN/A
Items unicityFalse
Additional itemsFalse
Tuple validationSee below
Each item of this array must beDescription
60 items-
4.4.6.1. Job metric data > statisticsSeries > percentiles > 60 > 60 items
Typenumber
RequiredNo
Restrictions
Minimum≥ 0

4.4.7. Property Job metric data > statisticsSeries > percentiles > 70

Typearray of number
RequiredNo
Array restrictions
Min items3
Max itemsN/A
Items unicityFalse
Additional itemsFalse
Tuple validationSee below
Each item of this array must beDescription
70 items-
4.4.7.1. Job metric data > statisticsSeries > percentiles > 70 > 70 items
Typenumber
RequiredNo
Restrictions
Minimum≥ 0

4.4.8. Property Job metric data > statisticsSeries > percentiles > 80

Typearray of number
RequiredNo
Array restrictions
Min items3
Max itemsN/A
Items unicityFalse
Additional itemsFalse
Tuple validationSee below
Each item of this array must beDescription
80 items-
4.4.8.1. Job metric data > statisticsSeries > percentiles > 80 > 80 items
Typenumber
RequiredNo
Restrictions
Minimum≥ 0

4.4.9. Property Job metric data > statisticsSeries > percentiles > 90

Typearray of number
RequiredNo
Array restrictions
Min items3
Max itemsN/A
Items unicityFalse
Additional itemsFalse
Tuple validationSee below
Each item of this array must beDescription
90 items-
4.4.9.1. Job metric data > statisticsSeries > percentiles > 90 > 90 items
Typenumber
RequiredNo
Restrictions
Minimum≥ 0

4.4.10. Property Job metric data > statisticsSeries > percentiles > 25

Typearray of number
RequiredNo
Array restrictions
Min items3
Max itemsN/A
Items unicityFalse
Additional itemsFalse
Tuple validationSee below
Each item of this array must beDescription
25 items-
4.4.10.1. Job metric data > statisticsSeries > percentiles > 25 > 25 items
Typenumber
RequiredNo
Restrictions
Minimum≥ 0

4.4.11. Property Job metric data > statisticsSeries > percentiles > 75

Typearray of number
RequiredNo
Array restrictions
Min items3
Max itemsN/A
Items unicityFalse
Additional itemsFalse
Tuple validationSee below
Each item of this array must beDescription
75 items-
4.4.11.1. Job metric data > statisticsSeries > percentiles > 75 > 75 items
Typenumber
RequiredNo
Restrictions
Minimum≥ 0
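
One plausible reading of "statistics series across topology" is a per-time-step aggregation over all entries of series. The sketch below follows that assumption and computes the min, max, and mean arrays; percentiles are left out. The function name `statistics_series` and the sample data are hypothetical, and the code assumes every data array contains plain numbers without gaps.

```python
from statistics import fmean


def statistics_series(series: list[dict]) -> dict:
    """Aggregate per-time-step min, max and mean across all series (assumed semantics)."""
    # zip groups the i-th sample of every series; it stops at the shortest array
    columns = list(zip(*(entry["data"] for entry in series)))
    return {
        "min": [min(column) for column in columns],
        "max": [max(column) for column in columns],
        "mean": [fmean(column) for column in columns],
    }


# Made-up sample data for two nodes and three time steps
sample_series = [
    {"hostname": "node01", "data": [1.0, 2.0, 3.0]},
    {"hostname": "node02", "data": [3.0, 2.0, 5.0]},
]
print(statistics_series(sample_series))
```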

5. Property Job metric data > series

Typearray of object
RequiredYes
Array restrictions
Min itemsN/A
Max itemsN/A
Items unicityFalse
Additional itemsFalse
Tuple validationSee below
Each item of this array must beDescription
series items-

5.1. Job metric data > series > series items

Typeobject
RequiredNo
Additional propertiesAny type allowed
Property | Pattern | Type | Deprecated | Definition | Title/Description
+ hostname | No | string | No | - | -
- id | No | string | No | - | -
+ statistics | No | object | No | - | Statistics across time dimension
+ data | No | array | No | - | -

5.1.1. Property Job metric data > series > series items > hostname

Typestring
RequiredYes

5.1.2. Property Job metric data > series > series items > id

Typestring
RequiredNo

5.1.3. Property Job metric data > series > series items > statistics

Typeobject
RequiredYes
Additional propertiesAny type allowed

Description: Statistics across time dimension

Property | Pattern | Type | Deprecated | Definition | Title/Description
+ avg | No | number | No | - | Series average
+ min | No | number | No | - | Series minimum
+ max | No | number | No | - | Series maximum
5.1.3.1. Property Job metric data > series > series items > statistics > avg
Typenumber
RequiredYes

Description: Series average

Restrictions
Minimum≥ 0
5.1.3.2. Property Job metric data > series > series items > statistics > min
Typenumber
RequiredYes

Description: Series minimum

Restrictions
Minimum≥ 0
5.1.3.3. Property Job metric data > series > series items > statistics > max
Typenumber
RequiredYes

Description: Series maximum

Restrictions
Minimum≥ 0

5.1.4. Property Job metric data > series > series items > data

Type: array
Required: Yes
Array restrictions
Min items: 1
Max items: N/A
Items unicity: False
Additional items: False
Tuple validation: See below

5.1.4.1. At least one of the items must be

Type: number
Required: No
Restrictions
Minimum: ≥ 0
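
As an illustration of the series item layout described above, the following Go sketch decodes one item and checks the required fields. The type names and the helper parseSeriesItem are hypothetical and not part of cc-backend; item-level checks on the data array are left out for brevity.

```go
package main

import (
	"encoding/json"
	"errors"
	"fmt"
)

// SeriesStatistics mirrors the "statistics" object. Pointers let the
// validator tell a missing field apart from a legitimate 0.
type SeriesStatistics struct {
	Avg *float64 `json:"avg"`
	Min *float64 `json:"min"`
	Max *float64 `json:"max"`
}

// SeriesItem mirrors one entry of the "series" array.
type SeriesItem struct {
	Hostname   string            `json:"hostname"`
	ID         string            `json:"id,omitempty"` // optional
	Statistics *SeriesStatistics `json:"statistics"`
	Data       []float64         `json:"data"`
}

// parseSeriesItem decodes one series item and checks the restrictions
// listed above. Hypothetical helper, not part of cc-backend.
func parseSeriesItem(raw []byte) (SeriesItem, error) {
	var s SeriesItem
	if err := json.Unmarshal(raw, &s); err != nil {
		return s, err
	}
	if s.Hostname == "" {
		return s, errors.New("hostname is required")
	}
	st := s.Statistics
	if st == nil || st.Avg == nil || st.Min == nil || st.Max == nil {
		return s, errors.New("statistics needs avg, min and max")
	}
	if *st.Avg < 0 || *st.Min < 0 || *st.Max < 0 {
		return s, errors.New("statistics values must be >= 0")
	}
	if len(s.Data) < 1 {
		return s, errors.New("data needs at least one item")
	}
	return s, nil
}

func main() {
	raw := []byte(`{"hostname":"a0124","statistics":{"avg":1.5,"min":0,"max":3},"data":[0,1.5,3]}`)
	item, err := parseSeriesItem(raw)
	if err != nil {
		fmt.Println("invalid series item:", err)
		return
	}
	fmt.Println(item.Hostname, len(item.Data))
}
```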

Generated using json-schema-for-humans on 2024-12-04 at 16:45:59 +0100

2 - Metric Store

ClusterCockpit Metric Store References

Reference information regarding the ClusterCockpit component “cc-metric-store” (GitHub Repo).

2.1 - Command Line

ClusterCockpit Metric Store Command Line Options

This page describes the command line options for the cc-metric-store executable.


  -config <path>

Function: Specifies alternative path to application configuration file.

Default: ./config.json

Example: -config ./configfiles/configuration.json


  -dev

Function: Enables the Swagger UI REST API documentation and playground


  -gops

Function: Go server listens via github.com/google/gops/agent (for debugging).


  -version

Function: Shows version information and exits.

Example config:

{
  "metrics": {
    "debug_metric": {
      "frequency": 60,
      "aggregation": "avg"
    },
    "clock": {
      "frequency": 60,
      "aggregation": "avg"
    },
    "cpu_idle": {
      "frequency": 60,
      "aggregation": "avg"
    },
    "cpu_iowait": {
      "frequency": 60,
      "aggregation": "avg"
    },
    "cpu_irq": {
      "frequency": 60,
      "aggregation": "avg"
    },
    "cpu_system": {
      "frequency": 60,
      "aggregation": "avg"
    },
    "cpu_user": {
      "frequency": 60,
      "aggregation": "avg"
    },
    "nv_mem_util": {
      "frequency": 60,
      "aggregation": "avg"
    },
    "nv_temp": {
      "frequency": 60,
      "aggregation": "avg"
    },
    "nv_sm_clock": {
      "frequency": 60,
      "aggregation": "avg"
    },
    "acc_utilization": {
      "frequency": 60,
      "aggregation": "avg"
    },
    "acc_mem_used": {
      "frequency": 60,
      "aggregation": "sum"
    },
    "acc_power": {
      "frequency": 60,
      "aggregation": "sum"
    },
    "flops_any": {
      "frequency": 60,
      "aggregation": "sum"
    },
    "flops_dp": {
      "frequency": 60,
      "aggregation": "sum"
    },
    "flops_sp": {
      "frequency": 60,
      "aggregation": "sum"
    },
    "ib_recv": {
      "frequency": 60,
      "aggregation": "sum"
    },
    "ib_xmit": {
      "frequency": 60,
      "aggregation": "sum"
    },
    "ib_recv_pkts": {
      "frequency": 60,
      "aggregation": "sum"
    },
    "ib_xmit_pkts": {
      "frequency": 60,
      "aggregation": "sum"
    },
    "cpu_power": {
      "frequency": 60,
      "aggregation": "sum"
    },
    "core_power": {
      "frequency": 60,
      "aggregation": "sum"
    },
    "mem_power": {
      "frequency": 60,
      "aggregation": "sum"
    },
    "ipc": {
      "frequency": 60,
      "aggregation": "avg"
    },
    "cpu_load": {
      "frequency": 60,
      "aggregation": null
    },
    "lustre_close": {
      "frequency": 60,
      "aggregation": null
    },
    "lustre_open": {
      "frequency": 60,
      "aggregation": null
    },
    "lustre_statfs": {
      "frequency": 60,
      "aggregation": null
    },
    "lustre_read_bytes": {
      "frequency": 60,
      "aggregation": null
    },
    "lustre_write_bytes": {
      "frequency": 60,
      "aggregation": null
    },
    "net_bw": {
      "frequency": 60,
      "aggregation": null
    },
    "file_bw": {
      "frequency": 60,
      "aggregation": null
    },
    "mem_bw": {
      "frequency": 60,
      "aggregation": "sum"
    },
    "mem_cached": {
      "frequency": 60,
      "aggregation": null
    },
    "mem_used": {
      "frequency": 60,
      "aggregation": null
    },
    "vectorization_ratio": {
      "frequency": 60,
      "aggregation": "avg"
    }
  },
  "checkpoints": {
    "interval": "1h",
    "directory": "./var/checkpoints",
    "restore": "1h"
  },
  "archive": {
    "interval": "24h",
    "directory": "./var/archive"
  },
  "http-api": {
    "address": "localhost:8082",
    "https-cert-file": null,
    "https-key-file": null
  },
  "retention-in-memory": "48h",
  "nats": null,
  "jwt-public-key": "kzfYrYy+TzpanWZHJ5qSdMj5uKUWgq74BWhQG6copP0="
}

2.2 - Configuration

ClusterCockpit Metric Store Configuration Option References

All durations are specified as strings that are parsed by Go's time.ParseDuration (allowed suffixes: s, m, h, …).
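
A minimal sketch of how such duration strings behave, assuming they are handed to Go's time.ParseDuration. The helper parseDurations and the option labels in main are illustrative only.

```go
package main

import (
	"fmt"
	"time"
)

// parseDurations converts named duration strings into time.Duration values.
// Hypothetical helper; it fails on the first string that cannot be parsed.
func parseDurations(raw map[string]string) (map[string]time.Duration, error) {
	out := make(map[string]time.Duration, len(raw))
	for name, s := range raw {
		d, err := time.ParseDuration(s)
		if err != nil {
			return nil, fmt.Errorf("option %q: %w", name, err)
		}
		out[name] = d
	}
	return out, nil
}

func main() {
	parsed, err := parseDurations(map[string]string{
		"retention-in-memory":  "48h",
		"checkpoints.interval": "1h",
		"archive.interval":     "24h",
	})
	if err != nil {
		fmt.Println("error:", err)
		return
	}
	fmt.Println(parsed["retention-in-memory"])
}
```

Note that time.ParseDuration has no day suffix, so a retention of two days is written as "48h".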

  • metrics: Map of metric-name to objects with the following properties
    • frequency: Timestep/Interval/Resolution of this metric
    • aggregation: Can be "sum", "avg" or null
      • null means aggregation across nodes is forbidden for this metric
      • "sum" means that values from the child levels are summed up for the parent level
      • "avg" means that values from the child levels are averaged for the parent level
    • scope: Unused at the moment, should be something like "node", "socket" or "hwthread"
  • nats:
    • address: URL of the NATS.io server, for example "nats://localhost:4222"
    • username and password: Optional, if provided use those for the connection
    • subscriptions:
      • subscribe-to: Where to expect the measurements to be published
      • cluster-tag: Default value for the cluster tag
  • http-api:
    • address: Address to bind to, for example 0.0.0.0:8080
    • https-cert-file and https-key-file: Optional, if provided enable HTTPS using those files as certificate/key
  • jwt-public-key: Base64 encoded string, use this to verify requests to the HTTP API
  • retention-in-memory: Keep all values in memory for at least that amount of time
  • checkpoints:
    • interval: Do checkpoints every X seconds/minutes/hours
    • directory: Path to a directory
    • restore: After a restart, load the last X seconds/minutes/hours of data back into memory
  • archive:
    • interval: Move and compress all checkpoints not needed anymore every X seconds/minutes/hours
    • directory: Path to a directory
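
The meaning of the three aggregation settings can be sketched as follows. This is an illustration under assumptions, not the cc-metric-store implementation: the function aggregate is hypothetical and a nil pointer stands in for the JSON null.

```go
package main

import (
	"errors"
	"fmt"
)

// aggregate combines the values of the child levels (for example hwthreads)
// into one value for the parent level (for example the node).
// A nil strategy models "aggregation": null in the configuration.
func aggregate(strategy *string, children []float64) (float64, error) {
	if strategy == nil {
		return 0, errors.New("aggregation is forbidden for this metric")
	}
	if len(children) == 0 {
		return 0, errors.New("no child values")
	}
	sum := 0.0
	for _, v := range children {
		sum += v
	}
	switch *strategy {
	case "sum":
		return sum, nil
	case "avg":
		return sum / float64(len(children)), nil
	default:
		return 0, fmt.Errorf("unknown aggregation %q", *strategy)
	}
}

func main() {
	avg := "avg"
	v, err := aggregate(&avg, []float64{1, 2, 3})
	fmt.Println(v, err)
}
```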

2.3 - Metric Store REST API

ClusterCockpit Metric Store RESTful API Endpoint description

Authentication

JWT tokens

cc-metric-store supports only JWT tokens using the EdDSA/Ed25519 signing method. The token is provided using the Authorization Bearer header.

Example script to test the endpoint:

# Only send the JWT if JWT authentication has been set up
JWT="eyJ0eXAiOiJKV1QiLCJhbGciOiJFZERTQSJ9.eyJ1c2VyIjoiYWRtaW4iLCJyb2xlcyI6WyJST0xFX0FETUlOIiwiUk9MRV9BTkFMWVNUIiwiUk9MRV9VU0VSIl19.d-3_3FZTsadPjDEdsWrrQ7nS0edMAR4zjl-eK7rJU3HziNBfI9PDHDIpJVHTNN5E5SlLGLFXctWyKAkwhXL-Dw"

curl -X 'GET' 'http://localhost:8081/api/query/' -H "Authorization: Bearer $JWT" -d "{ \"cluster\": \"alex\", \"from\": 1720879275, \"to\": 1720964715, \"queries\": [{\"metric\": \"cpu_load\",\"host\": \"a0124\"}] }"
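
The same request can be built from Go. The sketch below only illustrates the request shape: the type and function names are hypothetical, the JSON field names are taken from the curl example above, and the base URL and token are assumptions.

```go
package main

import (
	"bytes"
	"encoding/json"
	"fmt"
	"net/http"
)

// metricQuery and queryRequest mirror the JSON body of the curl example.
type metricQuery struct {
	Metric string `json:"metric"`
	Host   string `json:"host"`
}

type queryRequest struct {
	Cluster string        `json:"cluster"`
	From    int64         `json:"from"`
	To      int64         `json:"to"`
	Queries []metricQuery `json:"queries"`
}

// newQueryRequest builds the request without sending it. The Authorization
// header is only set when a token is given.
func newQueryRequest(baseURL, jwt string, q queryRequest) (*http.Request, error) {
	body, err := json.Marshal(q)
	if err != nil {
		return nil, err
	}
	req, err := http.NewRequest(http.MethodGet, baseURL+"/api/query/", bytes.NewReader(body))
	if err != nil {
		return nil, err
	}
	if jwt != "" {
		req.Header.Set("Authorization", "Bearer "+jwt)
	}
	return req, nil
}

func main() {
	req, err := newQueryRequest("http://localhost:8081", "<token>", queryRequest{
		Cluster: "alex",
		From:    1720879275,
		To:      1720964715,
		Queries: []metricQuery{{Metric: "cpu_load", Host: "a0124"}},
	})
	if err != nil {
		fmt.Println("error:", err)
		return
	}
	// To send the request: resp, err := http.DefaultClient.Do(req)
	fmt.Println(req.Method, req.URL.String())
}
```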

NATS

TODO

Usage of Swagger UI

This Swagger UI is also available as part of cc-metric-store if you start it with the dev option:

./cc-metric-store -dev

You may access it at this URL.

Payload format for write endpoint

The data is expected in InfluxDB line protocol format.

<metric>,cluster=<cluster>,hostname=<hostname>,type=<node/hwthread/etc> value=<value> <epoch_time_in_ns_or_s>

Real example:

proc_run,cluster=fritz,hostname=f2163,type=node value=4i 1725620476214474893

A more detailed description of the ClusterCockpit-flavored InfluxDB line protocol and its types can be found here in the CC specification.
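
A small sketch of how one line in this format can be assembled for an integer value. The function formatIntLine is hypothetical; it does not escape special characters in tag values and assumes a nanosecond timestamp.

```go
package main

import (
	"fmt"
	"time"
)

// formatIntLine renders one measurement as
// <metric>,cluster=<cluster>,hostname=<hostname>,type=<type> value=<value>i <timestamp>
// The trailing "i" marks the value as an integer in the line protocol.
func formatIntLine(metric, cluster, hostname, typ string, value, unixNano int64) string {
	return fmt.Sprintf("%s,cluster=%s,hostname=%s,type=%s value=%di %d",
		metric, cluster, hostname, typ, value, unixNano)
}

func main() {
	fmt.Println(formatIntLine("proc_run", "fritz", "f2163", "node", 4, time.Now().UnixNano()))
}
```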

Example script to test endpoint:

# Only send the JWT if JWT authentication has been set up
JWT="eyJ0eXAiOiJKV1QiLCJhbGciOiJFZERTQSJ9.eyJ1c2VyIjoiYWRtaW4iLCJyb2xlcyI6WyJST0xFX0FETUlOIiwiUk9MRV9BTkFMWVNUIiwiUk9MRV9VU0VSIl19.d-3_3FZTsadPjDEdsWrrQ7nS0edMAR4zjl-eK7rJU3HziNBfI9PDHDIpJVHTNN5E5SlLGLFXctWyKAkwhXL-Dw"

curl -X 'POST' 'http://localhost:8081/api/write/?cluster=alex' -H "Authorization: Bearer $JWT" -d "proc_run,cluster=fritz,hostname=f2163,type=node value=4i 1725620476214474893"

Swagger API Reference

3 - cc-event-store

Documentation of cc-event-store

cc-event-store

A simple short-term store for job and system events as well as logs in the ClusterCockpit ecosystem. Events and logs were introduced as an extension to the previous CCMetric messages, which carry numeric data from the compute nodes known as metrics (see the line protocol specification at cc-specification). Events and logs are strings and, in contrast to the metrics sent periodically by cc-metric-collector, can occur at any time. All storage backends have a configuration option for the retention time for which events should be kept. Logs are never deleted.

Configuration

{
    "receiver" : "/path/to/receiver/config/file",
    "storage" : "/path/to/storage/config/file",
    "api" : "/path/to/api/config/file"
}

For the format of each file, see the documentation of the respective component (receivers, storage backends and REST API).

Structure

The cc-event-store has 4 components that are coupled together in the binary.

  • The event and log message receivers are reused from cc-metric-collector. There they are used to receive metrics from remote targets but are flexible enough to receive events and logs as well. See cc-metric-collector’s receivers.
  • The router forwards the events and logs to the storage manager.
  • The storage manager is a frontend to some database backends like SQLite or Postgres. The SQLite backend is the main development target.
  • The REST API is mainly used to query the storage backends but can also be used to insert events and logs.

This also explains why cc-event-store uses multiple configuration files, all coupled by a central configuration file. Each component has its own configuration file, which makes it possible to reuse the receivers from cc-metric-collector without any changes: they only require their configuration file.

3.1 - cc-event-store's REST API

Documentation of cc-event-store’s REST API

Configuration

{
    "address" : "localhost",
    "port": "8088",
    "idle_timeout": "120s",
    "keep_alives_enabled": true,
    "jwt_public_key": "0123456789ABCDEF",
    "enable_swagger_ui": true
}
  • address: Hostname or IP to listen for requests
  • port: Port number (as string) to listen at
  • idle_timeout: Close connection after this time. Must be a parseable time for time.ParseDuration
  • keep_alives_enabled: Keep connections alive for some time
  • jwt_public_key: JWT public key used for authentication
  • enable_swagger_ui: Enable the Swagger UI, a web-based documentation of the REST API
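
To illustrate how these options map onto a Go HTTP server, here is a sketch under assumptions. The struct apiConfig and the function newServer are hypothetical and do not reproduce the cc-event-store code; JWT handling and the Swagger UI are left out.

```go
package main

import (
	"encoding/json"
	"fmt"
	"net"
	"net/http"
	"time"
)

// apiConfig mirrors the configuration keys listed above.
type apiConfig struct {
	Address           string `json:"address"`
	Port              string `json:"port"`
	IdleTimeout       string `json:"idle_timeout"`
	KeepAlivesEnabled bool   `json:"keep_alives_enabled"`
	JWTPublicKey      string `json:"jwt_public_key"`
	EnableSwaggerUI   bool   `json:"enable_swagger_ui"`
}

// newServer parses the configuration and prepares (but does not start)
// an http.Server with the configured address and timeouts.
func newServer(raw []byte) (*http.Server, error) {
	var cfg apiConfig
	if err := json.Unmarshal(raw, &cfg); err != nil {
		return nil, err
	}
	idle, err := time.ParseDuration(cfg.IdleTimeout)
	if err != nil {
		return nil, fmt.Errorf("idle_timeout: %w", err)
	}
	srv := &http.Server{
		Addr:        net.JoinHostPort(cfg.Address, cfg.Port),
		IdleTimeout: idle,
	}
	srv.SetKeepAlivesEnabled(cfg.KeepAlivesEnabled)
	return srv, nil
}

func main() {
	srv, err := newServer([]byte(`{"address": "localhost", "port": "8088", "idle_timeout": "120s", "keep_alives_enabled": true}`))
	if err != nil {
		fmt.Println("error:", err)
		return
	}
	fmt.Println(srv.Addr, srv.IdleTimeout)
}
```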

Endpoints

  • http://address:port/api/query
  • http://address:port/api/write?cluster=<cluster>

See the generated Swagger documentation or the web-based Swagger UI for more information and for the data format accepted by the endpoints.

3.2 - cc-event-store's storage backends

Documentation of cc-event-store’s storage backends

Storage component

This component contains different backends for storing CCEvent and CCLog messages. It is only a short-term storage, so all backends have a notion of retention time to delete older entries.

Backends

Each backend uses its own configuration file entries. Check the backend-specific page for more information.

3.2.1 - Storage backend for Postgres

Toplevel postgresStorage

Storage backend for Postgres

Configuration

{
    "type" : "postgres",
    "server": "127.0.0.1",
    "port": 5432,
    "database_path" : "database_name",
    "flags" : [
        "open_flag=X"
    ],
    "username" : "myuser",
    "password" : "mypass",
    "connection_timeout" : 1

}
  • type: Has to be postgres
  • server: IP or name of server (default localhost)
  • port: Port number of server (default 5432)
  • database_path: The backend connects to this database
  • flags: Flags when opening Postgres. For things like connect settings (sslmode=verify-full)
  • username: If given, the database is opened with the given username
  • password: If given and username is also given, use it to open the database
  • connection_timeout: Timeout for connection in seconds (default 1)

Storage

The Postgres backend stores CCEvent and CCLog messages in distinct tables named <cluster>_events and <cluster>_logs, respectively. It does not use separate tables for the recurring parts of CCEvent and CCLog messages (namely the hostname, type and typeid tags). The timestamps of the messages are stored as UNIX timestamps with a precision of seconds.
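
The naming scheme and the timestamp precision can be sketched as follows. Both helpers are hypothetical and only restate the description above; they are not taken from the backend code.

```go
package main

import (
	"fmt"
	"time"
)

// tableNames returns the per-cluster table names for events and logs.
func tableNames(cluster string) (events, logs string) {
	return cluster + "_events", cluster + "_logs"
}

// storedTimestamp reduces a nanosecond UNIX timestamp to second precision.
func storedTimestamp(unixNano int64) int64 {
	return time.Unix(0, unixNano).Unix()
}

func main() {
	events, logs := tableNames("fritz")
	fmt.Println(events, logs, storedTimestamp(time.Now().UnixNano()))
}
```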

3.2.2 - Storage backend for SQLite3

Toplevel sqliteStorage

Storage backend for SQLite3

Configuration

{
    "type" : "sqlite",
    "database_path" : "/path/for/databases",
    "flags" : [
        "open_flag=X"
    ],
    "username" : "myuser",
    "password" : "mypass"
}
  • type: Has to be sqlite
  • database_path: The backend creates tables based on the cluster names in this path
  • flags: Flags when opening SQLite. For things like timeouts (_timeout=5000), storage settings (_journal=WAL), …
  • username: If given, the database is opened with the given username
  • password: If given and username is also given, use it to open the database

Storage

The SQLite backend stores CCEvent and CCLog messages in distinct tables named <cluster>_events and <cluster>_logs, respectively. It does not use separate tables for the recurring parts of CCEvent and CCLog messages (namely the hostname, type and typeid tags). The timestamps of the messages are stored as UNIX timestamps with a precision of seconds.

4 - cc-metric-collector

Documentation of cc-metric-collector

cc-metric-collector

A node agent for measuring, processing and forwarding node level metrics. It is part of the ClusterCockpit ecosystem.

The metric collector sends (and receives) metrics in the InfluxDB line protocol, as it provides flexibility while separating tags (like index columns in relational databases) from fields (like data columns).

There is a single timer loop that triggers all collectors serially, collects the collectors’ data and sends the metrics to the sink. This is done so that all data is submitted with a single timestamp. The sinks currently use mostly blocking APIs.

The receiver runs as a goroutine side-by-side with the timer loop and asynchronously forwards received metrics to the sink.

Configuration

Configuration is implemented using a single JSON document that is distributed over the network and may be persisted as a file. Supported metrics are documented here.

There is a main configuration file with basic settings that point to the other configuration files for the different components.

{
  "sinks": "sinks.json",
  "collectors" : "collectors.json",
  "receivers" : "receivers.json",
  "router" : "router.json",
  "interval": "10s",
  "duration": "1s"
}

The interval defines how often the metrics should be read and sent to the sink. The duration tells the collectors how long one measurement has to take. This is important for some collectors, like the likwid collector. For more information, see here.
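
A minimal sketch of reading the two timing options from the main configuration. The struct and the function parseTiming are hypothetical; the check that the duration must not exceed the interval is an assumption made for this example and is not stated by the collector documentation.

```go
package main

import (
	"encoding/json"
	"fmt"
	"time"
)

// mainConfig mirrors the keys of the main configuration file shown above.
type mainConfig struct {
	Sinks      string `json:"sinks"`
	Collectors string `json:"collectors"`
	Receivers  string `json:"receivers"`
	Router     string `json:"router"`
	Interval   string `json:"interval"`
	Duration   string `json:"duration"`
}

// parseTiming extracts interval and duration as time.Duration values.
func parseTiming(raw []byte) (interval, duration time.Duration, err error) {
	var cfg mainConfig
	if err = json.Unmarshal(raw, &cfg); err != nil {
		return 0, 0, err
	}
	if interval, err = time.ParseDuration(cfg.Interval); err != nil {
		return 0, 0, fmt.Errorf("interval: %w", err)
	}
	if duration, err = time.ParseDuration(cfg.Duration); err != nil {
		return 0, 0, fmt.Errorf("duration: %w", err)
	}
	// Assumption: a measurement should fit into one interval.
	if duration > interval {
		return 0, 0, fmt.Errorf("duration %s exceeds interval %s", duration, interval)
	}
	return interval, duration, nil
}

func main() {
	interval, duration, err := parseTiming([]byte(`{"interval": "10s", "duration": "1s"}`))
	fmt.Println(interval, duration, err)
}
```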

See the READMEs of the individual components for their configuration.

Installation

$ git clone git@github.com:ClusterCockpit/cc-metric-collector.git
$ make    # downloads LIKWID, builds it as a static library with 'direct' access mode and copies all required files for the collector
$ go get  # requires at least Go 1.16
$ make

For more information, see here.

Running

$ ./cc-metric-collector --help
Usage of metric-collector:
  -config string
    	Path to configuration file (default "./config.json")
  -log string
    	Path for logfile (default "stderr")
  -once
    	Run all collectors only once

Scenarios

The metric collector was designed with flexibility in mind, so it can be used in many scenarios. Here are a few:

flowchart TD
  subgraph a ["Cluster A"]
  nodeA[NodeA with CC collector]
  nodeB[NodeB with CC collector]
  nodeC[NodeC with CC collector]
  end
  a --> db[(Database)]
  db <--> ccweb("Webfrontend")

flowchart TD
  subgraph a [ClusterA]
  direction LR
  nodeA[NodeA with CC collector]
  nodeB[NodeB with CC collector]
  nodeC[NodeC with CC collector]
  end
  subgraph b [ClusterB]
  direction LR
  nodeD[NodeD with CC collector]
  nodeE[NodeE with CC collector]
  nodeF[NodeF with CC collector]
  end
  a --> ccrecv{"CC collector as receiver"}
  b --> ccrecv
  ccrecv --> db[("Database1")]
  ccrecv -.-> db2[("Database2")]
  db <-.-> ccweb("Webfrontend")

Contributing

The ClusterCockpit ecosystem is designed to be used by different HPC computing centers. Since configurations and setups differ between the centers, the centers likely have to put some work into the cc-metric-collector to gather all desired metrics.

You are free to open an issue to request a collector but we would also be happy about PRs.

Contact

4.1 - cc-metric-collector's collectors

Documentation of cc-metric-collector’s collectors

CCMetric collectors

This folder contains the collectors for the cc-metric-collector.

Configuration

{
    "collector_type" : {
        <collector specific configuration>
    }
}

In contrast to the configuration files for sinks and receivers, the collectors configuration is not a list but a set of dicts. This is required because we didn’t manage to partially read the type before loading the remaining configuration. We are eager to change this to the same format.

Available collectors

Todos

  • Aggregate metrics to a higher topology entity (sum hwthread metrics to a socket metric, …). Needs to be configurable

Contributing own collectors

A collector reads data from any source, parses it to metrics and submits these metrics to the metric-collector. A collector provides the following functions:

  • Name() string: Return the name of the collector
  • Init(config json.RawMessage) error: Initializes the collector using the given collector-specific config in JSON. Check if needed files/commands exists, …
  • Initialized() bool: Check if a collector is successfully initialized
  • Read(duration time.Duration, output chan ccMetric.CCMetric): Read, parse and submit data to the output channel as CCMetric. If the collector has to measure anything for some duration, use the provided function argument duration.
  • Close(): Closes down the collector.

It is recommended to call setup() in the Init() function.

Finally, the collector needs to be registered in the collectorManager.go. There is a list of collectors called AvailableCollectors which is a map (collector_type_string -> pointer to MetricCollector interface). Add a new entry with a descriptive name and the new collector.
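
The registration and the "set of dicts" configuration format can be sketched together. This is a simplified, self-contained illustration: the interface omits Read(), the SampleCollector is reduced to its state handling, and initCollectors is a hypothetical helper, not the code of collectorManager.go.

```go
package main

import (
	"encoding/json"
	"fmt"
	"sort"
)

// MetricCollector is a reduced version of the interface described above
// (Read() is omitted to keep the sketch free of external packages).
type MetricCollector interface {
	Name() string
	Init(config json.RawMessage) error
	Initialized() bool
	Close()
}

type SampleCollector struct {
	name   string
	init   bool
	config struct {
		ExcludeMetrics []string `json:"exclude_metrics"`
	}
}

func (m *SampleCollector) Name() string { return m.name }
func (m *SampleCollector) Init(config json.RawMessage) error {
	m.name = "SampleCollector"
	if len(config) > 0 {
		if err := json.Unmarshal(config, &m.config); err != nil {
			return err
		}
	}
	m.init = true
	return nil
}
func (m *SampleCollector) Initialized() bool { return m.init }
func (m *SampleCollector) Close()            { m.init = false }

// AvailableCollectors maps the collector type string to its implementation.
var AvailableCollectors = map[string]MetricCollector{
	"sample": new(SampleCollector),
}

// initCollectors reads the collectors configuration (one key per collector
// type) and initializes every configured collector found in the registry.
func initCollectors(raw []byte) ([]string, error) {
	var cfg map[string]json.RawMessage
	if err := json.Unmarshal(raw, &cfg); err != nil {
		return nil, err
	}
	started := make([]string, 0, len(cfg))
	for typ, collectorCfg := range cfg {
		c, ok := AvailableCollectors[typ]
		if !ok {
			return nil, fmt.Errorf("unknown collector type %q", typ)
		}
		if err := c.Init(collectorCfg); err != nil {
			return nil, fmt.Errorf("collector %q: %w", typ, err)
		}
		started = append(started, typ)
	}
	sort.Strings(started)
	return started, nil
}

func main() {
	started, err := initCollectors([]byte(`{"sample": {"exclude_metrics": ["sample_metric"]}}`))
	fmt.Println(started, err)
}
```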

Sample collector

package collectors

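import (
    "encoding/json"
    "fmt"
    "time"

    // Assumed import path for the logger, mirroring the ccMetric path below
    cclog "github.com/ClusterCockpit/cc-metric-collector/internal/ccLogger"
    lp "github.com/ClusterCockpit/cc-metric-collector/internal/ccMetric"
)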

// Struct for the collector-specific JSON config
type SampleCollectorConfig struct {
    ExcludeMetrics []string `json:"exclude_metrics"`
}

type SampleCollector struct {
    metricCollector
    config SampleCollectorConfig
}

func (m *SampleCollector) Init(config json.RawMessage) error {
    // Check if already initialized
    if m.init {
        return nil
    }

    m.name = "SampleCollector"
    m.setup()
    if len(config) > 0 {
        err := json.Unmarshal(config, &m.config)
        if err != nil {
            return err
        }
    }
    m.meta = map[string]string{"source": m.name, "group": "Sample"}

    m.init = true
    return nil
}

func (m *SampleCollector) Read(interval time.Duration, output chan lp.CCMetric) {
    if !m.init {
        return
    }
    // tags for the metric, if type != node use proper type and type-id
    tags := map[string]string{"type" : "node"}

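    // GetMetric() is a placeholder for the collector-specific measurement
    x, err := GetMetric()
    if err != nil {
        cclog.ComponentError(m.name, fmt.Sprintf("Read(): %v", err))
        // Do not submit a metric if the measurement failed
        return
    }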

    // Each metric has exactly one field: value !
    value := map[string]interface{}{"value": int64(x)}
    if y, err := lp.New("sample_metric", tags, m.meta, value, time.Now()); err == nil {
        output <- y
    }
}

func (m *SampleCollector) Close() {
    // Mark the collector as uninitialized so Read() becomes a no-op
    m.init = false
}

4.1.1 - BeeGFS on Demand collector

Toplevel beegfsmetaMetric

BeeGFS on Demand collector

This collector gathers BeeGFS on Demand (BeeOND) metadata client statistics.

  "beegfs_meta": {
    "beegfs_path": "/usr/bin/beegfs-ctl",
    "exclude_filesystem": [
      "/mnt/ignore_me"
    ],
    "exclude_metrics": [
      "ack",
      "entInf",
      "fndOwn"
    ]
  }

The BeeGFS On Demand (BeeOND) collector uses the beegfs-ctl command to read performance metrics for BeeGFS filesystems.

The reported filesystems can be filtered with the exclude_filesystem option in the configuration.

The path to the beegfs-ctl command can be configured with the beegfs_path option in the configuration.

When using the exclude_metrics option, the excluded metrics are summed as other.

Important: The metrics listed below follow the naming of BeeGFS. The collector prefixes them with beegfs_cstorage (beegfs client storage).

For example, the BeeGFS metric open becomes beegfs_cstorage_open.
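
The effect of exclude_metrics can be sketched like this. The function foldExcluded is hypothetical and works on plain metric names; the prefixing described above is left out.

```go
package main

import "fmt"

// foldExcluded keeps all metrics that are not excluded and sums the
// excluded ones into a single "other" metric.
func foldExcluded(stats map[string]float64, exclude []string) map[string]float64 {
	excluded := make(map[string]bool, len(exclude))
	for _, name := range exclude {
		excluded[name] = true
	}
	out := map[string]float64{"other": 0}
	for name, v := range stats {
		if excluded[name] {
			out["other"] += v
		} else {
			out[name] = v
		}
	}
	return out
}

func main() {
	stats := map[string]float64{"open": 3, "ack": 1, "entInf": 2}
	fmt.Println(foldExcluded(stats, []string{"ack", "entInf"}))
}
```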

Available Metrics:

  • sum
  • ack
  • close
  • entInf
  • fndOwn
  • mkdir
  • create
  • rddir
  • refrEnt
  • mdsInf
  • rmdir
  • rmLnk
  • mvDirIns
  • mvFiIns
  • open
  • ren
  • sChDrct
  • sAttr
  • sDirPat
  • stat
  • statfs
  • trunc
  • symlnk
  • unlnk
  • lookLI
  • statLI
  • revalLI
  • openLI
  • createLI
  • hardlnk
  • flckAp
  • flckEn
  • flckRg
  • dirparent
  • listXA
  • getXA
  • rmXA
  • setXA
  • mirror

The collector adds a filesystem tag to all metrics

4.1.2 - BeeGFS on Demand collector

Toplevel beegfsstorageMetric

BeeGFS on Demand collector

This collector gathers BeeGFS on Demand (BeeOND) storage statistics.

  "beegfs_storage": {
    "beegfs_path": "/usr/bin/beegfs-ctl",
    "exclude_filesystem": [
      "/mnt/ignore_me"
    ],
    "exclude_metrics": [
      "ack",
      "storInf",
      "unlnk"
    ]
  }

The BeeGFS On Demand (BeeOND) collector uses the beegfs-ctl command to read performance metrics for BeeGFS filesystems.

The reported filesystems can be filtered with the exclude_filesystem option in the configuration.

The path to the beegfs-ctl command can be configured with the beegfs_path option in the configuration.

When using the exclude_metrics option, the excluded metrics are summed as other.

Important: The metrics listed below follow the naming of BeeGFS. The collector prefixes them with beegfs_cstorage_ (beegfs client meta). For example, beegfs metric open -> beegfs_cstorage_

Note: BeeGFS offers a lot of metadata information. It probably makes sense to exclude most of it. Nevertheless, the excluded metrics are summed up as beegfs_cstorage_other.

Available Metrics:

  • sum
  • ack
  • sChDrct
  • getFSize
  • sAttr
  • statfs
  • trunc
  • close
  • fsync
  • ops-rd
  • MiB-rd/s
  • ops-wr
  • MiB-wr/s
  • endbg
  • hrtbeat
  • remNode
  • storInf
  • unlnk

The collector adds a filesystem tag to all metrics

4.1.3 - cpufreq_cpuinfo collector

Toplevel cpufreqCpuinfoMetric

cpufreq_cpuinfo collector

  "cpufreq_cpuinfo": {}

The cpufreq_cpuinfo collector reads the clock frequency from /proc/cpuinfo and outputs a handful of hwthread metrics.

Metrics:

  • cpufreq

4.1.4 - cpufreq collector

Toplevel cpufreqMetric

cpufreq collector

  "cpufreq": {
    "exclude_metrics": []
  }

The cpufreq collector reads the clock frequency from /sys/devices/system/cpu/cpu*/cpufreq and outputs a handful of hwthread metrics.

Metrics:

  • cpufreq

4.1.5 - cpustat collector

Toplevel cpustatMetric

cpustat collector

  "cpustat": {
    "exclude_metrics": [
      "cpu_idle"
    ]
  }

The cpustat collector reads data from /proc/stat and outputs a handful of node and hwthread metrics. If a metric is not required, it can be excluded from being forwarded to the sink.

Metrics:

  • cpu_user with unit=Percent
  • cpu_nice with unit=Percent
  • cpu_system with unit=Percent
  • cpu_idle with unit=Percent
  • cpu_iowait with unit=Percent
  • cpu_irq with unit=Percent
  • cpu_softirq with unit=Percent
  • cpu_steal with unit=Percent
  • cpu_guest with unit=Percent
  • cpu_guest_nice with unit=Percent
  • cpu_used = cpu_* - cpu_idle with unit=Percent
  • num_cpus

4.1.6 - customcmd collector

Toplevel customCmdMetric

customcmd collector

  "customcmd": {
    "exclude_metrics": [
      "mymetric"
    ],
    "files" : [
      "/var/run/myapp.metrics"
    ],
    "commands" : [
      "/usr/local/bin/getmetrics.pl"
    ]
  }

The customcmd collector reads data from files and the output of executed commands. The files and commands can output multiple metrics (separated by newline), but they have to be in the InfluxDB line protocol. If a metric is not parsable, it is skipped. If a metric is not required, it can be excluded from being forwarded to the sink.

4.1.7 - diskstat collector

Toplevel diskstatMetric

diskstat collector

  "diskstat": {
    "exclude_metrics": [
      "disk_total"
    ]
  }

The diskstat collector reads data from /proc/self/mounts and outputs a handful of node metrics. If a metric is not required, it can be excluded from being forwarded to the sink.

Metrics per device (with device tag):

  • disk_total (unit GBytes)
  • disk_free (unit GBytes)

Global metrics:

  • part_max_used (unit percent)

4.1.8 - gpfs collector

Toplevel gpfsMetric

gpfs collector

  "gpfs": {
    "mmpmon_path": "/path/to/mmpmon",
    "exclude_filesystem": [
      "fs1"
    ],
    "send_bandwidths": true,
    "send_total_values": true
  }

The gpfs collector uses the mmpmon command to read performance metrics for GPFS / IBM Spectrum Scale filesystems.

The reported filesystems can be filtered with the exclude_filesystem option in the configuration.

The path to the mmpmon command can be configured with the mmpmon_path option in the configuration. If nothing is set, the collector searches in $PATH for mmpmon.

Metrics:

  • gpfs_bytes_read
  • gpfs_bytes_written
  • gpfs_num_opens
  • gpfs_num_closes
  • gpfs_num_reads
  • gpfs_num_writes
  • gpfs_num_readdirs
  • gpfs_num_inode_updates
  • gpfs_bytes_total = gpfs_bytes_read + gpfs_bytes_written (if send_total_values == true)
  • gpfs_iops = gpfs_num_reads + gpfs_num_writes (if send_total_values == true)
  • gpfs_metaops = gpfs_num_inode_updates + gpfs_num_closes + gpfs_num_opens + gpfs_num_readdirs (if send_total_values == true)
  • gpfs_bw_read (if send_bandwidths == true)
  • gpfs_bw_write (if send_bandwidths == true)

The collector adds a filesystem tag to all metrics
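
The derived metrics follow directly from the formulas in the list above. In the sketch below the struct and function names are hypothetical; the bandwidth helper assumes that a bandwidth is the counter difference between two reads divided by the elapsed time, which the text above does not spell out.

```go
package main

import "fmt"

// gpfsCounters holds the raw counters of one filesystem.
type gpfsCounters struct {
	BytesRead, BytesWritten     int64
	NumOpens, NumCloses         int64
	NumReads, NumWrites         int64
	NumReaddirs, NumInodeUpdate int64
}

// derivedTotals computes the metrics sent when send_total_values is true.
func derivedTotals(c gpfsCounters) map[string]int64 {
	return map[string]int64{
		"gpfs_bytes_total": c.BytesRead + c.BytesWritten,
		"gpfs_iops":        c.NumReads + c.NumWrites,
		"gpfs_metaops":     c.NumInodeUpdate + c.NumCloses + c.NumOpens + c.NumReaddirs,
	}
}

// bandwidth is an assumed definition: counter difference per second.
func bandwidth(prev, cur int64, elapsedSeconds float64) float64 {
	if elapsedSeconds <= 0 {
		return 0
	}
	return float64(cur-prev) / elapsedSeconds
}

func main() {
	c := gpfsCounters{BytesRead: 100, BytesWritten: 50, NumReads: 10, NumWrites: 5}
	fmt.Println(derivedTotals(c), bandwidth(100, 300, 2))
}
```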

4.1.9 - ibstat collector

Toplevel infinibandMetric

ibstat collector

  "ibstat": {
    "exclude_devices": [
      "mlx4"
    ],
    "send_abs_values": true,
    "send_derived_values": true
  }

The ibstat collector includes all Infiniband devices that can be found below /sys/class/infiniband/ and where any of the ports provides a LID file (/sys/class/infiniband/<dev>/ports/<port>/lid)

The devices can be filtered with the exclude_devices option in the configuration.

For each found LID the collector reads data through the sysfs files below /sys/class/infiniband/<device>. (See: https://www.kernel.org/doc/Documentation/ABI/stable/sysfs-class-infiniband)

Metrics:

  • ib_recv
  • ib_xmit
  • ib_recv_pkts
  • ib_xmit_pkts
  • ib_total = ib_recv + ib_xmit (if send_total_values == true)
  • ib_total_pkts = ib_recv_pkts + ib_xmit_pkts (if send_total_values == true)
  • ib_recv_bw (if send_derived_values == true)
  • ib_xmit_bw (if send_derived_values == true)
  • ib_recv_pkts_bw (if send_derived_values == true)
  • ib_xmit_pkts_bw (if send_derived_values == true)

The collector adds a device tag to all metrics

4.1.10 - iostat collector

Toplevel iostatMetric

iostat collector

  "iostat": {
    "exclude_metrics": [
      "read_ms"
    ]
  }

The iostat collector reads data from /proc/diskstats and outputs a handful of node metrics. If a metric is not required, it can be excluded from being forwarded to the sink.

Metrics:

  • io_reads
  • io_reads_merged
  • io_read_sectors
  • io_read_ms
  • io_writes
  • io_writes_merged
  • io_writes_sectors
  • io_writes_ms
  • io_ioops
  • io_ioops_ms
  • io_ioops_weighted_ms
  • io_discards
  • io_discards_merged
  • io_discards_sectors
  • io_discards_ms
  • io_flushes
  • io_flushes_ms

The device name is added as tag device. For more details, see https://www.kernel.org/doc/html/latest/admin-guide/iostats.html

4.1.11 - ipmistat collector

Toplevel ipmiMetric

ipmistat collector

  "ipmistat": {
    "ipmitool_path": "/path/to/ipmitool",
    "ipmisensors_path": "/path/to/ipmi-sensors"
  }

The ipmistat collector reads data from ipmitool (ipmitool sensor) or ipmi-sensors (ipmi-sensors --sdr-cache-recreate --comma-separated-output).

The metrics depend on the output of the underlying tools but contain temperature, power and energy metrics.

4.1.12 - likwid collector

Toplevel likwidMetric

likwid collector

The likwid collector is probably the most complicated collector. The LIKWID library is included as static library with direct access mode. The direct access mode is suitable if the daemon is executed by a root user. The static library does not contain the performance groups, so all information needs to be provided in the configuration.

  "likwid": {
    "force_overwrite" : false,
    "invalid_to_zero" : false,
    "liblikwid_path" : "/path/to/liblikwid.so",
    "accessdaemon_path" : "/folder/that/contains/likwid-accessD",
    "access_mode" : "direct or accessdaemon or perf_event",
    "lockfile_path" : "/var/run/likwid.lock",
    "eventsets": [
      {
        "events" : {
          "COUNTER0": "EVENT0",
          "COUNTER1": "EVENT1"
        },
        "metrics" : [
          {
            "name": "sum_01",
            "calc": "COUNTER0 + COUNTER1",
            "publish": false,
            "unit": "myunit",
            "type": "hwthread"
          }
        ]
      }
    ],
    "globalmetrics" : [
      {
        "name": "global_sum",
        "calc": "sum_01",
        "publish": true,
        "unit": "myunit",
        "type": "hwthread"
      }
    ]
  }

The likwid configuration consists of two parts, the eventsets and globalmetrics:

  • An event set has two parts: the events and a set of derivable metrics. Each of the events is a counter:event pair in LIKWID’s syntax. The metrics are a list of formulas that derive the metric value from the measurements of the events. Each metric has a name, the formula (calc), a type and a publish flag, plus an optional unit field. Counter names can be used like variables in the formulas, so PMC0+PMC1 sums the measurements of the two events configured in the counters PMC0 and PMC1. You can optionally use time for the measurement time and inverseClock for 1.0/baseCpuFrequency. The type tells the LikwidCollector whether it is a metric for each hardware thread (hwthread) or each CPU socket (socket). The publish flag tells the LikwidCollector whether a metric should be sent to the router or is only used internally to compute a global metric.
  • The globalmetrics are metrics which require data from multiple event set measurements to be derived. The inputs are the metrics in the event sets. Similar to the metrics in the event sets, the global metrics are defined by a name, a formula, a type and a publish flag. See event set metrics for details. The only difference is that there is no access to the raw event measurements anymore but only to the metrics. Also time and inverseClock cannot be used anymore. So, the idea is to derive a metric in the eventsets section and reuse it in the globalmetrics part. If you need a metric only for deriving the global metrics, disable forwarding of the event set metrics ("publish": false). Be aware that the combination might be misleading because the “behavior” of a metric changes over time and the multiple measurements might count different computing phases. Similar to the metrics in the eventset, you can specify a metric unit with the unit field.

Additional options:

  • force_overwrite: Same as setting LIKWID_FORCE=1. In case counters are already in use, LIKWID overwrites their configuration to do its measurements.
  • invalid_to_zero: In some cases, the calculations result in NaN or Inf. With this option, all NaN and Inf values are replaced with 0.0. See the separate section below.
  • access_mode: Specify the LIKWID access mode: direct for direct register access as root user, or accessdaemon. The access mode perf_event is currently untested.
  • accessdaemon_path: Folder of the accessDaemon likwid-accessD (like /usr/local/sbin)
  • liblikwid_path: Location of liblikwid.so including file name like /usr/local/lib/liblikwid.so
  • lockfile_path: Location of LIKWID’s lock file if multiple tools should access the hardware counters. Default /var/run/likwid.lock

Available metric types

Hardware performance counters are scattered all over the system nowadays. A counter covers a specific part of the system. While there are hardware-thread-specific counters for CPU cycles, instructions and so on, others are specific to a whole CPU socket/package. To address that, the LikwidCollector lets you specify a type for each metric.

  • hwthread : One metric per CPU hardware thread with the tags "type" : "hwthread" and "type-id" : "$hwthread_id"
  • socket : One metric per CPU socket/package with the tags "type" : "socket" and "type-id" : "$socket_id"

Note: You cannot specify socket type for a metric that is measured at hwthread type, so some kind of expert knowledge or lookup work in the Likwid Wiki is required. Get the type of each counter from the Architecture pages and as soon as one counter in a metric is socket-specific, the whole metric is socket-specific.

As a guideline:

  • All counters FIXCx, PMCy and TMAz have the type hwthread
  • All counter names containing BOX have the type socket
  • All PWRx counters have type socket, except "PWR1" : "RAPL_CORE_ENERGY" has hwthread type
  • All DFCx counters have type socket
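
The guideline can be written down as a small lookup. The following Go sketch only illustrates the rules listed above; the function names counterType and metricType are hypothetical and not part of cc-metric-collector, and counters that the guideline does not cover are reported as unknown.

```go
package main

import (
	"fmt"
	"strings"
)

// counterType applies the guideline to one counter:event pair.
// Hypothetical helper, not part of cc-metric-collector.
func counterType(counter, event string) string {
	switch {
	case counter == "PWR1" && event == "RAPL_CORE_ENERGY":
		return "hwthread" // the one PWRx exception named in the guideline
	case strings.HasPrefix(counter, "PWR"),
		strings.HasPrefix(counter, "DFC"),
		strings.Contains(counter, "BOX"):
		return "socket"
	case strings.HasPrefix(counter, "FIXC"),
		strings.HasPrefix(counter, "PMC"),
		strings.HasPrefix(counter, "TMA"):
		return "hwthread"
	}
	return "unknown" // look it up in the LIKWID architecture pages
}

// metricType returns "socket" as soon as one used counter is socket-specific.
func metricType(events map[string]string, used ...string) string {
	for _, counter := range used {
		if counterType(counter, events[counter]) == "socket" {
			return "socket"
		}
	}
	return "hwthread"
}

func main() {
	events := map[string]string{
		"PMC0": "RETIRED_INSTRUCTIONS",
		"PMC1": "CPU_CLOCKS_UNHALTED",
		"DFC0": "DRAM_CHANNEL_0",
	}
	fmt.Println(metricType(events, "PMC0", "PMC1")) // hwthread
	fmt.Println(metricType(events, "PMC0", "DFC0")) // socket
}
```

The sketch does not replace the lookup in the LIKWID architecture pages; it only shows how a single socket-specific counter decides the type of the whole metric.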

Help with the configuration

The configuration for the likwid collector is quite complicated. Most users don’t use LIKWID with the event:counter notation but rely on the performance groups defined by the LIKWID team for each architecture. In order to help with the likwid collector configuration, we included a script scripts/likwid_perfgroup_to_cc_config.py that creates the configuration of an eventset from a performance group (using a LIKWID installation in $PATH):

$ likwid-perfctr -i
[...]
short name: ICX
[...]
$ likwid-perfctr -a
[...]
MEM_DP
MEM
FLOPS_SP
CLOCK
[...]
$ scripts/likwid_perfgroup_to_cc_config.py ICX MEM_DP
{
  "events": {
    "FIXC0": "INSTR_RETIRED_ANY",
    "FIXC1": "CPU_CLK_UNHALTED_CORE",
    "..." : "..."
  },
  "metrics" : [
    {
      "calc": "time",
      "name": "Runtime (RDTSC) [s]",
      "publish": true,
      "unit": "seconds",
      "type": "hwthread"
    },
    {
      "..." : "..."
    }
  ]
}

You can copy this JSON and add it to the eventsets list. If you specify multiple event sets, you can add globally derived metrics in the extra globalmetrics section with the metric names as variables.

Mixed usage between daemon and users

LIKWID checks the file /var/run/likwid.lock before performing any interfering operations. The owner of the file determines who is allowed to access the counters. If the file does not exist, it is created for the current user. So, if you want to temporarily allow counter access to a user (e.g. in a job):

Before (SLURM prolog, …)

chown $JOBUSER /var/run/likwid.lock

After (SLURM epilog, …)

chown $CCUSER /var/run/likwid.lock

invalid_to_zero option

In some cases LIKWID returns 0.0 for some events that are further used in processing and may be used as a divisor in a calculation. After evaluation of a metric, the result might be NaN or +-Inf. These resulting metrics are commonly not created and forwarded to the router because the InfluxDB line protocol does not support these special floating-point values. If you want to have them sent, this option forces these metric values to be 0.0 instead.

One might think this does not happen often, but frequently used metrics in the world of performance engineering like Instructions-per-Cycle (IPC) or, more commonly, the actual CPU clock are derived from events like CPU_CLK_UNHALTED_CORE (Intel), which do not increment in halted state (as the name implies). There are different power management systems in a chip which can cause a hardware thread to go into such a state. Moreover, if the core executes no cycles, many other events are not incremented either (like INSTR_RETIRED_ANY for retired instructions, which is part of IPC).
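
The effect of the option on a single value can be sketched in Go as follows. This is a minimal illustration under the assumption that the replacement happens after a metric formula has been evaluated; the function name invalidToZero is hypothetical and not taken from the collector.

```go
package main

import (
	"fmt"
	"math"
)

// invalidToZero mimics the effect of "invalid_to_zero": true for one value.
// Illustration only, not the collector's actual code.
func invalidToZero(v float64) float64 {
	if math.IsNaN(v) || math.IsInf(v, 0) {
		return 0.0
	}
	return v
}

func main() {
	// A halted hardware thread: neither instructions nor cycles were counted.
	instructions, cycles := 0.0, 0.0
	ipc := instructions / cycles // 0/0 evaluates to NaN
	fmt.Println(ipc, invalidToZero(ipc))

	// A non-zero dividend with a zero divisor evaluates to +Inf.
	fmt.Println(invalidToZero(1.0 / cycles))
}
```

Without the option, such a value would not be forwarded at all; with it, the router receives 0.0 for the affected hardware thread.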

lockfile_path option

LIKWID can be configured with a lock file with which the access to the performance monitoring registers can be disabled (only the owner of the lock file is allowed to access the registers). When the lockfile_path option is set, the collector subscribes to changes to this file to stop monitoring if the owner of the lock file changes. This feature is useful when users should be able to perform their own hardware performance counter measurements through LIKWID or any other tool.

send_*_total values option

  • send_core_total_values: Metrics, which are usually collected on a per hardware thread basis, are additionally summed up per CPU core.
  • send_socket_total_values: Metrics, which are usually collected on a per hardware thread basis, are additionally summed up per CPU socket.
  • send_node_total_values: Metrics, which are usually collected on a per hardware thread basis, are additionally summed up per node.
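
What "summed up" means can be sketched in Go. The example below handles the socket case and assumes a hand-written mapping from hardware thread ID to socket ID; the function name sumBySocket and the topology are hypothetical, the collector obtains the topology itself.

```go
package main

import "fmt"

// sumBySocket adds up per-hardware-thread values for each socket.
// hwthreadToSocket maps a hardware thread ID to its socket ID.
func sumBySocket(values map[int]float64, hwthreadToSocket map[int]int) map[int]float64 {
	totals := make(map[int]float64)
	for hwthread, v := range values {
		socket, ok := hwthreadToSocket[hwthread]
		if !ok {
			continue // unknown hardware thread, skip it
		}
		totals[socket] += v
	}
	return totals
}

func main() {
	// Hypothetical flops_any values of four hardware threads on two sockets.
	flops := map[int]float64{0: 100, 1: 150, 2: 200, 3: 50}
	topology := map[int]int{0: 0, 1: 0, 2: 1, 3: 1}
	fmt.Println(sumBySocket(flops, topology))
}
```

The core and node totals follow the same pattern with a different grouping key.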

Example configuration

AMD Zen3

  "likwid": {
    "force_overwrite" : false,
    "invalid_to_zero" : false,
    "eventsets": [
      {
        "events": {
          "FIXC1": "ACTUAL_CPU_CLOCK",
          "FIXC2": "MAX_CPU_CLOCK",
          "PMC0": "RETIRED_INSTRUCTIONS",
          "PMC1": "CPU_CLOCKS_UNHALTED",
          "PMC2": "RETIRED_SSE_AVX_FLOPS_ALL",
          "PMC3": "MERGE",
          "DFC0": "DRAM_CHANNEL_0",
          "DFC1": "DRAM_CHANNEL_1",
          "DFC2": "DRAM_CHANNEL_2",
          "DFC3": "DRAM_CHANNEL_3"
        },
        "metrics": [
          {
            "name": "ipc",
            "calc": "PMC0/PMC1",
            "type": "hwthread",
            "publish": true
          },
          {
            "name": "flops_any",
            "calc": "0.000001*PMC2/time",
            "unit": "MFlops/s",
            "type": "hwthread",
            "publish": true
          },
          {
            "name": "clock",
            "calc": "0.000001*(FIXC1/FIXC2)/inverseClock",
            "type": "hwthread",
            "unit": "MHz",
            "publish": true
          },
          {
            "name": "mem1",
            "calc": "0.000001*(DFC0+DFC1+DFC2+DFC3)*64.0/time",
            "unit": "Mbyte/s",
            "type": "socket",
            "publish": false
          }
        ]
      },
      {
        "events": {
          "DFC0": "DRAM_CHANNEL_4",
          "DFC1": "DRAM_CHANNEL_5",
          "DFC2": "DRAM_CHANNEL_6",
          "DFC3": "DRAM_CHANNEL_7",
          "PWR0": "RAPL_CORE_ENERGY",
          "PWR1": "RAPL_PKG_ENERGY"
        },
        "metrics": [
          {
            "name": "pwr_core",
            "calc": "PWR0/time",
            "unit": "Watt",
            "type": "socket",
            "publish": true
          },
          {
            "name": "pwr_pkg",
            "calc": "PWR1/time",
            "type": "socket",
            "unit": "Watt",
            "publish": true
          },
          {
            "name": "mem2",
            "calc": "0.000001*(DFC0+DFC1+DFC2+DFC3)*64.0/time",
            "unit": "Mbyte/s",
            "type": "socket",
            "publish": false
          }
        ]
      }
    ],
    "globalmetrics": [
      {
        "name": "mem_bw",
        "calc": "mem1+mem2",
        "type": "socket",
        "unit": "Mbyte/s",
        "publish": true
      }
    ]
  }

How to get the eventsets and metrics from LIKWID

The likwid collector reads hardware performance counters at a hwthread and socket level. The configuration looks quite complicated but it is basically copy&paste from LIKWID’s performance groups. Earlier iterations of the collector tried to use the performance groups directly, but this lacked flexibility. The current way of configuration provides the most flexibility.

The logic is as follows: There are multiple eventsets, each consisting of a list of counters+events and a list of metrics. If you compare a common performance group with the example setting above, there is not much difference:

EVENTSET                         ->   "events": {
FIXC1 ACTUAL_CPU_CLOCK           ->     "FIXC1": "ACTUAL_CPU_CLOCK",
FIXC2 MAX_CPU_CLOCK              ->     "FIXC2": "MAX_CPU_CLOCK",
PMC0  RETIRED_INSTRUCTIONS       ->     "PMC0" : "RETIRED_INSTRUCTIONS",
PMC1  CPU_CLOCKS_UNHALTED        ->     "PMC1" : "CPU_CLOCKS_UNHALTED",
PMC2  RETIRED_SSE_AVX_FLOPS_ALL  ->     "PMC2": "RETIRED_SSE_AVX_FLOPS_ALL",
PMC3  MERGE                      ->     "PMC3": "MERGE",
                                 ->   }

The metrics follow the same procedure:

METRICS                          ->   "metrics": [
IPC   PMC0/PMC1                  ->     {
                                 ->       "name" : "IPC",
                                 ->       "calc" : "PMC0/PMC1",
                                 ->       "type": "hwthread",
                                 ->       "publish": true
                                 ->     }
                                 ->   ]

The script scripts/likwid_perfgroup_to_cc_config.py might help you.

4.1.13 - loadavg collector

Toplevel loadavgMetric

loadavg collector

  "loadavg": {
    "exclude_metrics": [
      "proc_run"
    ]
  }

The loadavg collector reads data from /proc/loadavg and outputs a handful of node metrics. If a metric is not required, it can be excluded from being forwarded to the sink.

Metrics:

  • load_one
  • load_five
  • load_fifteen
  • proc_run
  • proc_total
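
How the five metrics relate to the content of /proc/loadavg can be sketched in Go. The function parseLoadavg is a hypothetical helper written for this illustration, not the collector's code; it assumes the usual Linux format with the three load averages followed by a running/total process field.

```go
package main

import (
	"fmt"
	"strconv"
	"strings"
)

// parseLoadavg turns one /proc/loadavg line into the five metrics listed
// above and leaves out every metric named in exclude.
func parseLoadavg(line string, exclude ...string) (map[string]float64, error) {
	fields := strings.Fields(line)
	if len(fields) < 4 {
		return nil, fmt.Errorf("unexpected loadavg format: %q", line)
	}
	// The fourth field has the form "<running>/<total>".
	procs := strings.SplitN(fields[3], "/", 2)
	if len(procs) != 2 {
		return nil, fmt.Errorf("unexpected process field: %q", fields[3])
	}
	raw := map[string]string{
		"load_one":     fields[0],
		"load_five":    fields[1],
		"load_fifteen": fields[2],
		"proc_run":     procs[0],
		"proc_total":   procs[1],
	}
	for _, name := range exclude {
		delete(raw, name)
	}
	metrics := make(map[string]float64, len(raw))
	for name, text := range raw {
		v, err := strconv.ParseFloat(text, 64)
		if err != nil {
			return nil, fmt.Errorf("metric %s: %w", name, err)
		}
		metrics[name] = v
	}
	return metrics, nil
}

func main() {
	// Sample line; on a real node the line would be read from /proc/loadavg.
	metrics, err := parseLoadavg("0.20 0.18 0.12 1/80 11206", "proc_run")
	if err != nil {
		fmt.Println("error:", err)
		return
	}
	fmt.Println(metrics)
}
```

The exclude argument plays the role of exclude_metrics in the configuration above.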

4.1.14 - lustrestat collector

Toplevel lustreMetric

lustrestat collector

  "lustrestat": {
    "lctl_command": "/path/to/lctl",
    "exclude_metrics": [
      "setattr",
      "getattr"
    ],
    "send_abs_values" : true,
    "send_derived_values" : true,
    "send_diff_values": true,
    "use_sudo": false
  }

The lustrestat collector uses the lctl application with the get_param option to get all llite metrics (Lustre client). The llite metrics are only available for root users. If password-less sudo is configured, you can enable sudo in the configuration.

Metrics:

  • lustre_read_bytes (unit bytes)
  • lustre_read_requests (unit requests)
  • lustre_write_bytes (unit bytes)
  • lustre_write_requests (unit requests)
  • lustre_open
  • lustre_close
  • lustre_getattr
  • lustre_setattr
  • lustre_statfs
  • lustre_inode_permission
  • lustre_read_bw (if send_derived_values == true, unit bytes/sec)
  • lustre_write_bw (if send_derived_values == true, unit bytes/sec)
  • lustre_read_requests_rate (if send_derived_values == true, unit requests/sec)
  • lustre_write_requests_rate (if send_derived_values == true, unit requests/sec)
  • lustre_read_bytes_diff (if send_diff_values == true, unit bytes)
  • lustre_read_requests_diff (if send_diff_values == true, unit requests)
  • lustre_write_bytes_diff (if send_diff_values == true, unit bytes)
  • lustre_write_requests_diff (if send_diff_values == true, unit requests)
  • lustre_open_diff (if send_diff_values == true)
  • lustre_close_diff (if send_diff_values == true)
  • lustre_getattr_diff (if send_diff_values == true)
  • lustre_setattr_diff (if send_diff_values == true)
  • lustre_statfs_diff (if send_diff_values == true)
  • lustre_inode_permission_diff (if send_diff_values == true)

This collector adds a device tag.

4.1.15 - memstat collector

Toplevel memstatMetric

memstat collector

  "memstat": {
    "exclude_metrics": [
      "mem_used"
    ]
  }

The memstat collector reads data from /proc/meminfo and outputs a handful of node metrics. If a metric is not required, it can be excluded from being forwarded to the sink.

Metrics:

  • mem_total
  • mem_sreclaimable
  • mem_slab
  • mem_free
  • mem_buffers
  • mem_cached
  • mem_available
  • mem_shared
  • swap_total
  • swap_free
  • mem_used = mem_total - (mem_free + mem_buffers + mem_cached)
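
The mem_used formula can be tried out with a small Go sketch. It assumes that mem_total, mem_free, mem_buffers and mem_cached correspond to the MemTotal, MemFree, Buffers and Cached lines of /proc/meminfo; the helper names parseMeminfo and memUsed are hypothetical.

```go
package main

import (
	"bufio"
	"fmt"
	"strconv"
	"strings"
)

// parseMeminfo reads lines of the form "Key:   value kB" into a map.
func parseMeminfo(text string) map[string]float64 {
	stats := make(map[string]float64)
	scanner := bufio.NewScanner(strings.NewReader(text))
	for scanner.Scan() {
		fields := strings.Fields(scanner.Text())
		if len(fields) < 2 {
			continue
		}
		v, err := strconv.ParseFloat(fields[1], 64)
		if err != nil {
			continue // not a numeric entry
		}
		stats[strings.TrimSuffix(fields[0], ":")] = v
	}
	return stats
}

// memUsed implements mem_total - (mem_free + mem_buffers + mem_cached).
func memUsed(stats map[string]float64) float64 {
	return stats["MemTotal"] - (stats["MemFree"] + stats["Buffers"] + stats["Cached"])
}

func main() {
	// Shortened sample; on a real node the text comes from /proc/meminfo.
	sample := `MemTotal:       16000000 kB
MemFree:         4000000 kB
Buffers:         1000000 kB
Cached:          3000000 kB`
	fmt.Println("mem_used:", memUsed(parseMeminfo(sample)), "kB")
}
```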

4.1.16 - netstat collector

Toplevel netstatMetric

netstat collector

  "netstat": {
    "include_devices": [
      "eth0"
    ],
    "send_abs_values" : true,
    "send_derived_values" : true
  }

The netstat collector reads data from /proc/net/dev and outputs a handful of node metrics. With the include_devices list you can specify which network devices should be measured. Note: Most other collectors use an exclude list instead of an include list.

Metrics:

  • net_bytes_in (unit=bytes)
  • net_bytes_out (unit=bytes)
  • net_pkts_in (unit=packets)
  • net_pkts_out (unit=packets)
  • net_bytes_in_bw (unit=bytes/sec if send_derived_values == true)
  • net_bytes_out_bw (unit=bytes/sec if send_derived_values == true)
  • net_pkts_in_bw (unit=packets/sec if send_derived_values == true)
  • net_pkts_out_bw (unit=packets/sec if send_derived_values == true)

The device name is added as tag stype=network,stype-id=<device>.
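
The derived values (send_derived_values) are rates computed from two consecutive readings of an absolute counter. A minimal Go sketch of that computation follows; the function name derivedRate is hypothetical, and returning 0 when the counter went backwards is an assumption of this sketch, not documented collector behavior.

```go
package main

import "fmt"

// derivedRate returns the change per second between two absolute counter
// readings. Returning 0 when the counter went backwards (for example after a
// device reset) is an assumption of this sketch.
func derivedRate(last, current uint64, seconds float64) float64 {
	if seconds <= 0 || current < last {
		return 0
	}
	return float64(current-last) / seconds
}

func main() {
	// Two hypothetical readings of the received-bytes counter of eth0,
	// taken 10 seconds apart.
	var last, current uint64 = 1_000_000, 1_500_000
	fmt.Println("net_bytes_in_bw:", derivedRate(last, current, 10), "bytes/sec")
}
```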

4.1.17 - nfs3stat collector

Toplevel nfs3Metric

nfs3stat collector

  "nfs3stat": {
    "nfsstat" : "/path/to/nfsstat",
    "exclude_metrics": [
      "nfs3_total"
    ]
  }

The nfs3stat collector reads data from the nfsstat command and outputs a handful of node metrics. If a metric is not required, it can be excluded from being forwarded to the sink. There is currently no possibility to get the metrics per mount point.

Metrics:

  • nfs3_total
  • nfs3_null
  • nfs3_getattr
  • nfs3_setattr
  • nfs3_lookup
  • nfs3_access
  • nfs3_readlink
  • nfs3_read
  • nfs3_write
  • nfs3_create
  • nfs3_mkdir
  • nfs3_symlink
  • nfs3_remove
  • nfs3_rmdir
  • nfs3_rename
  • nfs3_link
  • nfs3_readdir
  • nfs3_readdirplus
  • nfs3_fsstat
  • nfs3_fsinfo
  • nfs3_pathconf
  • nfs3_commit

4.1.18 - nfs4stat collector

Toplevel nfs4Metric

nfs4stat collector

  "nfs4stat": {
    "nfsstat" : "/path/to/nfsstat",
    "exclude_metrics": [
      "nfs4_total"
    ]
  }

The nfs4stat collector reads data from the nfsstat command and outputs a handful of node metrics. If a metric is not required, it can be excluded from being forwarded to the sink. There is currently no possibility to get the metrics per mount point.

Metrics:

  • nfs4_total
  • nfs4_null
  • nfs4_read
  • nfs4_write
  • nfs4_commit
  • nfs4_open
  • nfs4_open_conf
  • nfs4_open_noat
  • nfs4_open_dgrd
  • nfs4_close
  • nfs4_setattr
  • nfs4_fsinfo
  • nfs4_renew
  • nfs4_setclntid
  • nfs4_confirm
  • nfs4_lock
  • nfs4_lockt
  • nfs4_locku
  • nfs4_access
  • nfs4_getattr
  • nfs4_lookup
  • nfs4_lookup_root
  • nfs4_remove
  • nfs4_rename
  • nfs4_link
  • nfs4_symlink
  • nfs4_create
  • nfs4_pathconf
  • nfs4_statfs
  • nfs4_readlink
  • nfs4_readdir
  • nfs4_server_caps
  • nfs4_delegreturn
  • nfs4_getacl
  • nfs4_setacl
  • nfs4_rel_lkowner
  • nfs4_exchange_id
  • nfs4_create_session
  • nfs4_destroy_session
  • nfs4_sequence
  • nfs4_get_lease_time
  • nfs4_reclaim_comp
  • nfs4_secinfo_no
  • nfs4_bind_conn_to_ses

4.1.19 - nfsiostat collector

Toplevel nfsiostatMetric

nfsiostat collector

  "nfsiostat": {
    "exclude_metrics": [
      "nfsio_oread"
    ],
    "exclude_filesystems" : [
        "/mnt"
    ],
    "use_server_as_stype": false
  }

The nfsiostat collector reads data from /proc/self/mountstats and outputs a handful of node metrics for each NFS filesystem. If a metric or filesystem is not required, it can be excluded from being forwarded to the sink.

Metrics:

  • nfsio_nread: Bytes transferred by normal read() calls
  • nfsio_nwrite: Bytes transferred by normal write() calls
  • nfsio_oread: Bytes transferred by read() calls with O_DIRECT
  • nfsio_owrite: Bytes transferred by write() calls with O_DIRECT
  • nfsio_pageread: Pages transferred by read() calls
  • nfsio_pagewrite: Pages transferred by write() calls
  • nfsio_nfsread: Bytes transferred for reading from the server
  • nfsio_nfswrite: Bytes transferred for writing to the server

The nfsiostat collector adds the mountpoint to the tags as stype=filesystem,stype-id=<mountpoint>. If the server address should be used instead of the mountpoint, use the use_server_as_stype config setting.

4.1.20 - numastat collector

Toplevel numastatsMetric

numastat collector

  "numastats": {}

The numastat collector reads data from /sys/devices/system/node/node*/numastat and outputs a handful of memoryDomain metrics. See: https://www.kernel.org/doc/html/latest/admin-guide/numastat.html

Metrics:

  • numastats_numa_hit: A process wanted to allocate memory from this node, and succeeded.
  • numastats_numa_miss: A process wanted to allocate memory from another node, but ended up with memory from this node.
  • numastats_numa_foreign: A process wanted to allocate on this node, but ended up with memory from another node.
  • numastats_local_node: A process ran on this node’s CPU, and got memory from this node.
  • numastats_other_node: A process ran on a different node’s CPU, and got memory from this node.
  • numastats_interleave_hit: Interleaving wanted to allocate from this node and succeeded.

4.1.21 - nvidia collector

Toplevel nvidiaMetric

nvidia collector

  "nvidia": {
    "exclude_devices": [
      "0","1", "0000000:ff:01.0"
    ],
    "exclude_metrics": [
      "nv_fb_mem_used",
      "nv_fan"
    ],
    "process_mig_devices": false,
    "use_pci_info_as_type_id": true,
    "add_pci_info_tag": false,
    "add_uuid_meta": false,
    "add_board_number_meta": false,
    "add_serial_meta": false,
    "use_uuid_for_mig_device": false,
    "use_slice_for_mig_device": false
  }

The nvidia collector can be configured to leave out specific devices with the exclude_devices option. It takes IDs as supplied to the NVML with nvmlDeviceGetHandleByIndex() or the PCI address in NVML format (%08X:%02X:%02X.0). Metrics (listed below) that should not be sent to the MetricRouter can be excluded with the exclude_metrics option. Commonly only the physical GPUs are monitored. If MIG devices should be analyzed as well, set process_mig_devices (adds stype=mig,stype-id=<mig_index>). With the options use_uuid_for_mig_device and use_slice_for_mig_device, the <mig_index> can be replaced with the UUID (e.g. MIG-6a9f7cc8-6d5b-5ce0-92de-750edc4d8849) or the MIG slice name (e.g. 1g.5gb).

The metrics sent by the nvidia collector use accelerator as type tag. For the type-id, it uses the device handle index by default. With the use_pci_info_as_type_id option, the PCI ID is used instead. If both values should be added as tags, activate the add_pci_info_tag option. It uses the device handle index as type-id and adds the PCI ID as separate pci_identifier tag.

Optionally, it is possible to add the UUID, the board part number and the serial to the meta information. The meta information is not sent to the sinks (unless configured otherwise).

Metrics:

  • nv_util
  • nv_mem_util
  • nv_fb_mem_total
  • nv_fb_mem_used
  • nv_bar1_mem_total
  • nv_bar1_mem_used
  • nv_temp
  • nv_fan
  • nv_ecc_mode
  • nv_perf_state
  • nv_power_usage
  • nv_graphics_clock
  • nv_sm_clock
  • nv_mem_clock
  • nv_video_clock
  • nv_max_graphics_clock
  • nv_max_sm_clock
  • nv_max_mem_clock
  • nv_max_video_clock
  • nv_ecc_uncorrected_error
  • nv_ecc_corrected_error
  • nv_power_max_limit
  • nv_encoder_util
  • nv_decoder_util
  • nv_remapped_rows_corrected
  • nv_remapped_rows_uncorrected
  • nv_remapped_rows_pending
  • nv_remapped_rows_failure
  • nv_compute_processes
  • nv_graphics_processes
  • nv_violation_power
  • nv_violation_thermal
  • nv_violation_sync_boost
  • nv_violation_board_limit
  • nv_violation_low_util
  • nv_violation_reliability
  • nv_violation_below_app_clock
  • nv_violation_below_base_clock
  • nv_nvlink_crc_flit_errors
  • nv_nvlink_crc_errors
  • nv_nvlink_ecc_errors
  • nv_nvlink_replay_errors
  • nv_nvlink_recovery_errors

Some metrics add an additional sub type tag (stype). For example, the nv_nvlink_* metrics set stype=nvlink,stype-id=<link_number>.

4.1.22 - rapl collector

Toplevel raplMetric

rapl collector

This collector reads running average power limit (RAPL) monitoring attributes to compute average power consumption metrics. See https://www.kernel.org/doc/html/latest/power/powercap/powercap.html#monitoring-attributes.

The Likwid metric collector provides similar functionality.

  "rapl": {
    "exclude_device_by_id": ["0:1", "0:2"],
    "exclude_device_by_name": ["psys"]
  }

Metrics:

  • rapl_average_power: average power consumption in Watt. The average is computed over the interval from the last measurement to the current measurement.
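
The average power follows from two readings of an energy counter. The Go sketch below assumes the powercap attributes energy_uj (energy in microjoules) and max_energy_range_uj (the value at which the counter wraps around) from the kernel documentation linked above; the function name averagePower and the wraparound handling are assumptions of this sketch, not a description of the collector's code.

```go
package main

import "fmt"

// averagePower returns the average power in Watt between two energy_uj
// readings taken "seconds" apart. maxRangeUJ is max_energy_range_uj and is
// only used when the counter wrapped around between the readings.
func averagePower(lastUJ, currentUJ, maxRangeUJ uint64, seconds float64) float64 {
	if seconds <= 0 {
		return 0
	}
	var diffUJ uint64
	if currentUJ >= lastUJ {
		diffUJ = currentUJ - lastUJ
	} else {
		// Counter wrapped: energy up to the maximum plus energy since zero.
		diffUJ = (maxRangeUJ - lastUJ) + currentUJ
	}
	// Microjoules to Joules, then Joules per second equals Watt.
	return float64(diffUJ) / 1e6 / seconds
}

func main() {
	// Hypothetical readings taken 2 seconds apart.
	fmt.Println(averagePower(1_000_000, 6_000_000, 10_000_000, 2), "Watt")
}
```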

4.1.23 - rocm_smi collector

Toplevel rocmsmiMetric

rocm_smi collector

  "rocm_smi": {
    "exclude_devices": [
      "0","1", "0000000:ff:01.0"
    ],
    "exclude_metrics": [
      "rocm_mm_util",
      "rocm_temp_vrsoc"
    ],
    "use_pci_info_as_type_id": true,
    "add_pci_info_tag": false,
    "add_serial_meta": false
  }

The rocm_smi collector can be configured to leave out specific devices with the exclude_devices option. It takes logical IDs in the list of available devices or the PCI address similar to NVML format (%08X:%02X:%02X.0). Metrics (listed below) that should not be sent to the MetricRouter can be excluded with the exclude_metrics option.

The metrics sent by the rocm_smi collector use accelerator as type tag. For the type-id, it uses the device handle index by default. With the use_pci_info_as_type_id option, the PCI ID is used instead. If both values should be added as tags, activate the add_pci_info_tag option. It uses the device handle index as type-id and adds the PCI ID as separate pci_identifier tag.

Optionally, it is possible to add the serial to the meta information. The meta information is not sent to the sinks (unless configured otherwise).

Metrics:

  • rocm_gfx_util
  • rocm_umc_util
  • rocm_mm_util
  • rocm_avg_power
  • rocm_temp_mem
  • rocm_temp_hotspot
  • rocm_temp_edge
  • rocm_temp_vrgfx
  • rocm_temp_vrsoc
  • rocm_temp_vrmem
  • rocm_gfx_clock
  • rocm_soc_clock
  • rocm_u_clock
  • rocm_v0_clock
  • rocm_v1_clock
  • rocm_d0_clock
  • rocm_d1_clock
  • rocm_temp_hbm

Some metrics add an additional sub type tag (stype). For example, the rocm_temp_hbm metrics set stype=device,stype-id=<HBM_slice_number>.

4.1.24 - schedstat collector

Toplevel schedstatMetric

schedstat collector

  "schedstat": {
  }

The schedstat collector reads data from /proc/schedstat and calculates a load value, separated by hwthread. This might be useful to detect bad CPU pinning on shared nodes etc.

Metric:

  • cpu_load_core

4.1.25 - self collector

Toplevel selfMetric

self collector

  "self": {
    "read_mem_stats" : true,
    "read_goroutines" : true,
    "read_cgo_calls" : true,
    "read_rusage" : true
  }

The self collector reads the data from the runtime and syscall packages and thus monitors the execution of cc-metric-collector itself.

Metrics:

  • If read_mem_stats == true:
    • total_alloc: The metric reports cumulative bytes allocated for heap objects.
    • heap_alloc: The metric reports bytes of allocated heap objects.
    • heap_sys: The metric reports bytes of heap memory obtained from the OS.
    • heap_idle: The metric reports bytes in idle (unused) spans.
    • heap_inuse: The metric reports bytes in in-use spans.
    • heap_released: The metric reports bytes of physical memory returned to the OS.
    • heap_objects: The metric reports the number of allocated heap objects.
  • If read_goroutines == true:
    • num_goroutines: The metric reports the number of goroutines that currently exist.
  • If read_cgo_calls == true:
    • num_cgo_calls: The metric reports the number of cgo calls made by the current process.
  • If read_rusage == true:
    • rusage_user_time: The metric reports the amount of time that this process has been scheduled in user mode.
    • rusage_system_time: The metric reports the amount of time that this process has been scheduled in kernel mode.
    • rusage_vol_ctx_switch: The metric reports the number of voluntary context switches.
    • rusage_invol_ctx_switch: The metric reports the number of involuntary context switches.
    • rusage_signals: The metric reports the number of signals received.
    • rusage_major_pgfaults: The metric reports the number of major faults the process has made which have required loading a memory page from disk.
    • rusage_minor_pgfaults: The metric reports the number of minor faults the process has made which have not required loading a memory page from disk.

4.1.26 - tempstat collector

Toplevel tempMetric

tempstat collector

  "tempstat": {
    "tag_override" : {
        "<device like hwmon1>" : {
            "type" : "socket",
            "type-id" : "0"
        }
    },
    "exclude_metrics": [
      "metric1",
      "metric2"
    ]
  }

The tempstat collector reads the data from /sys/class/hwmon/<device>/tempX_{input,label}.

Metrics:

  • temp_*: The metric name is taken from the label files.

4.1.27 - topprocs collector

Toplevel topprocsMetric

topprocs collector

  "topprocs": {
    "num_procs": 5
  }

The topprocs collector reads the TopX processes (sorted by CPU utilization, ps -Ao comm --sort=-pcpu).

In contrast to most other collectors, the metric value is a string.

4.2 - cc-metric-collector's message processor

Documentation of cc-metric-collector’s message processor

Message Processor Component

Multiple parts of the ClusterCockpit ecosystem require the processing of CCMessages. The main CC application using it is cc-metric-collector. The processing part there was originally in the metric router, the central hub connecting collectors (reading local data), receivers (receiving remote data) and sinks (sending data). Already in early stages, the lack of flexibility caused some trouble:

The sysadmins wanted to keep operating their Ganglia based monitoring infrastructure while we developed the CC stack. Ganglia expects the core metrics with a specific name and resolution (the right unit prefix). The CC stack performed no conversion of the data, and the CC frontend developers wanted a different resolution for some metrics. The issue was basically the mem_used metric showing the currently used memory of the node. Ganglia wants it in kByte as provided by the Linux operating system but CC wanted it in GByte.

With the message processor, the Ganglia sinks can apply the unit prefix changes individually and name the metrics as required by Ganglia.

For developers

Whenever you receive or are about to send a message out, you should provide some processing.

Configuration of component

New operations can be added to the message processor at runtime. Of course, they can also be removed again. For the initial setup, the processing can be described in a configuration file or in some fields of a configuration file.

The message processor uses the following configuration:

{
	"drop_messages": [
		"name_of_message_to_drop"
	],
	"drop_messages_if": [
		"condition_when_to_drop_message",
		"name == 'drop_this'",
		"tag.hostname == 'this_host'",
		"meta.unit != 'MB'"
	],
	"rename_messages" : {
		"old_message_name" : "new_message_name"
	},
	"rename_messages_if": {
		"condition_when_to_rename_message" : "new_name"
	},
	"add_tags_if": [
		{
			"if" : "condition_when_to_add_tag",
			"key": "name_for_new_tag",
			"value": "new_tag_value"
		}
	],
	"delete_tags_if": [
		{
			"if" : "condition_when_to_delete_tag",
			"key": "name_of_tag"
		}
	],
	"add_meta_if": [
		{
			"if" : "condition_when_to_add_meta_info",
			"key": "name_for_new_meta_info",
			"value": "new_meta_info_value"
		}
	],
	"delete_meta_if": [
		{
			"if" : "condition_when_to_delete_meta_info",
			"key": "name_of_meta_info"
		}
	],
	"add_field_if": [
		{
			"if" : "condition_when_to_add_field",
			"key": "name_for_new_field",
			"value": "new_field_value_but_only_string_at_the_moment"
		}
	],
	"delete_field_if": [
		{
			"if" : "condition_when_to_delete_field",
			"key": "name_of_field"
		}
	],
	"move_tag_to_meta_if": [
		{
			"if" : "condition_when_to_move_tag_to_meta_info_including_its_value",
			"key": "name_of_tag",
			"value": "name_of_meta_info"
		}
	],
	"move_tag_to_field_if": [
		{
			"if" : "condition_when_to_move_tag_to_fields_including_its_value",
			"key": "name_of_tag",
			"value": "name_of_field"
		}
	],
	"move_meta_to_tag_if": [
		{
			"if" : "condition_when_to_move_meta_info_to_tags_including_its_value",
			"key": "name_of_meta_info",
			"value": "name_of_tag"
		}
	],
	"move_meta_to_field_if": [
		{
			"if" : "condition_when_to_move_meta_info_to_fields_including_its_value",
			"key": "name_of_meta_info",
			"value": "name_of_field"
		}
	],
	"move_field_to_tag_if": [
		{
			"if" : "condition_when_to_move_field_to_tags_including_its_stringified_value",
			"key": "name_of_field",
			"value": "name_of_tag"
		}
	],
	"move_field_to_meta_if": [
		{
			"if" : "condition_when_to_move_field_to_meta_info_including_its_stringified_value",
			"key": "name_of_field",
			"value": "name_of_meta_info"
		}
	],
	"drop_by_message_type": [
		"metric",
		"event",
		"log",
		"control"
	],
	"change_unit_prefix": {
		"name == 'metric_with_wrong_unit_prefix'" : "G",
		"only_if_messagetype == 'metric'": "T"
	},
	"normalize_units": true,
	"add_base_env": {
		"MY_CONSTANT_FOR_CUSTOM_CONDITIONS": 1.0,
		"output_value_for_test_metrics": 42.0
	},
	"stage_order": [
		"rename_messages_if",
		"drop_messages"
	]
}

The options change_unit_prefix and normalize_units are only applied to CCMetrics. It is not possible to delete the field related to each message type as defined in cc-specification. In short:

  • CCMetrics always have to have a field named value
  • CCEvents always have to have a field named event
  • CCLogs always have to have a field named log
  • CCControl messages always have to have a field named control

With add_base_env, one can specify mykey=myvalue pairs that can be used in conditions like tag.type == mykey.

The order in which each message is processed can be specified with the stage_order option. The stage names are the keys in the JSON configuration, thus change_unit_prefix, move_field_to_meta_if, etc. Stages can be listed multiple times.
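
Why the order matters can be shown with a small Go sketch. It decodes only three of the configuration keys above and applies the stages to a message name; the type config and the functions parseConfig and apply are hypothetical and only model the idea, they are not the message processor's implementation.

```go
package main

import (
	"encoding/json"
	"fmt"
)

// config holds only three of the keys from the configuration above.
type config struct {
	DropMessages   []string          `json:"drop_messages"`
	RenameMessages map[string]string `json:"rename_messages"`
	StageOrder     []string          `json:"stage_order"`
}

// parseConfig decodes the JSON text into the reduced config type.
func parseConfig(raw string) (config, error) {
	var cfg config
	err := json.Unmarshal([]byte(raw), &cfg)
	return cfg, err
}

// apply runs the configured stages in order on a message name and reports
// the resulting name and whether the message gets dropped.
func apply(cfg config, name string) (string, bool) {
	for _, stage := range cfg.StageOrder {
		switch stage {
		case "rename_messages":
			if to, ok := cfg.RenameMessages[name]; ok {
				name = to
			}
		case "drop_messages":
			for _, drop := range cfg.DropMessages {
				if drop == name {
					return name, true
				}
			}
		}
	}
	return name, false
}

func main() {
	cfg, err := parseConfig(`{
		"drop_messages": ["mem_used"],
		"rename_messages": {"mem_used": "memory_used"},
		"stage_order": ["rename_messages", "drop_messages"]
	}`)
	if err != nil {
		fmt.Println("error:", err)
		return
	}
	name, dropped := apply(cfg, "mem_used")
	fmt.Println(name, dropped) // memory_used false

	// With the stages swapped, the message is dropped before the rename.
	cfg.StageOrder = []string{"drop_messages", "rename_messages"}
	name, dropped = apply(cfg, "mem_used")
	fmt.Println(name, dropped) // mem_used true
}
```

With the rename stage first, the drop rule no longer matches the new name; with the drop stage first, the message never reaches the rename stage.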

Using the component

In order to load the configuration from a json.RawMessage:

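mp, err := NewMessageProcessor()
if err != nil {
	log.Error("failed to create new message processor")
}
// FromConfigJSON returns an error if the configuration cannot be read
if err := mp.FromConfigJSON(configJson); err != nil {
	log.Error("failed to load message processor configuration")
}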

After initialization and adding the different operations, the ProcessMessage() function applies all operations and returns the processed message, or nil if the message should be dropped.

m := lp.CCMetric{}

x, err := mp.ProcessMessage(m)
if err != nil {
	// handle error
}
if x != nil {
	// process x further
} else {
	// this message got dropped
}

Single operations can be added and removed at runtime:

type MessageProcessor interface {
	// Functions to set the execution order of the processing stages
	SetStages([]string) error
	DefaultStages() []string
	// Function to add variables to the base evaluation environment
	AddBaseEnv(env map[string]interface{}) error
	// Functions to add and remove rules
	AddDropMessagesByName(name string) error
	RemoveDropMessagesByName(name string)
	AddDropMessagesByCondition(condition string) error
	RemoveDropMessagesByCondition(condition string)
	AddRenameMetricByCondition(condition string, name string) error
	RemoveRenameMetricByCondition(condition string)
	AddRenameMetricByName(from, to string) error
	RemoveRenameMetricByName(from string)
	SetNormalizeUnits(settings bool)
	AddChangeUnitPrefix(condition string, prefix string) error
	RemoveChangeUnitPrefix(condition string)
	AddAddTagsByCondition(condition, key, value string) error
	RemoveAddTagsByCondition(condition string)
	AddDeleteTagsByCondition(condition, key, value string) error
	RemoveDeleteTagsByCondition(condition string)
	AddAddMetaByCondition(condition, key, value string) error
	RemoveAddMetaByCondition(condition string)
	AddDeleteMetaByCondition(condition, key, value string) error
	RemoveDeleteMetaByCondition(condition string)
	AddMoveTagToMeta(condition, key, value string) error
	RemoveMoveTagToMeta(condition string)
	AddMoveTagToFields(condition, key, value string) error
	RemoveMoveTagToFields(condition string)
	AddMoveMetaToTags(condition, key, value string) error
	RemoveMoveMetaToTags(condition string)
	AddMoveMetaToFields(condition, key, value string) error
	RemoveMoveMetaToFields(condition string)
	AddMoveFieldToTags(condition, key, value string) error
	RemoveMoveFieldToTags(condition string)
	AddMoveFieldToMeta(condition, key, value string) error
	RemoveMoveFieldToMeta(condition string)
	// Read in a JSON configuration
	FromConfigJSON(config json.RawMessage) error
	// Processing functions for legacy CCMetric and current CCMessage
	ProcessMetric(m lp.CCMetric) (lp2.CCMessage, error)
	ProcessMessage(m lp2.CCMessage) (lp2.CCMessage, error)
}

Syntax for evaluatable terms

The message processor uses gval for evaluating the terms. It provides a basic set of operators like string comparison and arithmetic operations.

The following values are accessible in terms:

  • name of the message
  • timestamp or time of the message
  • type, type-id of the message (also tag_type, tag_type-id and tag_typeid)
  • stype, stype-id of the message (if the message has these tags, also tag_stype, tag_stype-id and tag_stypeid)
  • value for a CCMetric message (also field_value)
  • event for a CCEvent message (also field_event)
  • control for a CCControl message (also field_control)
  • log for a CCLog message (also field_log)
  • messagetype or msgtype. Possible values event, metric, log and control.

Generally, all tags are accessible with tag_<tagkey>, tags_<tagkey> or tags.<tagkey>. Similarly for all fields with field[s]?[_.]<fieldkey>. For meta information meta[_.]<metakey> (there is no metas[_.]<metakey>).

The syntax of expr is accepted with some additions:

  • Comparing strings: ==, !=, str matches regex (use % instead of \)
  • Combining conditions: &&, ||
  • Comparing numbers: ==, !=, <, >, <=, >=
  • Test lists: <value> in <list>
  • Topological tests: tag_type-id in getCpuListOfType("socket", "1") (test if the metric belongs to socket 1 in local node topology)

Often the operations are written in JSON files for loading them at startup. In JSON, some characters are not allowed. Therefore, the term syntax reflects that:

  • use '' instead of "" for strings
  • for the regexes, use % instead of \

For operations that should be applied on all messages, use the condition true.
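
The `%` convention for regular expressions can be illustrated with a small, standard-library-only sketch. The helpers `toRegexPattern` and `nameMatches` are hypothetical and not part of the message processor; anchoring the pattern to the whole name is also an assumption made for this example.

```go
package main

import (
	"fmt"
	"regexp"
	"strings"
)

// toRegexPattern converts the JSON-friendly notation, where % replaces the
// backslash, into a regular expression pattern (hypothetical helper).
func toRegexPattern(jsonFriendly string) string {
	return strings.ReplaceAll(jsonFriendly, "%", `\`)
}

// nameMatches reports whether the whole name matches the JSON-friendly pattern.
func nameMatches(jsonFriendly, name string) (bool, error) {
	re, err := regexp.Compile("^(?:" + toRegexPattern(jsonFriendly) + ")$")
	if err != nil {
		return false, err
	}
	return re.MatchString(name), nil
}

func main() {
	for _, name := range []string{"temp_core_12", "temp_package_id_0"} {
		ok, err := nameMatches("temp_core_%d+", name)
		if err != nil {
			fmt.Println("invalid pattern:", err)
			return
		}
		fmt.Printf("%s matches: %v\n", name, ok)
	}
}
```

Writing `%d+` in the JSON file avoids having to escape the backslash of `\d+` inside a JSON string.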

Overhead

The operations taking conditions are pre-processed, which is commonly the time-consuming part. Nevertheless, each added operation increases the time required to process a message. Moreover, the processing creates a copy of the message.

4.3 - cc-metric-collector's receivers

Documentation of cc-metric-collector’s receivers

CCMetric receivers

This folder contains the ReceiveManager and receiver implementations for the cc-metric-collector.

Configuration

The configuration file for the receivers is a list of configurations. The type field in each specifies which receiver to initialize.

{
  "myreceivername" : {
    "type": "receiver-type",
    <receiver-specific configuration>
  }
}

Available receivers

  • nats: Receive metrics from the NATS network
  • prometheus: Scrape data from a Prometheus client
  • http: Listen for HTTP POST requests transporting metrics in InfluxDB line protocol
  • ipmi: Read IPMI sensor readings
  • redfish: Use the Redfish (specification) to query thermal and power metrics

Contributing own receivers

A receiver contains a few functions and is derived from the type Receiver (in metricReceiver.go).

For an example, check the sample receiver

4.3.1 - http receiver

Toplevel httpReceiver

http receiver

The http receiver can be used to receive metrics through HTTP POST requests.

Configuration structure

{
  "<name>": {
    "type": "http",
    "address" : "",
    "port" : "8080",
    "path" : "/write",
    "idle_timeout": "120s",
    "username": "myUser",
    "password": "myPW"
  }
}
  • type: makes the receiver a http receiver
  • address: Listen address
  • port: Listen port
  • path: URL path for the write endpoint
  • idle_timeout: Maximum amount of time to wait for the next request when keep-alives are enabled. It should be larger than the measurement interval to keep the connection open
  • keep_alives_enabled: Controls whether HTTP keep-alives are enabled. By default, keep-alives are enabled.
  • username: username for basic authentication
  • password: password for basic authentication

The HTTP endpoint listens to http://<address>:<port>/<path>

Debugging

  • Install curl

  • Use curl to send a message to the http receiver

    curl http://localhost:8080/write \
    --user "myUser:myPW" \
    --data \
    "myMetric,hostname=myHost,type=hwthread,type-id=0,unit=Hz value=400000i 1694777161164284635
    myMetric,hostname=myHost,type=hwthread,type-id=1,unit=Hz value=400001i 1694777161164284635"
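
The same request as the curl call above can be sketched in Go with the standard library. To keep the example runnable without a running collector, `runDemo` starts a local stand-in server via `httptest`; this server, its basic-auth check and its status code are assumptions for illustration and do not reproduce the real http receiver.

```go
package main

import (
	"fmt"
	"io"
	"net/http"
	"net/http/httptest"
	"strings"
)

// sendMetrics posts line-protocol text to url using basic authentication
// and returns the HTTP status code.
func sendMetrics(url, user, password, payload string) (int, error) {
	req, err := http.NewRequest(http.MethodPost, url, strings.NewReader(payload))
	if err != nil {
		return 0, err
	}
	req.SetBasicAuth(user, password)
	resp, err := http.DefaultClient.Do(req)
	if err != nil {
		return 0, err
	}
	defer resp.Body.Close()
	_, _ = io.Copy(io.Discard, resp.Body)
	return resp.StatusCode, nil
}

// runDemo sends two metrics to a stand-in server (not the real receiver)
// and returns the status code and the body the server received.
func runDemo() (int, string, error) {
	got := make(chan string, 1)
	srv := httptest.NewServer(http.HandlerFunc(func(w http.ResponseWriter, r *http.Request) {
		user, pw, ok := r.BasicAuth()
		if !ok || user != "myUser" || pw != "myPW" {
			http.Error(w, "unauthorized", http.StatusUnauthorized)
			return
		}
		body, _ := io.ReadAll(r.Body)
		got <- string(body)
		w.WriteHeader(http.StatusOK)
	}))
	defer srv.Close()

	payload := "myMetric,hostname=myHost,type=hwthread,type-id=0,unit=Hz value=400000i 1694777161164284635\n" +
		"myMetric,hostname=myHost,type=hwthread,type-id=1,unit=Hz value=400001i 1694777161164284635"
	status, err := sendMetrics(srv.URL+"/write", "myUser", "myPW", payload)
	if err != nil {
		return 0, "", err
	}
	received := ""
	select {
	case received = <-got:
	default: // the server rejected the request
	}
	return status, received, nil
}

func main() {
	status, received, err := runDemo()
	if err != nil {
		fmt.Println("request failed:", err)
		return
	}
	fmt.Println("status:", status)
	fmt.Println(received)
}
```

To test against a real receiver, replace the stand-in URL with the address, port and path from the receiver configuration.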
    

4.3.2 - IPMI Receiver

Toplevel ipmiReceiver

IPMI Receiver

The IPMI Receiver uses ipmi-sensors from the FreeIPMI project to read IPMI sensor readings and sensor data repository (SDR) information. The available metrics depend on the sensors provided by the hardware vendor but typically contain temperature, fan speed, voltage and power metrics.

Configuration structure

{
    "<IPMI receiver name>": {
        "type": "ipmi",
        "interval": "30s",
        "fanout": 256,
        "username": "<Username>",
        "password": "<Password>",
        "endpoint": "ipmi-sensors://%h-bmc",
        "exclude_metrics": [ "fan_speed", "voltage" ],
        "client_config": [
            {
                "host_list": "n[1,2-4]"
            },
            {
                "host_list": "n[5-6]",
                "driver_type": "LAN",
                "cli_options": [ "--workaround-flags=..." ],
                "password": "<Password 2>"
            }
        ]
    }
}

Global settings:

  • interval: How often the IPMI sensor metrics should be read and sent to the sink (default: 30 s)

Global and per IPMI device settings (per IPMI device settings overwrite the global settings):

  • exclude_metrics: list of excluded metrics e.g. fan_speed, power, temperature, utilization, voltage
  • fanout: Maximum number of simultaneous IPMI connections (default: 64)
  • driver_type: Out of band IPMI driver (default: LAN_2_0)
  • username: User name to authenticate with
  • password: Password to use for authentication
  • endpoint: URL of the IPMI device (placeholder %h gets replaced by the hostname)

Per IPMI device settings:

  • host_list: List of hosts with the same client configuration
  • cli_options: Additional command line options for ipmi-sensors
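
How a host_list such as n[1,2-4] and the %h placeholder in endpoint fit together can be sketched as follows. This is a simplified illustration, not the receiver's implementation: `expandHostList` and `endpointFor` are hypothetical helpers, and the sketch assumes a single bracket group with numeric indices and ranges.

```go
package main

import (
	"fmt"
	"strconv"
	"strings"
)

// expandHostList expands a simple host list like "n[1,2-4]" into host names.
// Simplified sketch: one bracket group, numeric indices and ranges only.
func expandHostList(list string) ([]string, error) {
	open := strings.Index(list, "[")
	end := strings.Index(list, "]")
	if open < 0 || end < open {
		return []string{list}, nil // plain host name without a range
	}
	prefix, suffix := list[:open], list[end+1:]
	var hosts []string
	for _, part := range strings.Split(list[open+1:end], ",") {
		lo, hi, isRange := strings.Cut(part, "-")
		if !isRange {
			hi = lo
		}
		first, err := strconv.Atoi(lo)
		if err != nil {
			return nil, fmt.Errorf("invalid host index %q: %w", lo, err)
		}
		last, err := strconv.Atoi(hi)
		if err != nil {
			return nil, fmt.Errorf("invalid host index %q: %w", hi, err)
		}
		for i := first; i <= last; i++ {
			hosts = append(hosts, prefix+strconv.Itoa(i)+suffix)
		}
	}
	return hosts, nil
}

// endpointFor replaces the %h placeholder with the host name.
func endpointFor(template, host string) string {
	return strings.ReplaceAll(template, "%h", host)
}

func main() {
	hosts, err := expandHostList("n[1,2-4]")
	if err != nil {
		fmt.Println("invalid host list:", err)
		return
	}
	for _, h := range hosts {
		fmt.Println(h, "->", endpointFor("ipmi-sensors://%h-bmc", h))
	}
}
```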

4.3.3 - nats receiver

Toplevel natsReceiver

nats receiver

The nats receiver can be used to receive metrics from the NATS network. It connects to the NATS server at address and port, subscribes to the configured subject and expects metrics in the InfluxDB line protocol.

Configuration structure

{
  "<name>": {
    "type": "nats",
    "address" : "nats-server.example.org",
    "port" : "4222",
    "subject" : "subject",
    "user": "natsuser",
    "password": "natssecret",
    "nkey_file": "/path/to/nkey_file"
  }
}
  • type: makes the receiver a nats receiver
  • address: Address of the NATS control server
  • port: Port of the NATS control server
  • subject: Subscribes to this subject and receives metrics
  • user: Connect to nats using this user
  • password: Connect to nats using this password
  • nkey_file: Path to credentials file with NKEY

Debugging

  • Install NATS server and command line client

  • Start NATS server

    nats-server --net nats-server.example.org --port 4222
    
  • Check NATS server works as expected

    nats --server=nats-server.example.org:4222 server check
    
  • Use NATS command line client to subscribe to all messages

    nats --server=nats-server.example.org:4222 sub ">"
    
  • Use NATS command line client to send message to NATS receiver

    nats --server=nats-server.example.org:4222 pub subject \
    "myMetric,hostname=myHost,type=hwthread,type-id=0,unit=Hz value=400000i 1694777161164284635
    myMetric,hostname=myHost,type=hwthread,type-id=1,unit=Hz value=400001i 1694777161164284635"
    

4.3.4 - prometheus receiver

Toplevel prometheusReceiver

prometheus receiver

The prometheus receiver can be used to scrape the metrics of a single Prometheus client. It does not use any official Golang library; instead, it makes simple HTTP GET requests and parses the response.

Configuration structure

{
  "<name>": {
    "type": "prometheus",
    "address" : "testpromhost",
    "port" : "12345",
    "path" : "/prometheus",
    "interval": "5s",
    "ssl" : true
  }
}
  • type: makes the receiver a prometheus receiver
  • address: Hostname or IP of the Prometheus agent
  • port: Port of Prometheus agent
  • path: Path to the Prometheus endpoint
  • interval: Scrape the Prometheus endpoint in this interval (default ‘5s’)
  • ssl: Use SSL or not

The receiver requests data from http(s)://<address>:<port>/<path>.
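
A minimal sketch of the two steps described above, building the scrape URL from the configuration and parsing a text response, could look like this. The type `promConfig` and both functions are hypothetical; the parser only handles simple `name value` lines (optionally with labels that contain no spaces) and is not the receiver's actual parser.

```go
package main

import (
	"bufio"
	"fmt"
	"strconv"
	"strings"
)

// promConfig mirrors the configuration keys used above (hypothetical type).
type promConfig struct {
	Address string
	Port    string
	Path    string
	SSL     bool
}

// scrapeURL builds http(s)://<address>:<port>/<path> from the configuration.
func scrapeURL(c promConfig) string {
	scheme := "http"
	if c.SSL {
		scheme = "https"
	}
	return fmt.Sprintf("%s://%s:%s/%s", scheme, c.Address, c.Port, strings.TrimPrefix(c.Path, "/"))
}

// parseSamples extracts metric names and values from a Prometheus text
// response. Comment lines are skipped and labels are discarded.
func parseSamples(body string) map[string]float64 {
	samples := make(map[string]float64)
	scanner := bufio.NewScanner(strings.NewReader(body))
	for scanner.Scan() {
		line := strings.TrimSpace(scanner.Text())
		if line == "" || strings.HasPrefix(line, "#") {
			continue
		}
		fields := strings.Fields(line)
		if len(fields) < 2 {
			continue
		}
		name := fields[0]
		if i := strings.Index(name, "{"); i >= 0 {
			name = name[:i]
		}
		value, err := strconv.ParseFloat(fields[1], 64)
		if err != nil {
			continue
		}
		samples[name] = value
	}
	return samples
}

func main() {
	cfg := promConfig{Address: "testpromhost", Port: "12345", Path: "/prometheus", SSL: true}
	fmt.Println(scrapeURL(cfg))
	fmt.Println(parseSamples("# HELP node_load1 1m load average\nnode_load1 0.5\n"))
}
```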

4.3.5 - Redfish receiver

Toplevel redfishReceiver

Redfish receiver

The Redfish receiver uses the Redfish (specification) to query thermal and power metrics. Thermal metrics may include various fan speeds and temperatures. Power metrics may include the current power consumption of various hardware components. It may also include the minimum, maximum and average power consumption of these components in a given time interval. The receiver will poll each configured redfish device once in a given interval. Multiple devices can be accessed in parallel to increase throughput.

Configuration structure

{
    "<redfish receiver name>": {
        "type": "redfish",
        "username": "<Username>",
        "password": "<Password>",
        "endpoint": "https://%h-bmc",
        "exclude_metrics": [ "min_consumed_watts" ],
        "client_config": [
            {
                "host_list": "n[1,2-4]"
            },
            {
                "host_list": "n5",
                "disable_power_metrics": true,
                "disable_processor_metrics": true,
                "disable_thermal_metrics": true
            },
            {
                "host_list": "n6",
                "username": "<Username 2>",
                "password": "<Password 2>",
                "endpoint": "https://%h-BMC",
                "disable_sensor_metrics": true
            }
        ]
    }
}

Global settings:

  • fanout: Maximum number of simultaneous redfish connections (default: 64)
  • interval: How often the redfish power metrics should be read and sent to the sink (default: 30 s)
  • http_insecure: Control whether a client verifies the server’s certificate (default: true == do not verify server’s certificate)
  • http_timeout: Time limit for requests made by this HTTP client (default: 10 s)

Global and per redfish device settings (per redfish device settings overwrite the global settings):

  • disable_power_metrics: disable collection of power metrics (/redfish/v1/Chassis/{ChassisId}/Power)
  • disable_processor_metrics: disable collection of processor metrics (/redfish/v1/Systems/{ComputerSystemId}/Processors/{ProcessorId}/ProcessorMetrics)
  • disable_sensors: disable collection of fan, power and thermal sensor metrics (/redfish/v1/Chassis/{ChassisId}/Sensors/{SensorId})
  • disable_thermal_metrics: disable collection of thermal metrics (/redfish/v1/Chassis/{ChassisId}/Thermal)
  • exclude_metrics: list of excluded metrics
  • username: User name to authenticate with
  • password: Password to use for authentication
  • endpoint: URL of the redfish service (placeholder %h gets replaced by the hostname)

Per redfish device settings:

  • host_list: List of hosts with the same client configuration
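
The rule that per-device settings overwrite the global settings can be sketched with optional (pointer) fields: a nil field means "not set for this device". The `settings` type, its reduced set of fields and `mergeSettings` are assumptions for illustration and not the receiver's data structures.

```go
package main

import "fmt"

// settings holds a few optional options; nil means "not configured here".
type settings struct {
	Username            *string
	Endpoint            *string
	DisablePowerMetrics *bool
}

// ptr returns a pointer to v, which keeps the literals below short.
func ptr[T any](v T) *T { return &v }

// mergeSettings starts from the global settings and overwrites every
// option that the device configuration sets explicitly.
func mergeSettings(global, device settings) settings {
	merged := global
	if device.Username != nil {
		merged.Username = device.Username
	}
	if device.Endpoint != nil {
		merged.Endpoint = device.Endpoint
	}
	if device.DisablePowerMetrics != nil {
		merged.DisablePowerMetrics = device.DisablePowerMetrics
	}
	return merged
}

// str dereferences an optional string for printing.
func str(p *string) string {
	if p == nil {
		return "<unset>"
	}
	return *p
}

func main() {
	global := settings{Username: ptr("<Username>"), Endpoint: ptr("https://%h-bmc")}
	n5 := mergeSettings(global, settings{DisablePowerMetrics: ptr(true)})
	n6 := mergeSettings(global, settings{Username: ptr("<Username 2>"), Endpoint: ptr("https://%h-BMC")})

	fmt.Println("n5:", str(n5.Username), str(n5.Endpoint), n5.DisablePowerMetrics != nil && *n5.DisablePowerMetrics)
	fmt.Println("n6:", str(n6.Username), str(n6.Endpoint), n6.DisablePowerMetrics != nil && *n6.DisablePowerMetrics)
}
```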

4.4 - cc-metric-collector's router

Documentation of cc-metric-collector’s router

CC Metric Router

The CCMetric router sits between the collectors and the sinks and can be used to add tags to and remove tags from traversing CCMessages (https://pkg.go.dev/github.com/ClusterCockpit/cc-energy-manager@v0.0.0-20240919152819-92a17f2da4f7/pkg/cc-message).

Configuration

Note: Use the message processor configuration with option process_messages.

{
    "num_cache_intervals" : 1,
    "interval_timestamp" : true,
    "hostname_tag" : "hostname",
    "max_forward" : 50,
    "process_messages": {
      "see": "pkg/messageProcessor/README.md"
    },
    "add_tags" : [
        {
            "key" : "cluster",
            "value" : "testcluster",
            "if" : "*"
        },
        {
            "key" : "test",
            "value" : "testing",
            "if" : "name == 'temp_package_id_0'"
        }
    ],
    "delete_tags" : [
        {
            "key" : "unit",
            "value" : "*",
            "if" : "*"
        }
    ],
    "interval_aggregates" : [
        {
            "name" : "temp_cores_avg",
            "if" : "match('temp_core_%d+', metric.Name())",
            "function" : "avg(values)",
            "tags" : {
                "type" : "node"
            },
            "meta" : {
                "group": "IPMI",
                "unit": "degC",
                "source": "TempCollector"
            }
        }
    ],
    "drop_metrics" : [
        "not_interesting_metric_at_all"
    ],
    "drop_metrics_if" : [
        "match('temp_core_%d+', metric.Name())"
    ],
    "rename_metrics" : {
        "metric_12345" : "mymetric"
    },
    "normalize_units" : true,
    "change_unit_prefix" : {
      "mem_used" : "G",
      "mem_total" : "G"
    }
}

There are three main options add_tags, delete_tags and interval_timestamp. add_tags and delete_tags are lists consisting of dicts with key, value and if. The value can be omitted in the delete_tags part as it only uses the key for removal. The interval_timestamp setting means that a unique timestamp is applied to all metrics traversing the router during an interval.

Note: Use the message processor configuration (option process_messages) instead of add_tags, delete_tags, drop_metrics, drop_metrics_if, rename_metrics, normalize_units and change_unit_prefix. These options are deprecated and will be removed in future versions. Until then, they are added to the message processor.

Processing order in the router

  • Add the hostname_tag tag (if sent by collectors or cache)
  • If interval_timestamp == true, change time of metrics
  • Check if metric should be dropped (drop_metrics and drop_metrics_if)
  • Add tags from add_tags
  • Delete tags from delete_tags
  • Rename metric based on rename_metrics and store old name as oldname in meta information
  • Add tags from add_tags (if you used the new name in the if condition)
  • Delete tags from delete_tags (if you used the new name in the if condition)
  • Send to sinks
  • Move to cache (if num_cache_intervals > 0)

The interval_timestamp option

The collectors’ Read() functions are not called simultaneously and therefore the metrics gathered in an interval can have different timestamps. If you want to avoid that and have a common timestamp (the beginning of the interval), set this option to true and the MetricRouter sets the time.

The num_cache_intervals option

If the MetricRouter should buffer metrics of intervals in a MetricCache, this option specifies the number of past intervals that should be kept. If num_cache_intervals = 0, the cache is disabled. With num_cache_intervals = 1, only the metrics of the last interval are buffered.

A num_cache_intervals > 0 is required to use the interval_aggregates option.

The hostname_tag option

By default, the router tags all locally created metrics with the hostname. The default tag name is hostname, but it can be changed if your organization prefers a different name.

The max_forward option

Every time the router receives a metric through any of the channels, it tries to directly read up to max_forward metrics from the same channel. This avoids the router thread going to sleep and waking up with every arriving metric. The default is 50 metrics at once, and max_forward needs to be greater than 1.
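
The idea behind max_forward can be sketched with a channel and a non-blocking select. The function `readBatch` is hypothetical; counting the first metric towards the limit is an assumption of this sketch, and the router's actual loop may differ.

```go
package main

import "fmt"

// readBatch blocks until one message arrives and then reads further
// messages that are already waiting, up to maxForward messages in total,
// without blocking again.
func readBatch(ch <-chan string, maxForward int) []string {
	first, ok := <-ch
	if !ok {
		return nil // channel closed
	}
	batch := []string{first}
	for len(batch) < maxForward {
		select {
		case m, ok := <-ch:
			if !ok {
				return batch
			}
			batch = append(batch, m)
		default:
			// nothing else is waiting: process what we have
			return batch
		}
	}
	return batch
}

func main() {
	ch := make(chan string, 8)
	for i := 0; i < 5; i++ {
		ch <- fmt.Sprintf("metric_%d", i)
	}
	fmt.Println(readBatch(ch, 3)) // three waiting metrics in one wake-up
	fmt.Println(readBatch(ch, 3)) // the remaining two
}
```

Draining several waiting metrics per wake-up reduces the number of times the router goroutine is scheduled, at the cost of a bounded amount of extra work per iteration.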

The rename_metrics option

deprecated

In the ClusterCockpit world we specified a set of standard metrics. Since some collectors determine the metric names based on files, executables and libraries, they might change from system to system (or installation to installation, OS to OS, …). In order to get the common names, you can rename incoming metrics before sending them to the sink. If the metric name matches the oldname, it is changed to newname.

{
  "oldname" : "newname",
  "clock_mhz" : "clock"
}

Conditional manipulation of tags (add_tags and delete_tags)

deprecated

Common config format:

{
    "key" : "test",
    "value" : "testing",
    "if" : "name == 'temp_package_id_0'"
}

The delete_tags option

deprecated

The collectors are free to add whatever key=value pair to the metric tags (although the usage of tags should be minimized). If you want to delete a tag afterwards, you can do that. When the if condition matches on a metric, the key is removed from the metric’s tags.

If you want to remove a tag for all metrics, use the condition wildcard *. The value field can be omitted in the delete_tags case.

Never delete tags:

  • hostname
  • type
  • type-id

The add_tags option

deprecated

In some cases, metrics should be tagged or an existing tag changed based on some condition. This can be done in the add_tags section. When the if condition evaluates to true, the tag key is added or gets changed to the new value.

If the CCMetric name is equal to temp_package_id_0, it adds an additional tag test=testing to the metric.

For this metric, a more useful example would be:

[
  {
    "key" : "type",
    "value" : "socket",
    "if" : "name == 'temp_package_id_0'"
  },
  {
    "key" : "type-id",
    "value" : "0",
    "if" : "name == 'temp_package_id_0'"
  }
]

The metric temp_package_id_0 corresponds to the temperature of the first CPU socket (=package). With the above configuration, the tags reflect that, which is useful because the TempCollector commonly submits only node metrics.

In order to match all metrics, you can use the wildcard *, for example to add a tag by default. This is useful to attach system-specific tags like cluster=testcluster:

{
    "key" : "cluster",
    "value" : "testcluster",
    "if" : "*"
}
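
The effect of such rules can be sketched in a few lines of Go. The sketch is a strong simplification: `matches` only understands the wildcard * and conditions of the form name == '<metric name>', whereas the router evaluates full expressions. All names in the code are hypothetical.

```go
package main

import (
	"fmt"
	"strings"
)

// tagRule mirrors the common config format with key, value and if.
type tagRule struct {
	Key   string
	Value string
	If    string
}

// matches supports only "*" and "name == '<metric name>'" (simplification).
func matches(cond, name string) bool {
	if cond == "*" {
		return true
	}
	const prefix = "name == '"
	if len(cond) > len(prefix) && strings.HasPrefix(cond, prefix) && strings.HasSuffix(cond, "'") {
		return name == cond[len(prefix):len(cond)-1]
	}
	return false
}

// applyAddTags adds or overwrites the tag of every matching rule.
func applyAddTags(name string, tags map[string]string, rules []tagRule) {
	for _, r := range rules {
		if matches(r.If, name) {
			tags[r.Key] = r.Value
		}
	}
}

// applyDeleteTags removes the tag key of every matching rule; the value is ignored.
func applyDeleteTags(name string, tags map[string]string, rules []tagRule) {
	for _, r := range rules {
		if matches(r.If, name) {
			delete(tags, r.Key)
		}
	}
}

func main() {
	tags := map[string]string{"hostname": "myHost", "type": "node", "unit": "degC"}
	applyAddTags("temp_package_id_0", tags, []tagRule{
		{Key: "cluster", Value: "testcluster", If: "*"},
		{Key: "type", Value: "socket", If: "name == 'temp_package_id_0'"},
		{Key: "type-id", Value: "0", If: "name == 'temp_package_id_0'"},
	})
	applyDeleteTags("temp_package_id_0", tags, []tagRule{{Key: "unit", If: "*"}})
	fmt.Println(tags)
}
```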

Dropping metrics

In some cases, you want to drop a metric so that it is not forwarded to the sinks. There are two options, depending on how the metrics to drop are specified:

  • Based only on the metric name -> drop_metrics section
  • An evaluable condition with more overhead -> drop_metrics_if section

The drop_metrics section

deprecated

The argument is a list of metric names. No further checks are performed, only a comparison of the metric name.

{
  "drop_metrics" : [
      "drop_metric_1",
      "drop_metric_2"
  ]
}

The example drops all metrics with the name drop_metric_1 and drop_metric_2.

The drop_metrics_if section

deprecated

This option takes a list of evaluable conditions and performs them one after the other on all metrics incoming from the collectors and the metric cache (aka interval_aggregates).

{
  "drop_metrics_if" : [
      "match('drop_metric_%d+', name)",
      "match('cpu', type) && type-id == 0"
  ]
}

The first line is comparable with the example in drop_metrics: it drops all metrics starting with drop_metric_ and ending with a number. The second line drops all metrics of the first hardware thread (not recommended).

Manipulating the metric units

The normalize_units option

deprecated

The cc-metric-collector tries to read the data from the system as it is reported. If available, it tries to read the metric unit from the system as well (e.g. from /proc/meminfo). The problem is that, depending on the source, the metric units are named differently. Just think about byte, Byte, B, bytes, … The cc-units package provides a normalization option to use the same metric unit name for all metrics. If this option is set to true, all unit meta tags are normalized.

The change_unit_prefix section

deprecated

It is often the case that metrics are reported by the system using a rather outdated unit prefix (e.g. /proc/meminfo still uses kByte although current memory sizes are in the GByte range). If you want to change the prefix of a unit, you can do that with the help of cc-units. The setting works on the metric name and requires the new prefix for the metric. The cc-units package determines the scaling factor.
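
The scaling behind a prefix change can be illustrated with a small sketch. It does not use cc-units; the table `prefixFactor` and the function `changePrefix` are hypothetical and assume decimal (SI) prefixes only.

```go
package main

import "fmt"

// prefixFactor maps a unit prefix to its decimal factor (assumption:
// SI prefixes only; the cc-units package has its own prefix handling).
var prefixFactor = map[string]float64{"": 1, "K": 1e3, "M": 1e6, "G": 1e9, "T": 1e12}

// changePrefix rescales value from one unit prefix to another.
func changePrefix(value float64, from, to string) (float64, error) {
	f, ok := prefixFactor[from]
	if !ok {
		return 0, fmt.Errorf("unknown prefix %q", from)
	}
	t, ok := prefixFactor[to]
	if !ok {
		return 0, fmt.Errorf("unknown prefix %q", to)
	}
	return value * f / t, nil
}

func main() {
	// A memory size reported with prefix K, converted to prefix G.
	v, err := changePrefix(2000000, "K", "G")
	if err != nil {
		fmt.Println("conversion failed:", err)
		return
	}
	fmt.Println(v)
}
```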

Aggregate metric values of the current interval with the interval_aggregates option

Note: interval_aggregates works only if num_cache_intervals > 0 and is experimental

In some cases, you need to derive new metrics based on the metrics arriving during an interval. This can be done in the interval_aggregates section. The logic is similar to the other metric manipulation and filtering options. A cache stores all metrics that arrive during an interval. At the beginning of the next interval, the list of metrics is submitted to the MetricAggregator. It derives new metrics and submits them back to the MetricRouter, so they are sent in the next interval but have the timestamp of the previous interval beginning.

"interval_aggregates" : [
  {
    "name" : "new_metric_name",
    "if" : "match('sub_metric_%d+', metric.Name())",
    "function" : "avg(values)",
    "tags" : {
      "key" : "value",
      "type" : "node"
    },
    "meta" : {
      "key" : "value",
      "group": "IPMI",
      "unit": "<copy>"
    }
  }
]

The above configuration collects all metric values of the metrics for which if evaluates to true. Afterwards it calculates the average avg of the values (list of all metrics’ field value), creates a new CCMetric with the name new_metric_name and adds the tags in tags and the meta information in meta. The special value <copy> searches the input metrics and copies the value of the first match of key to the new CCMetric.

If you are not interested in the input metrics sub_metric_%d+ at all, you can add the same condition used here to the drop_metrics_if section to drop them.

Use cases for interval_aggregates:

  • Combine multiple metrics of a collector into a new one, like the MemstatCollector does for mem_used:
  {
    "name" : "mem_used",
    "if" : "source == 'MemstatCollector'",
    "function" : "sum(mem_total) - (sum(mem_free) + sum(mem_buffers) + sum(mem_cached))",
    "tags" : {
      "type" : "node"
    },
    "meta" : {
      "group": "<copy>",
      "unit": "<copy>",
      "source": "<copy>"
    }
  }
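
What the function expressions compute can be sketched with plain Go functions. The helpers `sum`, `avg` and `memUsed` are hypothetical stand-ins for the aggregation functions; the input map (metric name to all values cached during the interval) is an assumption about how the cached metrics could be grouped.

```go
package main

import "fmt"

// sum adds up all values of one metric collected during the interval.
func sum(values []float64) float64 {
	total := 0.0
	for _, v := range values {
		total += v
	}
	return total
}

// avg returns the arithmetic mean, or 0 for an empty list.
func avg(values []float64) float64 {
	if len(values) == 0 {
		return 0
	}
	return sum(values) / float64(len(values))
}

// memUsed evaluates
// sum(mem_total) - (sum(mem_free) + sum(mem_buffers) + sum(mem_cached)).
func memUsed(v map[string][]float64) float64 {
	return sum(v["mem_total"]) - (sum(v["mem_free"]) + sum(v["mem_buffers"]) + sum(v["mem_cached"]))
}

func main() {
	cached := map[string][]float64{
		"mem_total":   {16000},
		"mem_free":    {4000},
		"mem_buffers": {1000},
		"mem_cached":  {3000},
	}
	fmt.Println("mem_used:", memUsed(cached))
	fmt.Println("temp_cores_avg:", avg([]float64{40, 50, 60}))
}
```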

Order of operations

The router performs the above mentioned options in a specific order. In order to get the logic you want for a specific metric, it is crucial to know the processing order:

  • Add the hostname tag (c)
  • Manipulate the timestamp to the interval timestamp (c,r)
  • Drop metrics based on drop_metrics and drop_metrics_if (c,r)
  • Add tags based on add_tags (c,r)
  • Delete tags based on delete_tags (c,r)
  • Rename metric based on rename_metrics (c,r)
    • Add tags based on add_tags to still work if the configuration uses the new name (c,r)
    • Delete tags based on delete_tags to still work if the configuration uses the new name (c,r)
  • Normalize units when normalize_units is set (c,r)
  • Convert unit prefix based on change_unit_prefix (c,r)

Legend:

  • ‘c’ if metric is coming from a collector
  • ‘r’ if metric is coming from a receiver

4.5 - cc-metric-collector's sinks

Documentation of cc-metric-collector’s sinks

CCMetric sinks

This folder contains the SinkManager and sink implementations for the cc-metric-collector.

Available sinks:

Configuration

The configuration file for the sinks is a list of configurations. The type field in each specifies which sink to initialize.

{
  "mystdout" : {
    "type" : "stdout",
    "meta_as_tags" : [
    	"unit"
    ]
  },
  "metricstore" : {
    "type" : "http",
    "host" : "localhost",
    "port" : "4123",
    "database" : "ccmetric",
    "password" : "<jwt token>"
  }
}

Contributing own sinks

A sink contains five functions and is derived from the type sink:

  • Init(name string, config json.RawMessage) error
  • Write(point CCMetric) error
  • Flush() error
  • Close()
  • New<Typename>(name string, config json.RawMessage) (Sink, error) (calls the Init() function)

The data structures should be set up in Init() like opening a file or server connection. The Write() function writes/sends the data. For non-blocking sinks, the Flush() method tells the sink to drain its internal buffers. The Close() function should tear down anything created in Init().

Finally, the sink needs to be registered in the sinkManager.go. There is a list of sinks called AvailableSinks which is a map (sink_type_string -> pointer to sink interface). Add a new entry with a descriptive name and the new sink.

Sample sink

package sinks

import (
	"encoding/json"
	"fmt" // required for fmt.Sprintf in Init()
	"log"

	lp "github.com/ClusterCockpit/cc-metric-collector/internal/ccMetric"
)

type SampleSinkConfig struct {
	defaultSinkConfig  // defines JSON tags for 'name' and 'meta_as_tags'
}

type SampleSink struct {
	sink                    // declares 'name' and 'meta_as_tags'
	config SampleSinkConfig // entry point to the SampleSinkConfig
}

// Initialize the sink by giving it a name and reading in the config JSON
func (s *SampleSink) Init(name string, config json.RawMessage) error {
	s.name = fmt.Sprintf("SampleSink(%s)", name)   // Always specify a name here
  // Read in the config JSON
	if len(config) > 0 {
		err := json.Unmarshal(config, &s.config)
		if err != nil {
			return err
		}
	}
	return nil
}

// Code to submit a single CCMetric to the sink
func (s *SampleSink) Write(point lp.CCMetric) error {
	log.Print(point)
	return nil
}

// If the sink uses batched sends internally, you can tell to flush its buffers
func (s *SampleSink) Flush() error {
	return nil
}


// Close sink: close network connection, close files, close libraries, ...
func (s *SampleSink) Close() {}


// New function to create a new instance of the sink
func NewSampleSink(name string, config json.RawMessage) (Sink, error) {
	s := new(SampleSink)
	err := s.Init(name, config)
	return s, err
}

4.5.1 - ganglia sink

Toplevel gangliaSink

ganglia sink

The ganglia sink uses the gmetric tool of the Ganglia Monitoring System to submit the metrics.

Configuration structure

{
  "<name>": {
    "type": "ganglia",
    "gmetric_path" : "/path/to/gmetric",
    "add_ganglia_group" : true,
    "process_messages" : {
      "see" : "docs of message processor for valid fields"
    },
    "meta_as_tags" : []
  }
}
  • type: makes the sink a ganglia sink
  • gmetric_path: Path to gmetric executable (optional). If not given, the sink searches in $PATH for gmetric.
  • add_ganglia_group: Add --group=X based on meta information to the gmetric call. Some old versions of gmetric do not support the --group option.
  • process_messages: Process messages with given rules before progressing or dropping, see here (optional)
  • meta_as_tags: print all meta information as tags in the output (deprecated, optional)

4.5.2 - http sink

Toplevel httpSink

http sink

The http sink uses POST requests to an HTTP server to submit the metrics in the InfluxDB line-protocol format. It uses JSON web tokens for authentication. The sink creates batches of metrics before sending to reduce the HTTP traffic.

Configuration structure

{
  "<name>": {
    "type": "http",
    "url" : "https://my-monitoring.example.com:1234/api/write",
    "jwt" : "blabla.blabla.blabla",
    "username": "myUser",
    "password": "myPW",
    "timeout": "5s",
    "idle_connection_timeout" : "5s",
    "flush_delay": "2s",
    "batch_size": 1000,
    "precision": "s",
    "process_messages" : {
      "see" : "docs of message processor for valid fields"
    },
    "meta_as_tags" : []
  }
}
  • type: makes the sink an http sink
  • url: The full URL of the endpoint
  • jwt: JSON web tokens for authentication (Using the Bearer scheme)
  • username: username for basic authentication
  • password: password for basic authentication
  • timeout: General timeout for the HTTP client (default ‘5s’)
  • max_retries: Maximum number of retries to connect to the http server
  • idle_connection_timeout: Timeout for idle connections (default ‘120s’). Should be larger than the measurement interval to keep the connection open
  • flush_delay: Batch all writes arriving during this duration (default ‘1s’, batching can be disabled by setting it to 0)
  • batch_size: Maximal batch size. If batch_size is reached before the end of flush_delay, the metrics are sent without further delay
  • precision: Precision of the timestamp. Valid values are 's', 'ms', 'us' and 'ns' (default is 's')
  • process_messages: Process messages with given rules before progressing or dropping, see here (optional)
  • meta_as_tags: print all meta information as tags in the output (deprecated, optional)
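
The interplay of flush_delay and batch_size can be sketched as follows. The `batcher` type is a hypothetical illustration, not the sink's implementation: a batch is sent as soon as batch_size lines are pending, otherwise when the flush delay expires, and a delay of 0 sends every line immediately.

```go
package main

import (
	"fmt"
	"sync"
	"time"
)

// batcher collects lines and hands them to send in batches.
type batcher struct {
	mu         sync.Mutex
	batchSize  int
	flushDelay time.Duration
	pending    []string
	timer      *time.Timer
	send       func(batch []string)
}

// Add queues a line. It flushes immediately if batching is disabled or the
// batch is full; otherwise it makes sure a flush timer is running.
func (b *batcher) Add(line string) {
	b.mu.Lock()
	defer b.mu.Unlock()
	b.pending = append(b.pending, line)
	if b.flushDelay == 0 || len(b.pending) >= b.batchSize {
		b.flushLocked()
		return
	}
	if b.timer == nil {
		b.timer = time.AfterFunc(b.flushDelay, b.Flush)
	}
}

// Flush sends all pending lines, if any.
func (b *batcher) Flush() {
	b.mu.Lock()
	defer b.mu.Unlock()
	b.flushLocked()
}

// flushLocked must be called with b.mu held.
func (b *batcher) flushLocked() {
	if b.timer != nil {
		b.timer.Stop()
		b.timer = nil
	}
	if len(b.pending) == 0 {
		return
	}
	batch := b.pending
	b.pending = nil
	b.send(batch)
}

// demoBatchSizes adds three lines with batch size 2 and returns the sizes
// of the batches that were sent.
func demoBatchSizes() []int {
	var sizes []int
	b := &batcher{
		batchSize:  2,
		flushDelay: time.Hour, // long delay, so only size and Flush trigger sends
		send:       func(batch []string) { sizes = append(sizes, len(batch)) },
	}
	b.Add("m1 value=1i")
	b.Add("m2 value=2i") // batch size reached: sent without further delay
	b.Add("m3 value=3i")
	b.Flush() // what the timer would do after flush_delay
	return sizes
}

func main() {
	fmt.Println(demoBatchSizes())
}
```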

Using http sink for communication with cc-metric-store

The cc-metric-store only accepts metrics with a timestamp precision in seconds, so it is required to use "precision": "s".

4.5.3 - influxasync sink

Toplevel influxAsyncSink

influxasync sink

The influxasync sink uses the official InfluxDB golang client to write the metrics to an InfluxDB database in a non-blocking fashion. It provides only support for V2 write endpoints (InfluxDB 1.8.0 or later).

Configuration structure

{
  "<name>": {
    "type": "influxasync",
    "database" : "mymetrics",
    "host": "dbhost.example.com",
    "port": "4222",
    "user": "exampleuser",
    "password" : "examplepw",
    "organization": "myorg",
    "ssl": true,
    "batch_size": 200,
    "retry_interval" : "1s",
    "retry_exponential_base" : 2,
    "precision": "s",
    "max_retries": 20,
    "max_retry_time" : "168h",
    "process_messages" : {
      "see" : "docs of message processor for valid fields"
    },
    "meta_as_tags" : []
  }
}
  • type: makes the sink an influxasync sink
  • database: All metrics are written to this bucket
  • host: Hostname of the InfluxDB database server
  • port: Port number (as string) of the InfluxDB database server
  • user: Username for basic authentication
  • password: Password for basic authentication
  • organization: Organization in the InfluxDB
  • ssl: Use SSL connection
  • batch_size: batch up metrics internally, default 100
  • retry_interval: Base retry interval for failed write requests, default 1s
  • retry_exponential_base: The retry interval is exponentially increased with this base, default 2
  • max_retries: Maximal number of retry attempts
  • max_retry_time: Maximal time to retry failed writes, default 168h (one week)
  • precision: Precision of the timestamp. Valid values are 's', 'ms', 'us' and 'ns' (default is 's')
  • process_messages: Process messages with given rules before progressing or dropping, see here (optional)
  • meta_as_tags: print all meta information as tags in the output (deprecated, optional)

For information about the calculation of the retry interval settings, see the official influxdb-client-go documentation.
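
As a rough intuition for how retry_interval, retry_exponential_base, max_retries and max_retry_time interact, the following sketch computes a plain exponential schedule. It is a simplification under stated assumptions: `retrySchedule` is a hypothetical function, and the client's real calculation may differ (for example through randomization or an upper bound per interval), so the client documentation remains the reference.

```go
package main

import (
	"fmt"
	"time"
)

// retrySchedule returns the waiting times before each retry attempt:
// interval, interval*base, interval*base^2, ... It stops after maxRetries
// attempts or when the accumulated waiting time would exceed maxRetryTime.
func retrySchedule(interval time.Duration, base, maxRetries int, maxRetryTime time.Duration) []time.Duration {
	var schedule []time.Duration
	var total time.Duration
	delay := interval
	for i := 0; i < maxRetries; i++ {
		if total+delay > maxRetryTime {
			break
		}
		schedule = append(schedule, delay)
		total += delay
		delay *= time.Duration(base)
	}
	return schedule
}

func main() {
	// Values similar to the configuration example, with fewer retries.
	fmt.Println(retrySchedule(time.Second, 2, 5, 168*time.Hour))
}
```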

Using influxasync sink for communication with cc-metric-store

The cc-metric-store only accepts metrics with a timestamp precision in seconds, so it is required to use "precision": "s".

4.5.4 - influxdb sink

Toplevel influxSink

influxdb sink

The influxdb sink uses the official InfluxDB golang client to write the metrics to an InfluxDB database in a blocking fashion. It provides only support for V2 write endpoints (InfluxDB 1.8.0 or later).

Configuration structure

{
  "<name>": {
    "type": "influxdb",
    "database" : "mymetrics",
    "host": "dbhost.example.com",
    "port": "4222",
    "user": "exampleuser",
    "password" : "examplepw",
    "organization": "myorg",
    "ssl": true,
    "flush_delay" : "1s",
    "batch_size" : 1000,
    "use_gzip": true,
    "precision": "s",
    "process_messages" : {
      "see" : "docs of message processor for valid fields"
    },
    "meta_as_tags" : []
  }
}
  • type: makes the sink an influxdb sink
  • database: All metrics are written to this bucket
  • host: Hostname of the InfluxDB database server
  • port: Port number (as string) of the InfluxDB database server
  • user: Username for basic authentication
  • password: Password for basic authentication
  • organization: Organization in the InfluxDB
  • ssl: Use SSL connection
  • flush_delay: Group metrics coming in to a single batch
  • batch_size: Maximal batch size. If batch_size is reached before the end of flush_delay, the metrics are sent without further delay
  • precision: Precision of the timestamp. Valid values are ’s’, ‘ms’, ‘us’ and ’ns’. (default is ’s’)
  • process_messages: Process messages with given rules before progressing or dropping, see here (optional)
  • meta_as_tags: print all meta information as tags in the output (deprecated, optional)

Influx client options:

  • batch_size: Maximal batch size
  • meta_as_tags: move meta information keys to tags (optional)
  • http_request_timeout: HTTP request timeout
  • retry_interval: retry interval
  • max_retry_interval: maximum delay between each retry attempt
  • retry_exponential_base: base for the exponential retry delay
  • max_retries: maximum count of retry attempts of failed writes
  • max_retry_time: maximum total retry timeout
  • use_gzip: Specify whether to use GZip compression in write requests

Using influxdb sink for communication with cc-metric-store

The cc-metric-store only accepts metrics with a timestamp precision in seconds, so it is required to use "precision": "s".

4.5.5 - libganglia sink

Toplevel libgangliaSink

libganglia sink

The libganglia sink interacts directly with the library of the Ganglia Monitoring System to submit the metrics. Consequently, the library needs to be installed on all nodes. This is commonly the case wherever Ganglia is in use, because Ganglia requires at least a node daemon (gmond or ganglia-monitor) to work.

The libganglia sink probably has less overhead than the ganglia sink because it does not spawn any processes and initializes the environment and UDP connections only once.

Configuration structure

{
  "<name>": {
    "type": "libganglia",
    "gmond_config" : "/path/to/gmond.conf",
    "cluster_name": "MyCluster",
    "add_ganglia_group" : true,
    "add_type_to_name": true,
    "add_units" : true,
    "process_messages" : {
      "see" : "docs of message processor for valid fields"
    },
    "meta_as_tags" : []
  }
}
  • type: makes the sink a libganglia sink
  • gmond_config: Path to the Ganglia configuration file gmond.conf (default: /etc/ganglia/gmond.conf)
  • cluster_name: Set a cluster name for the metric. If not set, it is taken from gmond_config
  • add_ganglia_group: Add a Ganglia metric group based on meta information. Some old versions of gmetric do not support the --group option
  • add_type_to_name: Ganglia commonly uses only node-level metrics, but cc-metric-collector provides metrics for CPUs, memory domains, CPU sockets and the whole node. To keep them apart, this option prefixes the metric name with <type><type-id>_ or device_, depending on the metric tags and meta information. For metrics of the whole node (type=node), no prefix is added
  • add_units: Add metric value unit if there is a unit entry in the metric tags or meta information
  • process_messages: Process messages with given rules before progressing or dropping, see here (optional)
  • meta_as_tags: print all meta information as tags in the output (deprecated, optional)
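The naming rule behind add_type_to_name can be sketched as follows. This is an illustration only, not the sink's code: the function gangliaName is hypothetical, the metric name flops_any is made up for the example, and the device_ variant is left out.

```go
package main

import "fmt"

// gangliaName prefixes a metric name with <type><type-id>_ unless the
// metric belongs to the whole node. Hypothetical helper for illustration.
func gangliaName(name, typ, typeID string) string {
	if typ == "" || typ == "node" {
		return name // node-level metrics keep their plain name
	}
	return typ + typeID + "_" + name
}

func main() {
	fmt.Println(gangliaName("flops_any", "socket", "0")) // socket0_flops_any
	fmt.Println(gangliaName("flops_any", "node", ""))    // flops_any
}
```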

Ganglia Installation

My development system is Ubuntu 20.04. To install the required libraries with apt:

$ sudo apt install libganglia1

The libganglia.so gets installed in /usr/lib. The Ganglia headers libganglia1-dev are not required.

I added a Makefile in the sinks subfolder that searches for the library in /usr and creates a symlink (sinks/libganglia.so) for running/building the cc-metric-collector. So just type make before running/building in the main folder or the sinks subfolder.

4.5.6 - nats sink

Toplevel natsSink

nats sink

The nats sink publishes all metrics into a NATS network. The publishing key is the database name provided in the configuration file.

Configuration structure

{
  "<name>": {
    "type": "nats",
    "database" : "mymetrics",
    "host": "dbhost.example.com",
    "port": "4222",
    "user": "exampleuser",
    "password" : "examplepw",
    "nkey_file": "/path/to/nkey_file",
    "flush_delay": "10s",
    "precision": "s",
    "process_messages" : {
      "see" : "docs of message processor for valid fields"
    },
    "meta_as_tags" : []
  }
}
  • type: makes the sink a nats sink
  • database: All metrics are published with this subject
  • host: Hostname of the NATS server
  • port: Port number (as string) of the NATS server
  • user: Username for basic authentication
  • password: Password for basic authentication
  • nkey_file: Path to credentials file with NKEY
  • flush_delay: Maximum time until metrics are sent out (default ‘5s’)
  • precision: Precision of the timestamp. Valid values are ’s’, ‘ms’, ‘us’ and ’ns’. (default is ’s’)
  • process_messages: Process messages with given rules before progressing or dropping, see here (optional)
  • meta_as_tags: print all meta information as tags in the output (deprecated, optional)

Using nats sink for communication with cc-metric-store

The cc-metric-store only accepts metrics with a timestamp precision in seconds, so it is required to use "precision": "s".

4.5.7 - prometheus sink

Toplevel prometheusSink

prometheus sink

The prometheus sink publishes all metrics via an HTTP server ready to be scraped by a Prometheus server. It creates gauge metrics for all node metrics and gauge vectors for all metrics with a subtype like ‘device’, ‘cpu’ or ‘socket’.

Configuration structure

{
  "<name>": {
    "type": "prometheus",
    "host": "localhost",
    "port": "8080",
    "path": "metrics",
    "process_messages" : {
      "see" : "docs of message processor for valid fields"
    },
    "meta_as_tags" : []
  }
}
  • type: makes the sink a prometheus sink
  • host: The HTTP server gets bound to that IP/hostname
  • port: Port number (as string) for the HTTP server
  • path: Path where the metrics should be served. The metrics will be published at host:port/path
  • group_as_namespace: Most metrics carry a group such as ‘memory’ or ‘load’ as meta information. With this option enabled, the metric names are extended to group_name if possible.
  • process_messages: Process messages with given rules before progressing or dropping, see here (optional)
  • meta_as_tags: print all meta information as tags in the output (deprecated, optional)
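As a rough sketch of the two naming aspects above, the following Go snippet builds the scrape address from host, port and path, and applies one possible reading of group_as_namespace, namely that a metric name becomes <group>_<name>. Both helper functions are hypothetical and the reading of group_name is an assumption, so check the sink's source before relying on it.

```go
package main

import (
	"fmt"
	"net"
	"strings"
)

// scrapeURL assembles the address a Prometheus server would scrape,
// following the host:port/path layout. Plain HTTP is assumed here.
func scrapeURL(host, port, path string) string {
	return fmt.Sprintf("http://%s/%s", net.JoinHostPort(host, port), strings.TrimPrefix(path, "/"))
}

// promName applies the assumed group_as_namespace rule: prefix the metric
// name with its group when the option is set and a group is known.
func promName(group, name string, groupAsNamespace bool) string {
	if groupAsNamespace && group != "" {
		return group + "_" + name
	}
	return name
}

func main() {
	fmt.Println(scrapeURL("localhost", "8080", "metrics")) // http://localhost:8080/metrics
	fmt.Println(promName("memory", "mem_used", true))      // memory_mem_used
}
```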

4.5.8 - stdout sink

Toplevel stdoutSink

stdout sink

The stdout sink is the simplest sink provided by cc-metric-collector. It writes all metrics in InfluxDB line-protocol format to the configurable output file or the common special files stdout and stderr.

Configuration structure

{
  "<name>": {
    "type": "stdout",
    "output_file" : "mylogfile.log",
    "process_messages" : {
      "see" : "docs of message processor for valid fields"
    },
    "meta_as_tags" : []
  }
}
  • type: makes the sink a stdout sink
  • output_file: Write all data to the selected file (optional). There are two ‘special’ files: stdout and stderr. If this option is not provided, the default value is stdout
  • process_messages: Process messages with given rules before progressing or dropping, see here (optional)
  • meta_as_tags: print all meta information as tags in the output (deprecated, optional)

5 - Commit message naming conventions

Special keywords to reference tickets and control release notes

Introduction

ClusterCockpit uses goreleaser for building and uploading releases. In this process, the release notes, including all notable changes, are generated automatically from special commit message tags. Moreover, GitHub parses special characters and words to link and close issues.

Reference issue tickets

It is good practice to always create a ticket for notable changes. This allows contributors to comment on and discuss source code changes. Any commit that contributes to the ticket should reference the ticket id (in the commit message or description). This is achieved in GitHub by prefixing the ticket id with a number sign character (#):

This change contributes to #235

GitHub will detect if a pull request or commit uses special keywords to close a ticket:

  • close, closes, closed
  • fix, fixes, fixed
  • resolve, resolves, resolved

The ticket will not be closed before the commit appears on the main branch. Example:

This change fixes #423
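The keyword rule can be approximated with a regular expression. The Go sketch below is a simplified model of the matching described above, not GitHub's actual parser; the function name closedTickets is hypothetical, and variants such as cross-repository references are not handled.

```go
package main

import (
	"fmt"
	"regexp"
	"strconv"
)

// closingRef matches one of the closing keywords followed by "#<number>".
var closingRef = regexp.MustCompile(`(?i)\b(?:close[sd]?|fix(?:e[sd])?|resolve[sd]?)\s+#(\d+)`)

// closedTickets returns the ticket ids a commit message would close
// under this simplified rule.
func closedTickets(msg string) []int {
	var ids []int
	for _, m := range closingRef.FindAllStringSubmatch(msg, -1) {
		if id, err := strconv.Atoi(m[1]); err == nil {
			ids = append(ids, id)
		}
	}
	return ids
}

func main() {
	fmt.Println(closedTickets("This change fixes #423"))          // [423]
	fmt.Println(closedTickets("This change contributes to #235")) // []
}
```

A plain reference such as "contributes to #235" links the ticket but does not match, so the ticket stays open.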

Control release notes with preconfigured commit message prefixes

Commits with one of the following prefixes will appear in the release notes:

  • feat: Mark a commit to contain changes related to new features
  • fix: Mark a commit to contain changes related to bug fixes
  • sec: Mark a commit to contain changes related to security fixes
  • doc: Mark a commit to contain changes related to documentation updates
  • [feat|fix] dep: Mark a commit that is related to a dependency introduction or change
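A quick way to check whether a commit subject would show up in the release notes is to test it against the prefixes listed above. The following Go sketch mirrors that list; it is not how goreleaser is configured, the function releaseCategory and the category labels are hypothetical, and the combined [feat|fix] dep: form is left out because its exact spelling is not specified here.

```go
package main

import (
	"fmt"
	"strings"
)

// releaseCategory maps a commit subject to a release-note category based on
// its prefix. The second return value is false if no listed prefix matches.
func releaseCategory(subject string) (string, bool) {
	s := strings.TrimSpace(subject)
	switch {
	case strings.HasPrefix(s, "feat:"):
		return "feature", true
	case strings.HasPrefix(s, "fix:"):
		return "bug fix", true
	case strings.HasPrefix(s, "sec:"):
		return "security", true
	case strings.HasPrefix(s, "doc:"):
		return "documentation", true
	}
	return "", false
}

func main() {
	for _, subject := range []string{"feat: add nats sink option", "refactor internals"} {
		if category, ok := releaseCategory(subject); ok {
			fmt.Printf("%q -> %s\n", subject, category)
		} else {
			fmt.Printf("%q -> not listed in release notes\n", subject)
		}
	}
}
```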

6 - Docsy example page

Example page to showcase formatting options for docsy.

This is a placeholder page. Replace it with your own content.

Text can be bold, italic, or strikethrough. Links should be blue with no underlines (unless hovered over).

There should be whitespace between paragraphs. Vape migas chillwave sriracha poutine try-hard distillery. Tattooed shabby chic small batch, pabst art party heirloom letterpress air plant pop-up. Sustainable chia skateboard art party banjo cardigan normcore affogato vexillologist quinoa meggings man bun master cleanse shoreditch readymade. Yuccie prism four dollar toast tbh cardigan iPhone, tumblr listicle live-edge VHS. Pug lyft normcore hot chicken biodiesel, actually keffiyeh thundercats photo booth pour-over twee fam food truck microdosing banh mi. Vice activated charcoal raclette unicorn live-edge post-ironic. Heirloom vexillologist coloring book, beard deep v letterpress echo park humblebrag tilde.

90’s four loko seitan photo booth gochujang freegan tumeric listicle fam ugh humblebrag. Bespoke leggings gastropub, biodiesel brunch pug fashion axe meh swag art party neutra deep v chia. Enamel pin fanny pack knausgaard tofu, artisan cronut hammock meditation occupy master cleanse chartreuse lumbersexual. Kombucha kogi viral truffaut synth distillery single-origin coffee ugh slow-carb marfa selfies. Pitchfork schlitz semiotics fanny pack, ugh artisan vegan vaporware hexagon. Polaroid fixie post-ironic venmo wolf ramps kale chips.

There should be no margin above this first sentence.

Blockquotes should be a lighter gray with a border along the left side in the secondary color.

There should be no margin below this final sentence.

First Header 2

This is a normal paragraph following a header. Knausgaard kale chips snackwave microdosing cronut copper mug swag synth bitters letterpress glossier craft beer. Mumblecore bushwick authentic gochujang vegan chambray meditation jean shorts irony. Viral farm-to-table kale chips, pork belly palo santo distillery activated charcoal aesthetic jianbing air plant woke lomo VHS organic. Tattooed locavore succulents heirloom, small batch sriracha echo park DIY af. Shaman you probably haven’t heard of them copper mug, crucifix green juice vape single-origin coffee brunch actually. Mustache etsy vexillologist raclette authentic fam. Tousled beard humblebrag asymmetrical. I love turkey, I love my job, I love my friends, I love Chardonnay!

Deae legum paulatimque terra, non vos mutata tacet: dic. Vocant docuique me plumas fila quin afuerunt copia haec o neque.

On big screens, paragraphs and headings should not take up the full container width, but we want tables, code blocks and similar to take the full width.

Scenester tumeric pickled, authentic crucifix post-ironic fam freegan VHS pork belly 8-bit yuccie PBR&B. I love this life we live in.

Second Header 2

This is a blockquote following a header. Bacon ipsum dolor sit amet t-bone doner shank drumstick, pork belly porchetta chuck sausage brisket ham hock rump pig. Chuck kielbasa leberkas, pork bresaola ham hock filet mignon cow shoulder short ribs biltong.

Header 3

This is a code block following a header.

Next level leggings before they sold out, PBR&B church-key shaman echo park. Kale chips occupy godard whatever pop-up freegan pork belly selfies. Gastropub Belinda subway tile woke post-ironic seitan. Shabby chic man bun semiotics vape, chia messenger bag plaid cardigan.

Header 4

  • This is an unordered list following a header.
  • This is an unordered list following a header.
  • This is an unordered list following a header.

Header 5

  1. This is an ordered list following a header.
  2. This is an ordered list following a header.
  3. This is an ordered list following a header.

Header 6

What | Follows
A table | A header
A table | A header
A table | A header

There’s a horizontal rule above and below this.


Here is an unordered list:

  • Liverpool F.C.
  • Chelsea F.C.
  • Manchester United F.C.

And an ordered list:

  1. Michael Brecker
  2. Seamus Blake
  3. Branford Marsalis

And an unordered task list:

  • Create a Hugo theme
  • Add task lists to it
  • Take a vacation

And a “mixed” task list:

  • Pack bags
  • ?
  • Travel!

And a nested list:

  • Jackson 5
    • Michael
    • Tito
    • Jackie
    • Marlon
    • Jermaine
  • TMNT
    • Leonardo
    • Michelangelo
    • Donatello
    • Raphael

Definition lists can be used with Markdown syntax. Definition headers are bold.

Name
Godzilla
Born
1952
Birthplace
Japan
Color
Green

Tables should have bold headings and alternating shaded rows.

Artist | Album | Year
Michael Jackson | Thriller | 1982
Prince | Purple Rain | 1984
Beastie Boys | License to Ill | 1986

If a table is too wide, it should scroll horizontally.

Artist | Album | Year | Label | Awards | Songs
Michael Jackson | Thriller | 1982 | Epic Records | Grammy Award for Album of the Year, American Music Award for Favorite Pop/Rock Album, American Music Award for Favorite Soul/R&B Album, Brit Award for Best Selling Album, Grammy Award for Best Engineered Album, Non-Classical | Wanna Be Startin’ Somethin’, Baby Be Mine, The Girl Is Mine, Thriller, Beat It, Billie Jean, Human Nature, P.Y.T. (Pretty Young Thing), The Lady in My Life
Prince | Purple Rain | 1984 | Warner Brothers Records | Grammy Award for Best Score Soundtrack for Visual Media, American Music Award for Favorite Pop/Rock Album, American Music Award for Favorite Soul/R&B Album, Brit Award for Best Soundtrack/Cast Recording, Grammy Award for Best Rock Performance by a Duo or Group with Vocal | Let’s Go Crazy, Take Me With U, The Beautiful Ones, Computer Blue, Darling Nikki, When Doves Cry, I Would Die 4 U, Baby I’m a Star, Purple Rain
Beastie Boys | License to Ill | 1986 | Mercury Records | noawardsbutthistablecelliswide | Rhymin & Stealin, The New Style, She’s Crafty, Posse in Effect, Slow Ride, Girls, (You Gotta) Fight for Your Right, No Sleep Till Brooklyn, Paul Revere, Hold It Now, Hit It, Brass Monkey, Slow and Low, Time to Get Ill

Code snippets like var foo = "bar"; can be shown inline.

Also, this should vertically align with this and this.

Code can also be shown in a block element.

foo := "bar";
bar := "foo";

Code can also use syntax highlighting.

func main() {
  input := `var foo = "bar";`

  lexer := lexers.Get("javascript")
  iterator, _ := lexer.Tokenise(nil, input)
  style := styles.Get("github")
  formatter := html.New(html.WithLineNumbers())

  var buff bytes.Buffer
  formatter.Format(&buff, style, iterator)

  fmt.Println(buff.String())
}

Long, single-line code blocks should not wrap. They should horizontally scroll if they are too long. This line should be long enough to demonstrate this.

Inline code inside table cells should still be distinguishable.

Language | Code
Javascript | var foo = "bar";
Ruby | foo = "bar"

Small images should be shown at their actual size.

Large images should always scale down and fit in the content container.

The photo above of the Spruce Picea abies shoot with foliage buds: Bjørn Erik Pedersen, CC-BY-SA.

Components

Alerts

Another Heading

Add some sections here to see what the ToC looks like. Bacon ipsum dolor sit amet t-bone doner shank drumstick, pork belly porchetta chuck sausage brisket ham hock rump pig. Chuck kielbasa leberkas, pork bresaola ham hock filet mignon cow shoulder short ribs biltong.

This Document

Inguina genus: Anaphen post: lingua violente voce suae meus aetate diversi. Orbis unam nec flammaeque status deam Silenum erat et a ferrea. Excitus rigidum ait: vestro et Herculis convicia: nitidae deseruit coniuge Proteaque adiciam eripitur? Sitim noceat signa probat quidem. Sua longis fugatis quidem genae.

Pixel Count

Tilde photo booth wayfarers cliche lomo intelligentsia man braid kombucha vaporware farm-to-table mixtape portland. PBR&B pickled cornhole ugh try-hard ethical subway tile. Fixie paleo intelligentsia pabst. Ennui waistcoat vinyl gochujang. Poutine salvia authentic affogato, chambray lumbersexual shabby chic.

Contact Info

Plaid hell of cred microdosing, succulents tilde pour-over. Offal shabby chic 3 wolf moon blue bottle raw denim normcore poutine pork belly.

Stumptown PBR&B keytar plaid street art, forage XOXO pitchfork selvage affogato green juice listicle pickled everyday carry hashtag. Organic sustainable letterpress sartorial scenester intelligentsia swag bushwick. Put a bird on it stumptown neutra locavore. IPhone typewriter messenger bag narwhal. Ennui cold-pressed seitan flannel keytar, single-origin coffee adaptogen occupy yuccie williamsburg chillwave shoreditch forage waistcoat.

This is the final element on the page and there should be no margin below this.