Release-specific information

Settings and issues specific to the current release

New performance and energy footprint configuration

In previous versions, cc-backend used a set of hard-coded metrics for the performance footprint. The database had a dedicated column for each of these metric statistics so that jobs could be filtered by them. This is a severe restriction, because you may want to use different footprints on an accelerated cluster than on a standard multi-core system. Version 1.4.0 of cc-backend introduces a new string attribute footprint for metrics in the cluster.json configuration of the job archive. This allows you to define an individual performance footprint for every cluster and, optionally, every subcluster, and to change the footprint configuration later if required. The footprint metrics are used in the footprint UI component shown in job views and, optionally, job lists. They are also used for the metrics shown in the polar plot and are available for sorting and filtering jobs.

Metrics configured as footprints are collected as aggregated key:value pairs in one JSON object for every job, either on job completion, or during runtime in configurable intervals. The JSON object itself is written to the database in a single dedicated column named footprint.

Please note: To guarantee a seamless update, follow the instructions on this page for updating cluster.json and migrating the database. If the footprint attribute is not configured, only existing jobs will show footprint data after the update and database migration. Jobs completed afterwards will show no footprint data, because the information needed to compute it is missing.

cc-backend now also provides an energy footprint configuration. This is a set of metrics that are used to calculate the total energy used by a job. The metrics used for the energy footprint are marked using a new attribute energy in the cluster metric configuration.

What you need to do

You need to adapt all cluster.json files in the job archive and mark all footprint and energy metrics.

Here is an example of how to mark a footprint metric:

{
  "name": "fritz",
  "metricConfig": [
    {
      "name": "mem_used",
      "unit": {
        "base": "B",
        "prefix": "G"
      },
      "scope": "node",
      "aggregation": "sum",
      "footprint": "max",
      "timestep": 60,
      "peak": 256,
      "normal": 128,
      "caution": 200,
      "alert": 240,
      "lowerIsBetter": true,
      "subClusters": [
        {
          "name": "spr1tb",
          "peak": 1024,
          "normal": 512,
          "caution": 900,
          "footprint": "max",
          "lowerIsBetter": true,
          "alert": 1000
        },
        {
          "name": "spr2tb",
          "peak": 2048,
          "normal": 1024,
          "caution": 1800,
          "footprint": "max",
          "lowerIsBetter": true,
          "alert": 2000
        }
      ]
    }
  ]
}

If a metric has subcluster overwrites, you currently have to add the attributes there as well. The new attribute footprint can have avg, min, or max as its value, indicating which basic statistic over all nodes or cores of a job is used for this metric. In the example above, the footprint is the maximum allocated memory. Because this is (for us) a metric where lower is better, it is marked accordingly using the attribute lowerIsBetter.
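The reduction described above can be sketched as follows. This is only an illustration: the per-node numbers are made up, and both the helper names and the key naming scheme (metric name plus statistic) are assumptions, not taken from the cc-backend implementation.

```python
from statistics import mean

# Hypothetical per-node statistics of mem_used (in GB) for one job.
node_stats = {
    "node01": {"avg": 96.0, "min": 80.0, "max": 120.0},
    "node02": {"avg": 110.0, "min": 90.0, "max": 180.0},
}

# The three values allowed for the footprint attribute.
REDUCERS = {"avg": mean, "min": min, "max": max}


def footprint_value(node_stats, stat):
    """Reduce the per-node statistic `stat` over all nodes of a job."""
    if stat not in REDUCERS:
        raise ValueError(f"footprint must be one of {sorted(REDUCERS)}, got {stat!r}")
    return REDUCERS[stat](s[stat] for s in node_stats.values())


def footprint_entry(metric, stat, node_stats):
    """Build one key:value pair of the footprint object.

    The key format "<metric>_<stat>" is an assumption for illustration.
    """
    return {f"{metric}_{stat}": footprint_value(node_stats, stat)}


entry = footprint_entry("mem_used", "max", node_stats)
```

With "footprint": "max", the entry holds the largest per-node maximum, which matches the reading of the footprint as the maximum allocated memory of the job.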

To mark a metric to be used for calculating the total energy you need to add the energy attribute.

Example for marking an energy footprint metric:

{
  "name": "fritz",
  "metricConfig": [
    {
      "name": "cpu_power",
      "unit": {
        "base": "W"
      },
      "scope": "socket",
      "aggregation": "sum",
      "timestep": 60,
      "peak": 500,
      "normal": 250,
      "caution": 100,
      "alert": 50,
      "energy": "power"
    },
    {
      "name": "mem_power",
      "unit": {
        "base": "W"
      },
      "scope": "socket",
      "aggregation": "sum",
      "timestep": 60,
      "peak": 100,
      "normal": 50,
      "caution": 20,
      "alert": 10,
      "energy": "power"
    }
  ]
}

Again, you also need to add the attribute to subcluster overwrites if you have any. The energy attribute can have power or energy as its value. The value power indicates that the metric has the unit Watt, while energy is used for metrics that have the unit Joule. We are aware that this information could already be derived from the existing metric configuration, but this is how it is currently implemented. Power metrics are converted to Joules by multiplying the average job power by the job duration. The total job energy is then the sum over all energy footprint metrics.
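A minimal sketch of this calculation, assuming each metric is already available as a single job-wide value (average power in Watt, or energy in Joule). The function names and the sample numbers are hypothetical and do not mirror the cc-backend code.

```python
def to_joules(kind, value, duration_s):
    """Convert one energy footprint metric to Joules.

    kind "power":  value is the average job power in W, so E = P_avg * t.
    kind "energy": value is already in J and is passed through unchanged.
                   (The known issues for this release state that metrics of
                   this type are currently ignored by cc-backend.)
    """
    if kind == "power":
        return value * duration_s
    if kind == "energy":
        return value
    raise ValueError(f"energy must be 'power' or 'energy', got {kind!r}")


def total_energy_joules(metrics, duration_s):
    """Sum the energy of all energy footprint metrics of a job."""
    return sum(to_joules(kind, value, duration_s) for kind, value in metrics.values())


# Hypothetical job: one hour runtime, made-up average power values.
metrics = {
    "cpu_power": ("power", 300.0),
    "mem_power": ("power", 40.0),
}
total_j = total_energy_joules(metrics, duration_s=3600)
total_kwh = total_j / 3.6e6  # 1 kWh = 3.6e6 J
```

For this made-up job the two metrics contribute 300 W x 3600 s and 40 W x 3600 s, which sum to 1,224,000 J or 0.34 kWh.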

The web frontend can also show the CO2 footprint for a job. To enable this, you need to add a new top-level configuration key emission-constant, given in g/kWh, to the cc-backend configuration:

{
  "emission-constant": 317
}
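Since the constant is given in g/kWh, the CO2 estimate for a job follows from its total energy by a unit conversion. A small sketch, where the function name is hypothetical and the frontend's actual formula is assumed rather than confirmed:

```python
JOULES_PER_KWH = 3.6e6  # 1 kWh = 1000 W * 3600 s


def co2_grams(energy_joules, emission_constant_g_per_kwh):
    """Estimate CO2 emission in grams from job energy in Joules."""
    energy_kwh = energy_joules / JOULES_PER_KWH
    return energy_kwh * emission_constant_g_per_kwh


# A job that consumed exactly 1 kWh with the constant from the example above.
grams = co2_grams(3.6e6, 317)
```

A job that used 1 kWh would therefore be reported with 317 g of CO2 under this assumption.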

After you have marked all metrics, you need to raise the job archive version manually to 2 by editing ./var/job-archive/version.txt.
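Before raising the version, it may help to check the edited files for the rules described above: allowed attribute values, and attributes repeated in subcluster overwrites. The following checker is a hypothetical helper, not part of cc-backend, and only covers the attributes named on this page.

```python
FOOTPRINT_VALUES = {"avg", "min", "max"}
ENERGY_VALUES = {"power", "energy"}
ALLOWED = {"footprint": FOOTPRINT_VALUES, "energy": ENERGY_VALUES}


def check_cluster_config(cluster):
    """Return a list of problems found in one parsed cluster.json dict."""
    problems = []
    for metric in cluster.get("metricConfig", []):
        name = metric.get("name", "<unnamed>")
        subclusters = metric.get("subClusters", [])
        # Check the metric itself and every subcluster overwrite.
        entries = [(name, metric)]
        for sc in subclusters:
            entries.append((name + "/" + sc.get("name", "<unnamed>"), sc))
        for label, entry in entries:
            for key, allowed in ALLOWED.items():
                if key in entry and entry[key] not in allowed:
                    problems.append(f"{label}: invalid {key} value {entry[key]!r}")
        # An attribute set on the metric must be repeated in each overwrite.
        for key in ALLOWED:
            if key in metric:
                for sc in subclusters:
                    if key not in sc:
                        sc_name = sc.get("name", "<unnamed>")
                        problems.append(f"{name}/{sc_name}: missing {key}")
    return problems


# Usage (the path layout inside the job archive is an assumption):
#   import json
#   with open("./var/job-archive/fritz/cluster.json") as f:
#       for problem in check_cluster_config(json.load(f)):
#           print(problem)
```

An empty result only means that the marked metrics are consistent; it cannot tell whether you marked the metrics you intended.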

Database migration

This release requires migrating your database to version 8. Back up your database before the migration! Depending on your database size, the migration may take a long time. In our case, with a database file size of 50GB, it took more than eight hours.
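One way to take the backup, sketched under the assumption that your deployment uses a SQLite database file. The paths in the usage comment are hypothetical, and a plain file copy made while cc-backend is stopped works just as well.

```python
import os
import sqlite3


def backup_sqlite(src_path, dst_path):
    """Copy a SQLite database to dst_path using SQLite's online backup API."""
    if not os.path.isfile(src_path):
        raise FileNotFoundError(src_path)
    if os.path.exists(dst_path):
        # Refuse to overwrite an existing backup.
        raise FileExistsError(dst_path)
    src = sqlite3.connect(src_path)
    dst = sqlite3.connect(dst_path)
    try:
        src.backup(dst)  # available since Python 3.7
    finally:
        dst.close()
        src.close()


# Usage (hypothetical paths, adjust to your installation):
#   backup_sqlite("./var/job.db", "./var/job.db.bak")
```

For a database of the size mentioned above, make sure the target file system has enough free space for a full copy.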

To migrate the database run the following command:

cc-backend -migrate-db

The migration creates the new footprint column and updates its JSON object for existing jobs using the old footprint columns. Moreover it sets the global scope for all existing tags and creates additional indices to speed up common queries.

Configuration changes

You can find a complete configuration example here.

Enable timeseries resampling

ClusterCockpit now supports resampling of time series data to a lower frequency. This dramatically improves load times for very large or very long jobs, and we recommend enabling it. Resampling is supported for running as well as for finished jobs. For running jobs, this currently only works with the newest version of cc-metric-store. Resampling support for the Prometheus time series database will be added in the future.

To enable resampling, you have to add the following top-level configuration key:

  "enable-resampling": {
    "trigger": 30,
    "resolutions": [
      600,
      300,
      120,
      60
    ]
  },

The trigger key configures the minimum number of points in a time series plot window below which the next finer resolution level is loaded. The resolutions key defines the resolution steps in seconds. The finest resolution must be the native resolution. If you have different native resolutions in your metric configuration, you should use the finest one. The implementation will fall back to the finest available resolution in this case.
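The interplay of the two keys can be illustrated with a small sketch. It picks the coarsest resolution that still leaves at least trigger points in the visible window. This is one possible reading of the rule above, and the function is hypothetical, not the actual frontend logic.

```python
def pick_resolution(window_s, resolutions, trigger):
    """Choose a resolution (seconds per point) for a plot window.

    Walks from the coarsest to the finest resolution and returns the first
    one that yields at least `trigger` points in a window of `window_s`
    seconds. Falls back to the finest resolution if none qualifies.
    """
    for res in sorted(resolutions, reverse=True):
        if window_s / res >= trigger:
            return res
    return min(resolutions)


resolutions = [600, 300, 120, 60]
full_day = pick_resolution(86400, resolutions, trigger=30)
zoomed = pick_resolution(3000, resolutions, trigger=30)
```

With the configuration shown above, a full day fits the 600 s level (144 points), while zooming into a 3000 s window drops every coarser level below 30 points and ends at the native 60 s resolution.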

Continuous scroll is default now

This release includes support for continuous scroll in job lists, replacing the previous paging UI. Continuous scroll is now the default, and you can remove the ui-defaults block if you added it only to enable continuous scroll. Every user can override the scrolling option in their own configuration.

Known issues

Currently energy footprint metrics of type energy are ignored for calculating total energy.
Resampling for running jobs only works with cc-metric-store.
With energy footprint metrics of type power the unit is ignored and it is assumed the metric has the unit Watt.