cc-metric-store

ClusterCockpit Metric Store References

1: Command Line
2: Configuration
3: Metric Store REST API

Reference information regarding the ClusterCockpit component “cc-metric-store” (GitHub Repo).

1 - Command Line

ClusterCockpit Metric Store Command Line Options

This page describes the command line options for the cc-metric-store executable.

  -config <path>

Function: Specifies an alternative path to the application configuration file.

Default: ./config.json

Example: -config ./configfiles/configuration.json

  -dev

Function: Enables the Swagger UI REST API documentation and playground.

  -gops

Function: Starts the gops agent (github.com/google/gops/agent) so that the running server can be inspected for debugging.

  -version

Function: Shows version information and exits.

Example config:

{
  "metrics": {
    "debug_metric": {
      "frequency": 60,
      "aggregation": "avg"
    },
    "clock": {
      "frequency": 60,
      "aggregation": "avg"
    },
    "cpu_idle": {
      "frequency": 60,
      "aggregation": "avg"
    },
    "cpu_iowait": {
      "frequency": 60,
      "aggregation": "avg"
    },
    "cpu_irq": {
      "frequency": 60,
      "aggregation": "avg"
    },
    "cpu_system": {
      "frequency": 60,
      "aggregation": "avg"
    },
    "cpu_user": {
      "frequency": 60,
      "aggregation": "avg"
    },
    "nv_mem_util": {
      "frequency": 60,
      "aggregation": "avg"
    },
    "nv_temp": {
      "frequency": 60,
      "aggregation": "avg"
    },
    "nv_sm_clock": {
      "frequency": 60,
      "aggregation": "avg"
    },
    "acc_utilization": {
      "frequency": 60,
      "aggregation": "avg"
    },
    "acc_mem_used": {
      "frequency": 60,
      "aggregation": "sum"
    },
    "acc_power": {
      "frequency": 60,
      "aggregation": "sum"
    },
    "flops_any": {
      "frequency": 60,
      "aggregation": "sum"
    },
    "flops_dp": {
      "frequency": 60,
      "aggregation": "sum"
    },
    "flops_sp": {
      "frequency": 60,
      "aggregation": "sum"
    },
    "ib_recv": {
      "frequency": 60,
      "aggregation": "sum"
    },
    "ib_xmit": {
      "frequency": 60,
      "aggregation": "sum"
    },
    "ib_recv_pkts": {
      "frequency": 60,
      "aggregation": "sum"
    },
    "ib_xmit_pkts": {
      "frequency": 60,
      "aggregation": "sum"
    },
    "cpu_power": {
      "frequency": 60,
      "aggregation": "sum"
    },
    "core_power": {
      "frequency": 60,
      "aggregation": "sum"
    },
    "mem_power": {
      "frequency": 60,
      "aggregation": "sum"
    },
    "ipc": {
      "frequency": 60,
      "aggregation": "avg"
    },
    "cpu_load": {
      "frequency": 60,
      "aggregation": null
    },
    "lustre_close": {
      "frequency": 60,
      "aggregation": null
    },
    "lustre_open": {
      "frequency": 60,
      "aggregation": null
    },
    "lustre_statfs": {
      "frequency": 60,
      "aggregation": null
    },
    "lustre_read_bytes": {
      "frequency": 60,
      "aggregation": null
    },
    "lustre_write_bytes": {
      "frequency": 60,
      "aggregation": null
    },
    "net_bw": {
      "frequency": 60,
      "aggregation": null
    },
    "file_bw": {
      "frequency": 60,
      "aggregation": null
    },
    "mem_bw": {
      "frequency": 60,
      "aggregation": "sum"
    },
    "mem_cached": {
      "frequency": 60,
      "aggregation": null
    },
    "mem_used": {
      "frequency": 60,
      "aggregation": null
    },
    "vectorization_ratio": {
      "frequency": 60,
      "aggregation": "avg"
    }
  },
  "checkpoints": {
    "interval": "1h",
    "directory": "./var/checkpoints",
    "restore": "1h"
  },
  "archive": {
    "interval": "24h",
    "directory": "./var/archive"
  },
  "http-api": {
    "address": "localhost:8082",
    "https-cert-file": null,
    "https-key-file": null
  },
  "retention-in-memory": "48h",
  "nats": null,
  "jwt-public-key": "kzfYrYy+TzpanWZHJ5qSdMj5uKUWgq74BWhQG6copP0="
}
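A configuration like the one above can be sanity-checked before the service is started. The following is a minimal sketch, not part of cc-metric-store: the function name check_config is hypothetical, the list of required attributes follows the Configuration reference, the key names follow the example above, and the integer check on frequency is an assumption based on the example values.

```python
# Required top-level attributes, with key names as used in the example config.
REQUIRED_KEYS = ("metrics", "http-api", "jwt-public-key",
                 "retention-in-memory", "checkpoints", "archive")
ALLOWED_AGGREGATIONS = ("sum", "avg", None)


def check_config(config):
    """Return a list of human-readable problems; an empty list means none found."""
    problems = []
    for key in REQUIRED_KEYS:
        if config.get(key) is None:
            problems.append(f"missing required attribute: {key}")
    for name, metric in (config.get("metrics") or {}).items():
        # Assumption: frequency is an integer, as in the example config.
        if not isinstance(metric.get("frequency"), int):
            problems.append(f"metric {name}: frequency must be an integer")
        # A missing aggregation key is treated like null in this sketch.
        if metric.get("aggregation") not in ALLOWED_AGGREGATIONS:
            problems.append(f"metric {name}: unknown aggregation")
    return problems
```

Passing the parsed JSON of the example above (for instance via json.load) should yield an empty list, while an empty object reports every required attribute as missing.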

2 - Configuration

ClusterCockpit Metric Store Configuration Option References

Configuration options are located in a JSON file. The default path is config.json in the current working directory. An alternative path to the configuration file can be specified using the command line switch -config <path>.

All durations are specified as strings consisting of a number and a unit suffix (allowed suffixes: s, m, h, …).
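To illustrate how such duration strings map to seconds, here is a simplified sketch. It is not the parser used by cc-metric-store: the function name parse_duration is hypothetical, only the suffixes s, m and h named above are handled, and accepting concatenated parts such as "1h30m" is an assumption.

```python
import re

# Seconds per unit for the suffixes named in the text; any further
# suffixes are deliberately not handled in this sketch.
UNIT_SECONDS = {"s": 1, "m": 60, "h": 3600}
_PART = re.compile(r"(\d+(?:\.\d+)?)([smh])")


def parse_duration(text):
    """Convert a duration string such as "48h" or "1h30m" to seconds."""
    pos, total = 0, 0.0
    while pos < len(text):
        match = _PART.match(text, pos)
        if match is None:
            raise ValueError(f"invalid duration: {text!r}")
        total += float(match.group(1)) * UNIT_SECONDS[match.group(2)]
        pos = match.end()
    if pos == 0:
        raise ValueError("empty duration")
    return total
```

With this sketch, parse_duration("48h") returns 172800.0 and parse_duration("1h30m") returns 5400.0.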

Recognized attributes:

metrics: Map of metric-name to objects with the following properties (required)
- frequency: Timestep/Interval/Resolution of this metric (required)
- aggregation: Can be "sum", "avg" or null (required)
  - null means aggregation across nodes is forbidden for this metric
  - "sum" means that values from the child levels are summed up for the parent level
  - "avg" means that values from the child levels are averaged for the parent level
nats: (optional)
- address: URL of the NATS.io server, for example "nats://localhost:4222"
- creds-file-path: Path to a NATS credentials file
- subscriptions (array of objects):
  - subscribe-to: Where to expect the measurements to be published
  - cluster-tag: Default value for the cluster tag
http-api: (required)
- address: Address to bind to, for example 0.0.0.0:8080 (required)
- https-cert-file and https-key-file: if provided enable HTTPS using those files as certificate/key (optional)
jwt-public-key: Base64 encoded string, use this to verify requests to the HTTP API (required)
retention-in-memory: Keep all values in memory for at least that amount of time (required)
checkpoints: (required)
- interval: Do checkpoints every X seconds/minutes/hours (required)
- directory: Path to a directory (required)
- restore: After a restart, load the last X seconds/minutes/hours of data back into memory (required)
archive: (required)
- interval: Move and compress all checkpoints not needed anymore every X seconds/minutes/hours (required)
- directory: Path to a directory (required)
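The three aggregation settings described for the metrics attribute can be illustrated with a short sketch. It only mirrors the semantics stated above and is not the store's implementation; the function name aggregate is hypothetical, and the handling of missing values is not covered.

```python
def aggregate(values, mode):
    """Combine child-level values into one parent-level value."""
    if mode is None:
        # null in the config: aggregation is forbidden for this metric.
        raise ValueError("aggregation is forbidden for this metric")
    if not values:
        raise ValueError("no values to aggregate")
    if mode == "sum":
        return sum(values)
    if mode == "avg":
        return sum(values) / len(values)
    raise ValueError(f"unknown aggregation: {mode!r}")
```

For two child values 2.0 and 4.0, aggregate([2.0, 4.0], "sum") returns 6.0 and aggregate([2.0, 4.0], "avg") returns 3.0, while a mode of None raises an error.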

3 - Metric Store REST API

ClusterCockpit Metric Store RESTful API Endpoint description

Authentication

JWT tokens

cc-metric-store supports only JWT tokens using the EdDSA/Ed25519 signing method. The token is provided using the Authorization Bearer header.
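When debugging authentication problems it can help to look at what a token claims. The sketch below decodes the header and payload of a JWT using only the Python standard library. It does not verify the signature (that would require an Ed25519 implementation and the matching public key), and the function names are hypothetical.

```python
import base64
import json


def decode_segment(segment):
    """Decode one base64url-encoded JWT segment into a dict."""
    # JWT segments omit the base64 padding, so restore it before decoding.
    padded = segment + "=" * (-len(segment) % 4)
    return json.loads(base64.urlsafe_b64decode(padded))


def describe_token(token):
    """Return (header, payload) of a JWT without verifying the signature."""
    header, payload, _signature = token.split(".")
    return decode_segment(header), decode_segment(payload)
```

For a token accepted by cc-metric-store, the decoded header should contain "alg": "EdDSA".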

Example script to test the endpoint:

# Only use the JWT token if JWT authentication has been set up
JWT="eyJ0eXAiOiJKV1QiLCJhbGciOiJFZERTQSJ9.eyJ1c2VyIjoiYWRtaW4iLCJyb2xlcyI6WyJST0xFX0FETUlOIiwiUk9MRV9BTkFMWVNUIiwiUk9MRV9VU0VSIl19.d-3_3FZTsadPjDEdsWrrQ7nS0edMAR4zjl-eK7rJU3HziNBfI9PDHDIpJVHTNN5E5SlLGLFXctWyKAkwhXL-Dw"

curl -X 'GET' 'http://localhost:8081/api/query/' -H "Authorization: Bearer $JWT" -d "{ \"cluster\": \"alex\", \"from\": 1720879275, \"to\": 1720964715, \"queries\": [{\"metric\": \"cpu_load\",\"host\": \"a0124\"}] }"
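The same query can be prepared from Python. The following sketch mirrors the curl call above with the standard library; the function name build_query_request is hypothetical, and the base URL, cluster, host and time range are the example values from the script, not defaults of cc-metric-store.

```python
import json
import urllib.request


def build_query_request(base_url, jwt, cluster, start, end, queries):
    """Build (but do not send) a request equivalent to the curl call."""
    body = json.dumps({
        "cluster": cluster,
        "from": start,   # start of the time range, as in the curl example
        "to": end,       # end of the time range
        "queries": queries,
    }).encode("utf-8")
    request = urllib.request.Request(
        base_url + "/api/query/", data=body, method="GET")
    request.add_header("Authorization", f"Bearer {jwt}")
    return request
```

The returned object can be passed to urllib.request.urlopen to send it to a running instance. Building the request separately keeps the payload easy to inspect before anything goes over the network.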

NATS

TODO

Usage of Swagger UI

The Swagger UI is also available as part of cc-metric-store if you start it with the -dev option:

./cc-metric-store -dev

You may access it at this URL.

Payload format for write endpoint

The data is expected in InfluxDB line protocol format.

<metric>,cluster=<cluster>,hostname=<hostname>,type=<node/hwthread/etc> value=<value> <epoch_time_in_ns_or_s>

Real example:

proc_run,cluster=fritz,hostname=f2163,type=node value=4i 1725620476214474893

A more detailed description of the ClusterCockpit flavored InfluxDB line protocol and its types can be found in the CC specification.
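As an illustration of the payload format, the sketch below assembles one line from its parts. The function name format_line is hypothetical. The i suffix for integer values follows the InfluxDB line protocol, as seen in the real example above. Escaping of special characters in names and tag values is not handled.

```python
def format_line(metric, cluster, hostname, type_, value, timestamp):
    """Format one measurement as a single line of the write payload."""
    if isinstance(value, bool):
        raise TypeError("boolean values are not handled in this sketch")
    if isinstance(value, int):
        field = f"{value}i"          # integers carry an "i" suffix
    else:
        field = repr(float(value))   # floats are written as plain numbers
    return (f"{metric},cluster={cluster},hostname={hostname},"
            f"type={type_} value={field} {timestamp}")
```

Calling format_line("proc_run", "fritz", "f2163", "node", 4, 1725620476214474893) reproduces the real example line shown above.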

Example script to test endpoint:

# Only use the JWT token if JWT authentication has been set up
JWT="eyJ0eXAiOiJKV1QiLCJhbGciOiJFZERTQSJ9.eyJ1c2VyIjoiYWRtaW4iLCJyb2xlcyI6WyJST0xFX0FETUlOIiwiUk9MRV9BTkFMWVNUIiwiUk9MRV9VU0VSIl19.d-3_3FZTsadPjDEdsWrrQ7nS0edMAR4zjl-eK7rJU3HziNBfI9PDHDIpJVHTNN5E5SlLGLFXctWyKAkwhXL-Dw"

curl -X 'POST' 'http://localhost:8081/api/write/?cluster=alex' -H "Authorization: Bearer $JWT" -d "proc_run,cluster=fritz,hostname=f2163,type=node value=4i 1725620476214474893"

Swagger API Reference

Non-Interactive Documentation

This reference is rendered using the swagger-ui plugin based on the original definition file found in the ClusterCockpit repository, but without a serving backend.

This means that the interactive features (“Try It Out”) will not return actual data. However, a curl call and a compiled request URL are still displayed when an API endpoint is executed.