This is the multi-page printable view of this section. Click here to print.

Return to the regular view of this page.

Metric Store

ClusterCockpit Metric Store References

Reference information regarding the ClusterCockpit component “cc-metric-store” (GitHub Repo).

1 - Command Line

ClusterCockpit Metric Store Command Line Options

This page describes the command line options for the cc-metric-store executable.


  -config <path>

Function: Specifies alternative path to application configuration file.

Default: ./config.json

Example: -config ./configfiles/configuration.json


  -dev

Function: Enables the Swagger UI REST API documentation and playground


  -gops

Function: Go server listens via github.com/google/gops/agent (for debugging).


  -version

Function: Shows version information and exits.

Example config:

{
  "metrics": {
    "debug_metric": {
      "frequency": 60,
      "aggregation": "avg"
    },
    "clock": {
      "frequency": 60,
      "aggregation": "avg"
    },
    "cpu_idle": {
      "frequency": 60,
      "aggregation": "avg"
    },
    "cpu_iowait": {
      "frequency": 60,
      "aggregation": "avg"
    },
    "cpu_irq": {
      "frequency": 60,
      "aggregation": "avg"
    },
    "cpu_system": {
      "frequency": 60,
      "aggregation": "avg"
    },
    "cpu_user": {
      "frequency": 60,
      "aggregation": "avg"
    },
    "nv_mem_util": {
      "frequency": 60,
      "aggregation": "avg"
    },
    "nv_temp": {
      "frequency": 60,
      "aggregation": "avg"
    },
    "nv_sm_clock": {
      "frequency": 60,
      "aggregation": "avg"
    },
    "acc_utilization": {
      "frequency": 60,
      "aggregation": "avg"
    },
    "acc_mem_used": {
      "frequency": 60,
      "aggregation": "sum"
    },
    "acc_power": {
      "frequency": 60,
      "aggregation": "sum"
    },
    "flops_any": {
      "frequency": 60,
      "aggregation": "sum"
    },
    "flops_dp": {
      "frequency": 60,
      "aggregation": "sum"
    },
    "flops_sp": {
      "frequency": 60,
      "aggregation": "sum"
    },
    "ib_recv": {
      "frequency": 60,
      "aggregation": "sum"
    },
    "ib_xmit": {
      "frequency": 60,
      "aggregation": "sum"
    },
    "ib_recv_pkts": {
      "frequency": 60,
      "aggregation": "sum"
    },
    "ib_xmit_pkts": {
      "frequency": 60,
      "aggregation": "sum"
    },
    "cpu_power": {
      "frequency": 60,
      "aggregation": "sum"
    },
    "core_power": {
      "frequency": 60,
      "aggregation": "sum"
    },
    "mem_power": {
      "frequency": 60,
      "aggregation": "sum"
    },
    "ipc": {
      "frequency": 60,
      "aggregation": "avg"
    },
    "cpu_load": {
      "frequency": 60,
      "aggregation": null
    },
    "lustre_close": {
      "frequency": 60,
      "aggregation": null
    },
    "lustre_open": {
      "frequency": 60,
      "aggregation": null
    },
    "lustre_statfs": {
      "frequency": 60,
      "aggregation": null
    },
    "lustre_read_bytes": {
      "frequency": 60,
      "aggregation": null
    },
    "lustre_write_bytes": {
      "frequency": 60,
      "aggregation": null
    },
    "net_bw": {
      "frequency": 60,
      "aggregation": null
    },
    "file_bw": {
      "frequency": 60,
      "aggregation": null
    },
    "mem_bw": {
      "frequency": 60,
      "aggregation": "sum"
    },
    "mem_cached": {
      "frequency": 60,
      "aggregation": null
    },
    "mem_used": {
      "frequency": 60,
      "aggregation": null
    },
    "vectorization_ratio": {
      "frequency": 60,
      "aggregation": "avg"
    }
  },
  "checkpoints": {
    "interval": "1h",
    "directory": "./var/checkpoints",
    "restore": "1h"
  },
  "archive": {
    "interval": "24h",
    "directory": "./var/archive"
  },
  "http-api": {
    "address": "localhost:8082",
    "https-cert-file": null,
    "https-key-file": null
  },
  "retention-in-memory": "48h",
  "nats": null,
  "jwt-public-key": "kzfYrYy+TzpanWZHJ5qSdMj5uKUWgq74BWhQG6copP0="
}

2 - Configuration

ClusterCockpit Metric Store Configuration Option References

All durations are specified as string that will be parsed like this (Allowed suffixes: s, m, h, …).

  • metrics: Map of metric-name to objects with the following properties
    • frequency: Timestep/Interval/Resolution of this metric
    • aggregation: Can be "sum", "avg" or null
      • null means aggregation across nodes is forbidden for this metric
      • "sum" means that values from the child levels are summed up for the parent level
      • "avg" means that values from the child levels are averaged for the parent level
    • scope: Unused at the moment, should be something like "node", "socket" or "hwthread"
  • nats:
    • address: Url of NATS.io server, example: “nats://localhost:4222”
    • username and password: Optional, if provided use those for the connection
    • subscriptions:
      • subscribe-to: Where to expect the measurements to be published
      • cluster-tag: Default value for the cluster tag
  • http-api:
    • address: Address to bind to, for example 0.0.0.0:8080
    • https-cert-file and https-key-file: Optional, if provided enable HTTPS using those files as certificate/key
  • jwt-public-key: Base64 encoded string, use this to verify requests to the HTTP API
  • retention-on-memory: Keep all values in memory for at least that amount of time
  • checkpoints:
    • interval: Do checkpoints every X seconds/minutes/hours
    • directory: Path to a directory
    • restore: After a restart, load the last X seconds/minutes/hours of data back into memory
  • archive:
    • interval: Move and compress all checkpoints not needed anymore every X seconds/minutes/hours
    • directory: Path to a directory

3 - REST API

ClusterCockpit Metric Store RESTful API Endpoint description

Authentication

JWT tokens

cc-metric-store supports only JWT tokens using the EdDSA/Ed25519 signing method. The token is provided using the Authorization Bearer header.

Example script to test the endpoint:

#Only use JWT token if the JWT authentication has been setup
JWT="eyJ0eXAiOiJKV1QiLCJhbGciOiJFZERTQSJ9.eyJ1c2VyIjoiYWRtaW4iLCJyb2xlcyI6WyJST0xFX0FETUlOIiwiUk9MRV9BTkFMWVNUIiwiUk9MRV9VU0VSIl19.d-3_3FZTsadPjDEdsWrrQ7nS0edMAR4zjl-eK7rJU3HziNBfI9PDHDIpJVHTNN5E5SlLGLFXctWyKAkwhXL-Dw"

curl -X 'GET' 'http://localhost:8081/api/query/' -H "Authorization: Bearer $JWT" -d "{ \"cluster\": \"alex\", \"from\": 1720879275, \"to\": 1720964715, \"queries\": [{\"metric\": \"cpu_load\",\"host\": \"a0124\"}] }"

NATS

TODO

Usage of Swagger UI

This Swagger UI is also available as part of cc-metric-store if you start it with the dev option:

./cc-metric-store -dev

You may access it at this URL.

Payload format for write endpoint

The data comes in Influx DB line protocol format.

<metric>,cluster=<cluster>,hostname=<hostname>,type=<node/hwthread/etc> value=<value> <epoch_time_in_ns_or_s>

Real example:

proc_run,cluster=fritz,hostname=f2163,type=node value=4i 1725620476214474893

A more detailed description of the ClusterCockpit flavored Influx DB line protocol and their types can be found here in CC specification.

Example script to test endpoint:

#Only use JWT token if the JWT authentication has been setup
JWT="eyJ0eXAiOiJKV1QiLCJhbGciOiJFZERTQSJ9.eyJ1c2VyIjoiYWRtaW4iLCJyb2xlcyI6WyJST0xFX0FETUlOIiwiUk9MRV9BTkFMWVNUIiwiUk9MRV9VU0VSIl19.d-3_3FZTsadPjDEdsWrrQ7nS0edMAR4zjl-eK7rJU3HziNBfI9PDHDIpJVHTNN5E5SlLGLFXctWyKAkwhXL-Dw"

curl -X 'GET' 'http://localhost:8081/api/write/?cluster=alex' -H "Authorization: Bearer $JWT" -d "proc_run,cluster=fritz,hostname=f2163,type=node value=4i 1725620476214474893"

Usage of Swagger UI

This Swagger UI is also available as part of cc-metric-store if you start it with the dev option:

./cc-metric-store -dev

You may access it at this URL.

Swagger API Reference