cc-energy-manager

ClusterCockpit Energy Manager References

Reference information regarding the ClusterCockpit component “cc-energy-manager” (GitHub Repo).

Overview

cc-energy-manager is a daemon that dynamically adjusts power limits on compute nodes of an HPC cluster to automatically optimize energy consumption. It is part of the ClusterCockpit ecosystem and integrates with cc-metric-collector for metrics and cc-node-controller for applying power limit changes.

Problem and Motivation

With large HPC systems, power draw is an ever-growing concern for grid infrastructure and environmental footprint. Modern CPUs and GPUs expose integrated power management that allows lowering maximum power limits (the opposite of overclocking). When a power limit is reduced, the chip’s internal power management lowers clock speeds to stay within budget — reducing both power draw and performance.

The key observation is that performance does not decrease proportionally to the power reduction: a moderate cut in the power limit typically costs a smaller relative loss in performance. Energy per unit of work therefore drops, even on chips that are not running near their thermal limits. cc-energy-manager exploits this by continuously searching for the power limit that minimizes the Energy Delay Product (EDP), the product of consumed energy and execution time. Minimizing EDP balances energy savings against longer runtimes and so avoids excessive slowdowns.
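
As a rough illustration of this trade-off, the sketch below computes a relative EDP for two power limits. It assumes a fixed amount of work W, so runtime is T = W / perf and energy is E = P * T, which gives EDP = E * T = P * (W / perf)^2. The function name and the measurements are made up for illustration, and the exact formula cc-energy-manager uses internally may differ.

```python
def relative_edp(power_watts: float, perf: float, work: float = 1.0) -> float:
    """EDP for a fixed amount of work: energy * runtime = P * (W / perf)**2."""
    runtime = work / perf            # time needed to finish the work
    energy = power_watts * runtime   # energy consumed during that time
    return energy * runtime


# Hypothetical measurements: 20 % less power costs only 5 % performance.
baseline = relative_edp(power_watts=300.0, perf=100.0)
capped = relative_edp(power_watts=240.0, perf=95.0)

print(f"EDP ratio capped/baseline: {capped / baseline:.3f}")  # prints 0.886
```

In this made-up case the lower power limit reduces EDP by about 11 % even though the job runs slightly longer.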

How It Works

For each running job, cc-energy-manager performs a feedback-loop optimization:

  1. Measure: Collect power draw and a performance proxy metric (e.g., instructions per second for CPUs, CUDA kernel counts for GPUs) from cc-metric-collector via NATS.
  2. Evaluate: Compute the EDP for the current power limit.
  3. Search: Use a Golden Section Search algorithm to find the power limit that minimizes EDP within configured bounds.
  4. Apply: Send the new power limit to cc-node-controller via NATS, which applies it using RAPL (CPUs) or NVML (NVIDIA GPUs).

The search alternates between two intervals: a shorter intervalSearch during active exploration and a longer intervalConverged once the optimizer has found a stable minimum.
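
The interval switching can be sketched as follows. This is a minimal simulation, not the daemon's code: the `Intervals` class and both function names are hypothetical, and the convergence states are supplied by hand instead of coming from a real optimizer.

```python
from dataclasses import dataclass


@dataclass
class Intervals:
    search_s: float      # intervalSearch in seconds
    converged_s: float   # intervalConverged in seconds


def next_tick_delay(converged: bool, iv: Intervals) -> float:
    """Pick the delay until the next optimization tick."""
    return iv.converged_s if converged else iv.search_s


def simulate(convergence_states, iv: Intervals):
    """Return the simulated tick times for a sequence of convergence states."""
    now, times = 0.0, []
    for converged in convergence_states:
        now += next_tick_delay(converged, iv)
        times.append(now)
    return times


iv = Intervals(search_s=120.0, converged_s=600.0)  # "2m" and "10m"
# Three ticks of active search, then two ticks after convergence.
print(simulate([False, False, False, True, True], iv))
# prints [120.0, 240.0, 360.0, 960.0, 1560.0]
```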

Architecture

flowchart LR
    CC[cc-metric-collector] -->|NATS metrics| R[Receivers]
    R --> CM[Cluster Manager]
    CM --> JM[Job Managers]
    JM --> AGG[Aggregator]
    AGG --> OPT[Optimizer\nGSSNG]
    OPT --> CTL[Controller]
    CTL -->|NATS set power limit| CNC[cc-node-controller]
    CM --> SK[Sinks]

Components

Receivers

Accept metric messages from external sources (e.g., via NATS) as provided by cc-metric-collector. The receiver manager forwards all incoming messages to the Cluster Manager.

Cluster Manager

The central orchestrator. It:

  • Receives job start/stop events and creates or removes Job Managers accordingly.
  • Routes incoming metrics to the correct Job Manager based on hostname and cluster membership.
  • Filters jobs to only those matching the configured partitionRegex.
  • Forwards processed metrics to the Sink Manager.
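
The filtering and routing rules above can be sketched like this. The `CLUSTERS` table, the function names, and the `jobs_by_host` mapping are hypothetical simplifications; only the regular expressions are taken from the configuration examples in this reference. The patterns are applied with search semantics and anchor themselves with `^` and `$`.

```python
import re

# Hypothetical, simplified cluster table for illustration.
CLUSTERS = {
    "fritz": {
        "partition": re.compile(r"^energy_efficient$"),
        "subclusters": {"main": re.compile(r"^f\d\d\d\d$")},
    },
}


def job_is_managed(cluster: str, partition: str) -> bool:
    """True if the job's partition matches the cluster's partitionRegex."""
    cfg = CLUSTERS.get(cluster)
    return cfg is not None and cfg["partition"].search(partition) is not None


def subcluster_for(cluster: str, hostname: str):
    """Return the first subcluster whose hostRegex matches, or None."""
    cfg = CLUSTERS.get(cluster)
    if cfg is None:
        return None
    for name, host_rx in cfg["subclusters"].items():
        if host_rx.search(hostname):
            return name
    return None


def route_metric(jobs_by_host: dict, cluster: str, hostname: str):
    """Return the id of the job that should receive a metric, or None."""
    return jobs_by_host.get((cluster, hostname))


jobs_by_host = {("fritz", "f0101"): "job-1"}  # filled on job start events
print(job_is_managed("fritz", "energy_efficient"))   # True
print(subcluster_for("fritz", "f0101"))              # main
print(route_metric(jobs_by_host, "fritz", "f0102"))  # None
```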

Job Manager

Manages the optimization lifecycle for a single job. It supports three optimization scopes:

| Scope  | Description |
|--------|-------------|
| job    | One optimizer instance shared across all nodes and devices of the job |
| node   | One optimizer per node; the resulting power limit is applied uniformly to all devices on that node |
| device | One independent optimizer per device (socket or GPU) on each node |

The Job Manager runs optimization ticks on a timer, switching between intervalSearch and intervalConverged based on the optimizer’s convergence state.
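
To make the three scopes concrete, the sketch below counts how many optimizer instances each scope would create for a small job. The function `optimizer_keys` and the node and device names are hypothetical; the real Job Manager may organize its optimizers differently.

```python
def optimizer_keys(scope: str, devices_by_node: dict):
    """Return one key per optimizer instance for the given scope."""
    if scope == "job":
        return [("job",)]
    if scope == "node":
        return [(node,) for node in devices_by_node]
    if scope == "device":
        return [(node, dev)
                for node, devs in devices_by_node.items()
                for dev in devs]
    raise ValueError(f"unknown scope: {scope!r}")


# A hypothetical two-node job with two sockets per node.
job = {"f0101": ["socket0", "socket1"], "f0102": ["socket0", "socket1"]}
for scope in ("job", "node", "device"):
    print(scope, len(optimizer_keys(scope, job)))
# prints: job 1, node 2, device 4 (one pair per line)
```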

Aggregator

Collects power and performance metric samples from incoming messages and computes the EDP value fed into the optimizer. Two aggregation strategies are available:

  • last: Uses the most recent metric value.
  • median: Uses the median over a rolling time window.

The aggregator also supports multiple reduction modes when combining values across multiple devices (arithmetic mean, geometric mean, harmonic mean, min, max).
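
The two aggregation strategies and the reduction modes can be sketched with Python's `statistics` module. The `Window` class and the reducer names are hypothetical; the configuration keys that select a reduction mode are not covered in this reference.

```python
import statistics
from collections import deque


class Window:
    """Rolling time window of (timestamp, value) samples."""

    def __init__(self, max_age_s: float):
        self.max_age_s = max_age_s
        self.samples = deque()

    def add(self, ts: float, value: float) -> None:
        self.samples.append((ts, value))
        # Drop samples that are older than the window relative to the newest one.
        while ts - self.samples[0][0] > self.max_age_s:
            self.samples.popleft()

    def last(self) -> float:
        return self.samples[-1][1]

    def median(self) -> float:
        return statistics.median(v for _, v in self.samples)


# Reductions for combining per-device values (names are illustrative).
REDUCERS = {
    "arithmetic": statistics.fmean,
    "geometric": statistics.geometric_mean,
    "harmonic": statistics.harmonic_mean,
    "min": min,
    "max": max,
}

w = Window(max_age_s=60)
for ts, watts in [(0, 200.0), (30, 260.0), (60, 210.0), (90, 400.0)]:
    w.add(ts, watts)

print(w.last(), w.median())  # 400.0 260.0 (the sample at ts=0 has expired)
print({name: round(fn([100.0, 400.0]), 1) for name, fn in REDUCERS.items()})
```

The median is less sensitive to short spikes such as the 400 W sample, which is one reason to prefer it for noisy metrics.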

Optimizer

Implements the GSSNG (Golden Section Search with Narrowing/Broadening) algorithm. It maintains four sample points within a search window and moves toward the EDP minimum. The window narrows when a minimum is found; it broadens when the current load appears insufficient to distinguish power limit effects.

The gss type is a plain Golden Section Search without the narrowing/broadening heuristic.
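
For orientation, here is a textbook Golden Section Search on a synthetic EDP curve. It is a sketch of the plain algorithm only and does not include the narrowing/broadening heuristic of GSSNG. The performance model `p / (p + 150)` is invented so that the minimum is known analytically (150 W); in the real daemon each function evaluation would correspond to a measurement at a given power limit rather than a call to a formula.

```python
import math

INV_PHI = (math.sqrt(5) - 1) / 2  # 1/phi, about 0.618


def golden_section_min(f, lower: float, upper: float, tolerance: float):
    """Minimize a unimodal f on [lower, upper].

    Stops once the search interval is no wider than `tolerance` and
    returns (midpoint of the final interval, number of evaluations).
    """
    a, b = lower, upper
    c = b - INV_PHI * (b - a)
    d = a + INV_PHI * (b - a)
    fc, fd = f(c), f(d)
    evaluations = 2
    while (b - a) > tolerance:
        if fc < fd:
            # Minimum lies in [a, d]; the old c becomes the new d.
            b, d, fd = d, c, fc
            c = b - INV_PHI * (b - a)
            fc = f(c)
        else:
            # Minimum lies in [c, b]; the old d becomes the new c.
            a, c, fc = c, d, fd
            d = a + INV_PHI * (b - a)
            fd = f(d)
        evaluations += 1
    return (a + b) / 2, evaluations


def synthetic_edp(power: float) -> float:
    """Made-up EDP curve: EDP ~ P / perf(P)**2 with saturating performance."""
    perf = power / (power + 150.0)
    return power / perf ** 2  # equals (P + 150)**2 / P, minimal at P = 150


best, evaluations = golden_section_min(synthetic_edp, lower=123, upper=800, tolerance=5)
print(f"power limit ~ {best:.1f} W after {evaluations} evaluations")
```

Each iteration reuses one of the two interior points, so only one new evaluation is needed per step. That matters when every evaluation costs a full measurement interval.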

Controller

Translates optimizer output into power limit commands sent to cc-node-controller via NATS. A separate NATS connection (using a requestSubject with a cluster-name placeholder %c) is created per cluster. The controller also caches node hardware topology information to map hardware threads to CPU sockets.
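
The placeholder substitution and the topology cache can be sketched as follows. `request_subject`, `TopologyCache`, and the `fetch` callback are hypothetical names; `fetch` stands in for whatever request the controller sends to cc-node-controller to obtain a node's topology.

```python
import time


def request_subject(template: str, cluster: str) -> str:
    """Expand the %c placeholder with the cluster name."""
    return template.replace("%c", cluster)


class TopologyCache:
    """Cache per-host topology data and refetch it after max_age_s seconds."""

    def __init__(self, max_age_s: float, fetch, clock=time.monotonic):
        self.max_age_s = max_age_s
        self.fetch = fetch      # callable: hostname -> topology data
        self.clock = clock      # injectable for testing
        self.entries = {}       # hostname -> (fetch time, topology data)

    def get(self, host: str):
        now = self.clock()
        entry = self.entries.get(host)
        if entry is None or now - entry[0] > self.max_age_s:
            entry = (now, self.fetch(host))
            self.entries[host] = entry
        return entry[1]


print(request_subject("cc-node-controller.%c.request", "fritz"))
# prints cc-node-controller.fritz.request

# Made-up topology: hardware thread id -> socket id.
cache = TopologyCache(86400, fetch=lambda host: {0: 0, 1: 0, 2: 1, 3: 1})
print(cache.get("f0101")[2])  # prints 1
```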

Sinks

Receive the processed metrics output from the Cluster Manager and forward them to external systems for monitoring or storage (e.g., stdout, InfluxDB).

Related Components

  • cc-metric-collector: Collects per-node hardware performance and power metrics; the primary metrics source for cc-energy-manager.
  • cc-node-controller: Applies power limit changes on individual nodes via RAPL and NVML; receives commands from cc-energy-manager.
  • cc-metric-store: Optional long-term metric storage; not directly used by cc-energy-manager but part of the broader ClusterCockpit stack.

1 - Commands

ClusterCockpit Energy Manager Command Line References

Build

make

This produces the cc-energy-manager binary in the repository root.

Run

./cc-energy-manager [options]

Options

| Flag              | Default       | Description |
|-------------------|---------------|-------------|
| -config <path>    | ./config.json | Path to the JSON configuration file |
| -loglevel <level> | warn          | Logging verbosity: debug, info, warn, err, fatal, crit |
| -logdate          | false         | Prefix every log line with date and time |
| -once             | false         | Run all collectors once and then exit (useful for testing) |

Example

./cc-energy-manager -config /etc/cc-energy-manager/config.json -loglevel info -logdate

Signals

cc-energy-manager handles the following UNIX signals for graceful shutdown:

  • SIGTERM — sent by systemd on systemctl stop
  • SIGINT — sent by Ctrl+C

On receiving either signal, the daemon stops all receivers, sinks, the cluster manager, and the controller before exiting.
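
The general pattern behind such a shutdown can be sketched as follows. This illustrates the technique only and is not the daemon's implementation: all names are hypothetical, and the stop order simply follows the order in the sentence above, which may not match the real one.

```python
import signal
import threading

stop_requested = threading.Event()


def handle_signal(signum, frame):
    """Signal handler: only record the request, do the real work elsewhere."""
    stop_requested.set()


def install_handlers():
    """Register the handler for SIGTERM and SIGINT (main thread only)."""
    for sig in (signal.SIGTERM, signal.SIGINT):
        signal.signal(sig, handle_signal)


def shutdown(components):
    """Stop components in the given order; return the names that were stopped."""
    stopped = []
    for name, stop in components:
        stop()
        stopped.append(name)
    return stopped


# Placeholder components; a real daemon would pass its own stop functions.
components = [
    ("receivers", lambda: None),
    ("sinks", lambda: None),
    ("cluster manager", lambda: None),
    ("controller", lambda: None),
]
```

A main function would call `install_handlers()`, block on `stop_requested.wait()`, and then call `shutdown(components)`. Keeping the handler itself minimal avoids doing complex work in signal context.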

2 - Configuration

ClusterCockpit Energy Manager Configuration Option References

Configuration is provided as a JSON file. The default path is ./config.json in the working directory; an alternative path can be specified with the -config flag.

The configuration has four required top-level sections: receivers, sinks, controller, and clusters.

Receivers Section

A named map of receiver configurations. Each receiver defines a source from which cc-energy-manager ingests metric messages.

"receivers": {
    "<name>": {
        "type": "nats",
        "address": "nats-server.example.org",
        "port": "4222",
        "subject": "metrics.subject"
    }
}

Fields (NATS receiver):

  • type (string, required): Receiver type. Currently "nats".
  • address (string, required): Hostname or IP of the NATS server.
  • port (string, required): Port of the NATS server.
  • subject (string, required): NATS subject to subscribe to.

Sinks Section

A named map of sink configurations. Each sink defines a destination for processed metric messages.

"sinks": {
    "<name>": {
        "type": "stdout",
        "meta_as_tags": []
    }
}

Fields (stdout sink):

  • type (string, required): Sink type. Examples: "stdout", "influxasync".
  • meta_as_tags (array of strings, optional): Metadata fields to promote to tags.

Controller Section

Configuration for the connection to cc-node-controller, which applies power limit changes on compute nodes.

"controller": {
    "nats": {
        "url": "nats://nats-server.example.org:4222",
        "requestSubject": "cc-node-controller.%c.request"
    },
    "toposMaxAge": 86400
}

Fields:

  • nats (object, required): NATS connection settings for cc-node-controller.
    • url (string, required): NATS server URL, e.g. "nats://localhost:4222".
    • requestSubject (string, required): NATS subject for sending control commands. Use %c as a placeholder for the cluster name — it is substituted at runtime for each cluster (e.g. "cc-node-controller.%c.request" becomes "cc-node-controller.fritz.request" for cluster fritz).
  • toposMaxAge (integer, optional): How long to cache node hardware topology data in seconds. Default: 86400 (1 day).

Clusters Section

An array of cluster configurations. Each cluster defines which nodes to manage and how to optimize their power limits.

"clusters": [
    {
        "name": "fritz",
        "powerBudgetTotal": 10000,
        "partitionRegex": "^energy_efficient$",
        "subclusters": [ ... ]
    }
]

Cluster fields:

  • name (string, required): Cluster identifier. Must match the cluster name used in metric tags.
  • powerBudgetTotal (number, required): Total power budget for this cluster in watts. Used for proportional budget distribution across device types.
  • partitionRegex (string, required): Regular expression matched against the job’s partition/queue name. Only jobs on matching partitions are managed.
  • subclusters (array, required): List of subcluster configurations (see below).

Subcluster Configuration

A subcluster groups nodes within a cluster that share the same hardware configuration.

{
    "name": "main",
    "hostRegex": "^f\\d\\d\\d\\d$",
    "devicetypes": { ... }
}

Fields:

  • name (string, required): Subcluster identifier.
  • hostRegex (string, required): Regular expression matched against node hostnames to assign nodes to this subcluster.
  • devicetypes (object, required): Map of device type name to device type configuration (see below). Supported keys: "socket", "nvidia_gpu", "amd_gpu".

Device Type Configuration

Each entry in devicetypes configures how cc-energy-manager optimizes a specific hardware device type.

"socket": {
    "scope": "node",
    "aggregator": {
        "type": "last",
        "powerMetric": "cpu_energy",
        "performanceMetric": "ips",
        "deviceType": "socket"
    },
    "controlName": "rapl.pkg_power_limit1",
    "controlDefaultValue": 300,
    "intervalConverged": "10m",
    "intervalSearch": "2m",
    "powerBudgetWeight": 1,
    "optimizer": {
        "type": "gssng",
        "tolerance": 5,
        "borders": {
            "lower": 123,
            "upper": 800
        }
    }
}

Fields:

  • scope (string, required): Optimization granularity. One of:
    • "job" — single optimizer shared across all devices of the job
    • "node" — one optimizer per node, applied to all devices on that node
    • "device" — independent optimizer per device on each node
  • aggregator (object, required): Metric aggregation configuration.
    • type (string, required): Aggregation strategy: "last" (most recent value) or "median" (median over time window).
    • powerMetric (string, required): Name of the power metric to track (e.g. "cpu_energy", "acc_power").
    • performanceMetric (string, required): Name of the performance proxy metric (e.g. "ips" for instructions per second, "kernels" for CUDA kernel count).
    • deviceType (string, required): Device type from which to read metrics.
  • controlName (string, required): RAPL or NVML control name used to set the power limit. Examples:
    • "rapl.pkg_power_limit1" — RAPL package power limit for CPU sockets
    • "nvml.power_limit" — NVML power limit for NVIDIA GPUs
  • controlDefaultValue (number, required): Power limit in watts applied when no optimization is active (e.g. when a job ends).
  • intervalConverged (string, required): How often to run optimization after convergence. Parsed as a Go duration string (e.g. "10m", "5m").
  • intervalSearch (string, required): How often to run optimization during active search. Should be shorter than intervalConverged (e.g. "2m").
  • powerBudgetWeight (number, required): Relative weight for budget allocation when multiple device types share powerBudgetTotal. A device type with weight 2 receives twice the budget fraction of a device type with weight 1.
  • optimizer (object, required): Optimizer algorithm configuration.
    • type (string, required): Algorithm type. "gssng" (Golden Section Search with Narrowing/Broadening, recommended) or "gss" (plain Golden Section Search).
    • tolerance (number, required): Convergence tolerance in watts. The optimizer considers itself converged when the search interval is smaller than this value.
    • borders (object, required): Power limit bounds.
      • lower (number, required): Minimum allowed power limit in watts.
      • upper (number, required): Maximum allowed power limit in watts.
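
The effect of powerBudgetWeight can be illustrated with a simple proportional split. This is a sketch under the assumption that each device type's share is `powerBudgetTotal * weight / sum(weights)`; the function name is hypothetical, and the actual distribution in cc-energy-manager may take further factors into account, such as the number of devices.

```python
def split_budget(total_watts: float, weights: dict) -> dict:
    """Split a total power budget proportionally to the given weights."""
    weight_sum = sum(weights.values())
    if weight_sum <= 0:
        raise ValueError("weights must sum to a positive value")
    return {name: total_watts * w / weight_sum for name, w in weights.items()}


# Weights as in a GPU cluster with "nvidia_gpu" weighted 2 and "socket" weighted 1.
shares = split_budget(20000, {"nvidia_gpu": 2, "socket": 1})
for name, watts in shares.items():
    print(f"{name}: {watts:.0f} W")
# prints nvidia_gpu: 13333 W and socket: 6667 W
```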

Complete Example

The following example configures two clusters: fritz (CPU-only nodes whose sockets are optimized at node scope) and alex (GPU nodes with per-device GPU optimization and job-scope CPU optimization).

{
    "receivers": {
        "testnats": {
            "type": "nats",
            "address": "nats-server.example.org",
            "port": "4222",
            "subject": "subject"
        }
    },
    "sinks": {
        "testoutput": {
            "type": "stdout",
            "meta_as_tags": []
        }
    },
    "controller": {
        "nats": {
            "url": "nats://nats-server.example.org:4222",
            "requestSubject": "cc-node-controller.%c.request"
        },
        "toposMaxAge": 86400
    },
    "clusters": [
        {
            "name": "fritz",
            "powerBudgetTotal": 10000,
            "partitionRegex": "^energy_efficient$",
            "subclusters": [
                {
                    "name": "main",
                    "hostRegex": "^f\\d\\d\\d\\d$",
                    "devicetypes": {
                        "socket": {
                            "scope": "node",
                            "aggregator": {
                                "type": "last",
                                "powerMetric": "cpu_energy",
                                "performanceMetric": "ips",
                                "deviceType": "socket"
                            },
                            "controlName": "rapl.pkg_power_limit1",
                            "controlDefaultValue": 300,
                            "intervalConverged": "10m",
                            "intervalSearch": "2m",
                            "powerBudgetWeight": 1,
                            "optimizer": {
                                "type": "gssng",
                                "tolerance": 5,
                                "borders": {
                                    "lower": 123,
                                    "upper": 800
                                }
                            }
                        }
                    }
                }
            ]
        },
        {
            "name": "alex",
            "powerBudgetTotal": 20000,
            "partitionRegex": "^only_this_partition_please$",
            "subclusters": [
                {
                    "name": "a100",
                    "hostRegex": "^a\\d\\d\\d\\d$",
                    "devicetypes": {
                        "nvidia_gpu": {
                            "scope": "device",
                            "aggregator": {
                                "type": "last",
                                "powerMetric": "acc_power",
                                "performanceMetric": "kernels",
                                "deviceType": "nvidia_gpu"
                            },
                            "controlName": "nvml.power_limit",
                            "controlDefaultValue": 250,
                            "intervalConverged": "5m",
                            "intervalSearch": "2m",
                            "powerBudgetWeight": 2,
                            "optimizer": {
                                "type": "gssng",
                                "tolerance": 5,
                                "borders": {
                                    "lower": 123,
                                    "upper": 800
                                }
                            }
                        },
                        "socket": {
                            "scope": "job",
                            "aggregator": {
                                "type": "median",
                                "powerMetric": "cpu_energy",
                                "performanceMetric": "ips",
                                "deviceType": "socket"
                            },
                            "controlName": "rapl.pkg_power_limit1",
                            "controlDefaultValue": 100,
                            "intervalConverged": "5m",
                            "intervalSearch": "2m",
                            "powerBudgetWeight": 1,
                            "optimizer": {
                                "type": "gssng",
                                "tolerance": 5,
                                "borders": {
                                    "lower": 123,
                                    "upper": 800
                                }
                            }
                        }
                    }
                }
            ]
        }
    ]
}
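
Before starting the daemon, a configuration like the one above can be sanity-checked with a small script. The sketch below is not part of cc-energy-manager: `parse_duration` handles only a subset of Go duration syntax, and `check_config` covers just a few of the rules stated in this reference (required sections, valid scope, border ordering, and the recommendation that intervalSearch be shorter than intervalConverged).

```python
import re

REQUIRED_SECTIONS = ("receivers", "sinks", "controller", "clusters")
UNIT_SECONDS = {"ns": 1e-9, "us": 1e-6, "µs": 1e-6, "ms": 1e-3,
                "s": 1.0, "m": 60.0, "h": 3600.0}
DURATION_PART = re.compile(r"(\d+(?:\.\d*)?|\.\d+)(ns|us|µs|ms|s|m|h)")


def parse_duration(text: str) -> float:
    """Parse a subset of Go duration syntax ("2m", "1h30m") into seconds."""
    if not text:
        raise ValueError("empty duration")
    pos, total = 0, 0.0
    while pos < len(text):
        match = DURATION_PART.match(text, pos)
        if match is None:
            raise ValueError(f"invalid duration: {text!r}")
        total += float(match.group(1)) * UNIT_SECONDS[match.group(2)]
        pos = match.end()
    return total


def check_config(cfg: dict) -> list:
    """Return a list of problems found; an empty list means none were found."""
    problems = [f"missing section: {s}" for s in REQUIRED_SECTIONS if s not in cfg]
    for cluster in cfg.get("clusters", []):
        for sub in cluster.get("subclusters", []):
            for dev_name, dev in sub.get("devicetypes", {}).items():
                where = f"{cluster.get('name')}/{sub.get('name')}/{dev_name}"
                if dev.get("scope") not in ("job", "node", "device"):
                    problems.append(f"{where}: invalid scope")
                borders = dev.get("optimizer", {}).get("borders", {})
                if not borders.get("lower", 0) < borders.get("upper", 0):
                    problems.append(f"{where}: borders.lower must be below borders.upper")
                try:
                    search = parse_duration(dev.get("intervalSearch", ""))
                    converged = parse_duration(dev.get("intervalConverged", ""))
                except ValueError as err:
                    problems.append(f"{where}: {err}")
                else:
                    if search >= converged:
                        problems.append(
                            f"{where}: intervalSearch should be shorter than intervalConverged")
    return problems


# Trimmed-down configuration; a real file would be read with json.load().
example = {
    "receivers": {}, "sinks": {}, "controller": {},
    "clusters": [{"name": "fritz", "subclusters": [{"name": "main", "devicetypes": {
        "socket": {"scope": "node", "intervalSearch": "2m", "intervalConverged": "10m",
                   "optimizer": {"borders": {"lower": 123, "upper": 800}}}}}]}],
}
print(check_config(example))  # prints []
```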