cc-energy-manager
ClusterCockpit Energy Manager References
Reference information regarding the ClusterCockpit component “cc-energy-manager” (GitHub Repo).
Overview
cc-energy-manager is a daemon that dynamically adjusts power limits on compute nodes of an HPC cluster to automatically optimize energy consumption. It is part of the ClusterCockpit ecosystem and integrates with cc-metric-collector for metrics and cc-node-controller for applying power limit changes.
Problem and Motivation
With large HPC systems, power draw is an ever-growing concern for grid infrastructure and environmental footprint. Modern CPUs and GPUs expose integrated power management that allows lowering maximum power limits (the opposite of overclocking). When a power limit is reduced, the chip’s internal power management lowers clock speeds to stay within budget — reducing both power draw and performance.
The key observation is that performance typically decreases less than proportionally to the power reduction. Energy efficiency (energy per unit of work) therefore improves, even on chips that are not running near their thermal limits. cc-energy-manager exploits this by continuously searching for the power limit that minimizes the Energy Delay Product (EDP), a metric that balances energy savings against increased execution time and thereby avoids excessive slowdowns.
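To make the trade-off concrete, the following worked example uses the common textbook definition of EDP (energy multiplied by runtime). The power and runtime figures are invented for illustration, and cc-energy-manager itself works with a performance proxy metric rather than wall-clock runtime, so its internal computation may differ.

```python
# Illustrative numbers only (assumed, not measured): the same job run
# under two different power limits.

def edp(avg_power_watts: float, runtime_seconds: float) -> float:
    """Energy Delay Product: energy (J) multiplied by runtime (s)."""
    energy_joules = avg_power_watts * runtime_seconds
    return energy_joules * runtime_seconds

# Uncapped: 300 W for 100 s. Capped: 200 W for 110 s,
# i.e. one third less power costs only 10% more runtime.
uncapped = edp(300.0, 100.0)  # 300 * 100 * 100 = 3,000,000 J*s
capped = edp(200.0, 110.0)    # 200 * 110 * 110 = 2,420,000 J*s

print(f"uncapped EDP: {uncapped:.0f} J*s")
print(f"capped EDP:   {capped:.0f} J*s")
```

In this made-up case the capped run has the lower EDP even though it takes longer, which is the situation the optimizer searches for.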
How It Works
For each running job, cc-energy-manager performs a feedback-loop optimization:
- Measure: Collect power draw and a performance proxy metric (e.g., instructions per second for CPUs, CUDA kernel counts for GPUs) from cc-metric-collector via NATS.
- Evaluate: Compute the EDP for the current power limit.
- Search: Use a Golden Section Search algorithm to find the power limit that minimizes EDP within configured bounds.
- Apply: Send the new power limit to cc-node-controller via NATS, which applies it using RAPL (CPUs) or NVML (NVIDIA GPUs).
The search alternates between two intervals: a shorter intervalSearch during active exploration and a longer intervalConverged once the optimizer has found a stable minimum.
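The interval switching can be sketched as follows. This is a minimal illustration, not the daemon's code: `parse_simple_duration` and `next_tick_seconds` are hypothetical helpers, and the parser only handles single-unit values such as "2m", whereas Go duration strings allow richer forms.

```python
import re

_UNIT_SECONDS = {"s": 1, "m": 60, "h": 3600}

def parse_simple_duration(text: str) -> int:
    """Parse a single-unit duration such as "2m" or "30s" into seconds.

    Simplification: Go duration strings also allow forms like "1h30m",
    which this sketch rejects.
    """
    match = re.fullmatch(r"(\d+)([smh])", text)
    if match is None:
        raise ValueError(f"unsupported duration: {text!r}")
    return int(match.group(1)) * _UNIT_SECONDS[match.group(2)]

def next_tick_seconds(converged: bool, interval_search: str,
                      interval_converged: str) -> int:
    """Pick the delay until the next optimization tick."""
    chosen = interval_converged if converged else interval_search
    return parse_simple_duration(chosen)

print(next_tick_seconds(False, "2m", "10m"))  # 120
print(next_tick_seconds(True, "2m", "10m"))   # 600
```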
Architecture
flowchart LR
CC[cc-metric-collector] -->|NATS metrics| R[Receivers]
R --> CM[Cluster Manager]
CM --> JM[Job Managers]
JM --> AGG[Aggregator]
AGG --> OPT[Optimizer\nGSSNG]
OPT --> CTL[Controller]
CTL -->|NATS set power limit| CNC[cc-node-controller]
CM --> SK[Sinks]
Components
Receivers
Accept metric messages from external sources (e.g., via NATS) as provided by cc-metric-collector. The receiver manager forwards all incoming messages to the Cluster Manager.
Cluster Manager
The central orchestrator. It:
- Receives job start/stop events and creates or removes Job Managers accordingly.
- Routes incoming metrics to the correct Job Manager based on hostname and cluster membership.
- Filters jobs to only those matching the configured partitionRegex.
- Forwards processed metrics to the Sink Manager.
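The partition filter can be illustrated with a short sketch. The function name `is_managed` is hypothetical, and the use of an unanchored search is an assumption; the example pattern is anchored with `^` and `$`, so it only matches the exact partition name either way.

```python
import re

# Pattern taken from the example configuration in this document.
partition_regex = re.compile(r"^energy_efficient$")

def is_managed(job_partition: str) -> bool:
    """Return True if a job on this partition would be picked up."""
    return partition_regex.search(job_partition) is not None

for partition in ("energy_efficient", "energy_efficient_long", "work"):
    print(partition, is_managed(partition))
```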
Job Manager
Manages the optimization lifecycle for a single job. It supports three optimization scopes:
| Scope | Description |
|---|---|
| job | One optimizer instance shared across all nodes and devices of the job |
| node | One optimizer per node; the resulting power limit is applied uniformly to all devices on that node |
| device | One independent optimizer per device (socket or GPU) on each node |
The Job Manager runs optimization ticks on a timer, switching between intervalSearch and intervalConverged based on the optimizer’s convergence state.
Aggregator
Collects power and performance metric samples from incoming messages and computes the EDP value fed into the optimizer. Two aggregation strategies are available:
- last: Uses the most recent metric value.
- median: Uses the median over a rolling time window.
The aggregator also supports multiple reduction modes when combining values across multiple devices (arithmetic mean, geometric mean, harmonic mean, min, max).
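The difference between the strategies and reduction modes is easiest to see on numbers. The sketch below uses Python's standard `statistics` module on invented sample values; it only illustrates the arithmetic and is not the aggregator's implementation.

```python
import statistics

# Assumed power samples in watts for one device, oldest first.
# The 400 W value stands in for a short spike.
window = [210.0, 190.0, 400.0, 205.0, 200.0]

last_value = window[-1]                   # "last" strategy
median_value = statistics.median(window)  # "median" strategy, ignores the spike

# Assumed per-device values to be combined into a single number.
per_device = [100.0, 200.0, 400.0]
reductions = {
    "arithmetic mean": statistics.fmean(per_device),
    "geometric mean": statistics.geometric_mean(per_device),
    "harmonic mean": statistics.harmonic_mean(per_device),
    "min": min(per_device),
    "max": max(per_device),
}

print(f"last={last_value} median={median_value}")
for name, value in reductions.items():
    print(f"{name}: {value:.1f}")
```

The median is less sensitive to short spikes than the most recent value, at the cost of reacting more slowly to real load changes.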
Optimizer
Implements the GSSNG (Golden Section Search with Narrowing/Broadening) algorithm. It maintains four sample points within a search window and moves toward the EDP minimum. The window narrows when a minimum is found; it broadens when the current load appears insufficient to distinguish power limit effects.
The gss type is a plain Golden Section Search without the narrowing/broadening heuristic.
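For orientation, here is a minimal sketch of a textbook Golden Section Search with four sample points (a < c < d < b). It is not the project's GSSNG code and omits the narrowing/broadening heuristic. The `synthetic_edp` curve is an assumed toy function; in the real tool every evaluation would correspond to applying a power limit and measuring for one interval.

```python
import math

INV_PHI = (math.sqrt(5) - 1) / 2  # about 0.618

def golden_section_minimize(f, lower, upper, tolerance):
    """Shrink [lower, upper] around the minimum of a unimodal function f."""
    a, b = lower, upper
    c = b - INV_PHI * (b - a)
    d = a + INV_PHI * (b - a)
    fc, fd = f(c), f(d)
    while (b - a) > tolerance:
        if fc < fd:
            # Minimum lies in [a, d]; the old c becomes the new d.
            b, d, fd = d, c, fc
            c = b - INV_PHI * (b - a)
            fc = f(c)
        else:
            # Minimum lies in [c, b]; the old d becomes the new c.
            a, c, fc = c, d, fd
            d = a + INV_PHI * (b - a)
            fd = f(d)
    return (a + b) / 2

def synthetic_edp(limit_watts):
    # Assumed toy curve with its minimum at 220 W.
    return (limit_watts - 220.0) ** 2 + 1000.0

best = golden_section_minimize(synthetic_edp, 123.0, 800.0, 5.0)
print(f"estimated best power limit: {best:.1f} W")
```

Each iteration reuses one of the two inner points, so only one new evaluation is needed per step, which matters when an evaluation takes minutes of measurement.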
Controller
Translates optimizer output into power limit commands sent to cc-node-controller via NATS. A separate NATS connection (using a requestSubject with a cluster-name placeholder %c) is created per cluster. The controller also caches node hardware topology information to map hardware threads to CPU sockets.
Sinks
Receive the processed metrics output from the Cluster Manager and forward them to external systems for monitoring or storage (e.g., stdout, InfluxDB).
- cc-metric-collector: Collects per-node hardware performance and power metrics; the primary metrics source for cc-energy-manager.
- cc-node-controller: Applies power limit changes on individual nodes via RAPL and NVML; receives commands from cc-energy-manager.
- cc-metric-store: Optional long-term metric storage; not directly used by cc-energy-manager but part of the broader ClusterCockpit stack.
1 - Commands
ClusterCockpit Energy Manager Command Line References
Build
The build produces the cc-energy-manager binary in the repository root.
Run
./cc-energy-manager [options]
Options
| Flag | Default | Description |
|---|---|---|
| -config <path> | ./config.json | Path to the JSON configuration file |
| -loglevel <level> | warn | Logging verbosity: debug, info, warn, err, fatal, crit |
| -logdate | false | Prefix every log line with date and time |
| -once | false | Run all collectors once and then exit (useful for testing) |
Example
./cc-energy-manager -config /etc/cc-energy-manager/config.json -loglevel info -logdate
Signals
cc-energy-manager handles the following UNIX signals for graceful shutdown:
- SIGTERM: sent by systemd on systemctl stop
- SIGINT: sent by Ctrl+C
On receiving either signal, the daemon stops all receivers, sinks, the cluster manager, and the controller before exiting.
2 - Configuration
ClusterCockpit Energy Manager Configuration Option References
Configuration is provided as a JSON file. The default path is ./config.json in the working directory; an alternative path can be specified with the -config flag.
The configuration has four required top-level sections: receivers, sinks, controller, and clusters.
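Before starting the daemon, a configuration file can be checked for the four sections with a few lines of script. This is a hypothetical pre-flight helper, not part of cc-energy-manager, and it only checks for the presence of the top-level keys.

```python
import json

REQUIRED_SECTIONS = ("receivers", "sinks", "controller", "clusters")

def missing_sections(config: dict) -> list[str]:
    """Return the required top-level sections absent from config."""
    return [name for name in REQUIRED_SECTIONS if name not in config]

# Deliberately incomplete example: the controller section is missing.
example = json.loads('{"receivers": {}, "sinks": {}, "clusters": []}')
print(missing_sections(example))  # ['controller']
```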
Receivers Section
A named map of receiver configurations. Each receiver defines a source from which cc-energy-manager ingests metric messages.
"receivers": {
"<name>": {
"type": "nats",
"address": "nats-server.example.org",
"port": "4222",
"subject": "metrics.subject"
}
}
Fields (NATS receiver):
- type (string, required): Receiver type. Currently "nats".
- address (string, required): Hostname or IP of the NATS server.
- port (string, required): Port of the NATS server.
- subject (string, required): NATS subject to subscribe to.
Sinks Section
A named map of sink configurations. Each sink defines a destination for processed metric messages.
"sinks": {
"<name>": {
"type": "stdout",
"meta_as_tags": []
}
}
Fields (stdout sink):
- type (string, required): Sink type. Examples: "stdout", "influxasync".
- meta_as_tags (array of strings, optional): Metadata fields to promote to tags.
Controller Section
Configuration for the connection to cc-node-controller, which applies power limit changes on compute nodes.
"controller": {
"nats": {
"url": "nats://nats-server.example.org:4222",
"requestSubject": "cc-node-controller.%c.request"
},
"toposMaxAge": 86400
}
Fields:
- nats (object, required): NATS connection settings for cc-node-controller.
  - url (string, required): NATS server URL, e.g. "nats://localhost:4222".
  - requestSubject (string, required): NATS subject for sending control commands. Use %c as a placeholder for the cluster name; it is substituted at runtime for each cluster (e.g. "cc-node-controller.%c.request" becomes "cc-node-controller.fritz.request" for cluster fritz).
- toposMaxAge (integer, optional): How long to cache node hardware topology data in seconds. Default: 86400 (1 day).
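The placeholder substitution described for requestSubject amounts to a plain string replacement. The helper name `subject_for_cluster` is hypothetical and only illustrates the documented behavior.

```python
def subject_for_cluster(request_subject: str, cluster: str) -> str:
    """Replace the %c placeholder with the cluster name."""
    return request_subject.replace("%c", cluster)

print(subject_for_cluster("cc-node-controller.%c.request", "fritz"))
# cc-node-controller.fritz.request
```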
Clusters Section
An array of cluster configurations. Each cluster defines which nodes to manage and how to optimize their power limits.
"clusters": [
{
"name": "fritz",
"powerBudgetTotal": 10000,
"partitionRegex": "^energy_efficient$",
"subclusters": [ ... ]
}
]
Cluster fields:
- name (string, required): Cluster identifier. Must match the cluster name used in metric tags.
- powerBudgetTotal (number, required): Total power budget for this cluster in watts. Used for proportional budget distribution across device types.
- partitionRegex (string, required): Regular expression matched against the job’s partition/queue name. Only jobs on matching partitions are managed.
- subclusters (array, required): List of subcluster configurations (see below).
Subcluster Configuration
A subcluster groups nodes within a cluster that share the same hardware configuration.
{
"name": "main",
"hostRegex": "^f\\d\\d\\d\\d$",
"devicetypes": { ... }
}
Fields:
- name (string, required): Subcluster identifier.
- hostRegex (string, required): Regular expression matched against node hostnames to assign nodes to this subcluster.
- devicetypes (object, required): Map of device type name to device type configuration (see below). Supported keys: "socket", "nvidia_gpu", "amd_gpu".
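A quick way to check a hostRegex against real hostnames is to try it in a few lines of script. The sketch below uses the pattern from the example above; note that the doubled backslashes are JSON escaping, so the actual pattern is `^f\d\d\d\d$`. The first-match-wins lookup in `subcluster_for` is an assumption for illustration, and Python's `re` syntax may differ from the Go regular expression syntax in edge cases.

```python
import re

# Subcluster definitions as they would look after JSON decoding.
subclusters = [{"name": "main", "hostRegex": r"^f\d\d\d\d$"}]

def subcluster_for(hostname: str):
    """Return the name of the first subcluster whose hostRegex matches."""
    for subcluster in subclusters:
        if re.search(subcluster["hostRegex"], hostname):
            return subcluster["name"]
    return None

for host in ("f0101", "f101", "a0101"):
    print(host, subcluster_for(host))
```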
Device Type Configuration
Each entry in devicetypes configures how cc-energy-manager optimizes a specific hardware device type.
"socket": {
"scope": "node",
"aggregator": {
"type": "last",
"powerMetric": "cpu_energy",
"performanceMetric": "ips",
"deviceType": "socket"
},
"controlName": "rapl.pkg_power_limit1",
"controlDefaultValue": 300,
"intervalConverged": "10m",
"intervalSearch": "2m",
"powerBudgetWeight": 1,
"optimizer": {
"type": "gssng",
"tolerance": 5,
"borders": {
"lower": 123,
"upper": 800
}
}
}
Fields:
- scope (string, required): Optimization granularity. One of:
  - "job": single optimizer shared across all devices of the job
  - "node": one optimizer per node, applied to all devices on that node
  - "device": independent optimizer per device on each node
- aggregator (object, required): Metric aggregation configuration.
  - type (string, required): Aggregation strategy: "last" (most recent value) or "median" (median over time window).
  - powerMetric (string, required): Name of the power metric to track (e.g. "cpu_energy", "acc_power").
  - performanceMetric (string, required): Name of the performance proxy metric (e.g. "ips" for instructions per second, "kernels" for CUDA kernel count).
  - deviceType (string, required): Device type from which to read metrics.
- controlName (string, required): RAPL or NVML control name used to set the power limit. Examples:
  - "rapl.pkg_power_limit1": RAPL package power limit for CPU sockets
  - "nvml.power_limit": NVML power limit for NVIDIA GPUs
- controlDefaultValue (number, required): Power limit in watts applied when no optimization is active (e.g. when a job ends).
- intervalConverged (string, required): How often to run optimization after convergence. Parsed as a Go duration string (e.g. "10m", "5m").
- intervalSearch (string, required): How often to run optimization during active search. Should be shorter than intervalConverged (e.g. "2m").
- powerBudgetWeight (number, required): Relative weight for budget allocation when multiple device types share powerBudgetTotal. A device type with weight 2 receives twice the budget fraction of a device type with weight 1.
- optimizer (object, required): Optimizer algorithm configuration.
  - type (string, required): Algorithm type. "gssng" (Golden Section Search with Narrowing/Broadening, recommended) or "gss" (plain Golden Section Search).
  - tolerance (number, required): Convergence tolerance in watts. The optimizer considers itself converged when the search interval is smaller than this value.
  - borders (object, required): Power limit bounds.
    - lower (number, required): Minimum allowed power limit in watts.
    - upper (number, required): Maximum allowed power limit in watts.
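The weighting rule for powerBudgetWeight can be illustrated with simple proportional arithmetic. This sketch assumes the budget is split purely by weight across device types; whether the tool also accounts for device counts or other factors is not stated here, so treat `split_budget` as a hypothetical helper.

```python
def split_budget(total_watts: float, weights: dict) -> dict:
    """Split a total budget across device types in proportion to weight."""
    weight_sum = sum(weights.values())
    return {name: total_watts * weight / weight_sum
            for name, weight in weights.items()}

# Values taken from the alex cluster in the complete example:
# powerBudgetTotal 20000, nvidia_gpu weight 2, socket weight 1.
shares = split_budget(20000, {"nvidia_gpu": 2, "socket": 1})
for name, watts in shares.items():
    print(f"{name}: {watts:.0f} W")
```

Under this assumption the GPU device type receives two thirds of the budget and the socket device type one third.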
Complete Example
The following example configures two clusters: fritz (CPU-only nodes, optimizing CPU sockets with node scope) and alex (GPU nodes, optimizing both GPUs and CPU sockets).
{
"receivers": {
"testnats": {
"type": "nats",
"address": "nats-server.example.org",
"port": "4222",
"subject": "subject"
}
},
"sinks": {
"testoutput": {
"type": "stdout",
"meta_as_tags": []
}
},
"controller": {
"nats": {
"url": "nats://nats-server.example.org:4222",
"requestSubject": "cc-node-controller.%c.request"
},
"toposMaxAge": 86400
},
"clusters": [
{
"name": "fritz",
"powerBudgetTotal": 10000,
"partitionRegex": "^energy_efficient$",
"subclusters": [
{
"name": "main",
"hostRegex": "^f\\d\\d\\d\\d$",
"devicetypes": {
"socket": {
"scope": "node",
"aggregator": {
"type": "last",
"powerMetric": "cpu_energy",
"performanceMetric": "ips",
"deviceType": "socket"
},
"controlName": "rapl.pkg_power_limit1",
"controlDefaultValue": 300,
"intervalConverged": "10m",
"intervalSearch": "2m",
"powerBudgetWeight": 1,
"optimizer": {
"type": "gssng",
"tolerance": 5,
"borders": {
"lower": 123,
"upper": 800
}
}
}
}
}
]
},
{
"name": "alex",
"powerBudgetTotal": 20000,
"partitionRegex": "^only_this_partition_please$",
"subclusters": [
{
"name": "a100",
"hostRegex": "^a\\d\\d\\d\\d$",
"devicetypes": {
"nvidia_gpu": {
"scope": "device",
"aggregator": {
"type": "last",
"powerMetric": "acc_power",
"performanceMetric": "kernels",
"deviceType": "nvidia_gpu"
},
"controlName": "nvml.power_limit",
"controlDefaultValue": 250,
"intervalConverged": "5m",
"intervalSearch": "2m",
"powerBudgetWeight": 2,
"optimizer": {
"type": "gssng",
"tolerance": 5,
"borders": {
"lower": 123,
"upper": 800
}
}
},
"socket": {
"scope": "job",
"aggregator": {
"type": "median",
"powerMetric": "cpu_energy",
"performanceMetric": "ips",
"deviceType": "socket"
},
"controlName": "rapl.pkg_power_limit1",
"controlDefaultValue": 100,
"intervalConverged": "5m",
"intervalSearch": "2m",
"powerBudgetWeight": 1,
"optimizer": {
"type": "gssng",
"tolerance": 5,
"borders": {
"lower": 123,
"upper": 800
}
}
}
}
}
]
}
]
}