Deciding on a sensible and meaningful set of metrics is a deciding factor for how useful the monitoring will be. As part of a collaborative project, several academic HPC centers came up with a minimal set of metrics, including their naming. Consistent naming is crucial for establishing what metrics mean, and we urge you to adhere to the metric names suggested there. You can find this list as part of the ClusterCockpit job data structure JSON schemas.
ClusterCockpit supports multiple clusters within one instance of cc-backend. You have to create separate metric lists for each of them. In cc-backend the metric lists are provided as part of the cluster configuration. Every cluster is configured as part of the job archive using one cluster.json file per cluster.
This how-to describes in detail how to create a cluster.json file.
flops_any
mem_bw
mem_used
cpu_load
net_bw
file_bw
ipc
cpu_user
flops_dp
flops_sp
clock
rapl_power
acc_used
acc_mem_used
acc_power
eth_read_bw
eth_write_bw
ic_read_bw
ic_write_bw
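Because each cluster gets its own metric list, it can help to check every list against the suggested names before writing the cluster configuration. The following is a minimal sketch, not part of ClusterCockpit: the function check_metric_names, the cluster names, and the extra metric gpu_util are hypothetical and only serve as illustration. The set of names is copied from the list above.

```python
# Metric names copied from the list above.
SUGGESTED_METRICS = {
    "flops_any", "mem_bw", "mem_used", "cpu_load", "net_bw", "file_bw",
    "ipc", "cpu_user", "flops_dp", "flops_sp", "clock", "rapl_power",
    "acc_used", "acc_mem_used", "acc_power",
    "eth_read_bw", "eth_write_bw", "ic_read_bw", "ic_write_bw",
}


def check_metric_names(cluster_metrics):
    """Split a metric list into suggested names and site-specific names.

    Hypothetical helper for illustration; input order is preserved.
    """
    known = [m for m in cluster_metrics if m in SUGGESTED_METRICS]
    unknown = [m for m in cluster_metrics if m not in SUGGESTED_METRICS]
    return known, unknown


if __name__ == "__main__":
    # Hypothetical per-cluster metric lists (one list per cluster).
    clusters = {
        "cluster_a": ["flops_any", "mem_bw", "cpu_load", "clock"],
        "cluster_b": ["flops_any", "mem_bw", "acc_used", "gpu_util"],
    }
    for name, metrics in clusters.items():
        known, unknown = check_metric_names(metrics)
        print(f"{name}: {len(known)} suggested, non-standard: {unknown}")
```

A name reported as non-standard is not necessarily wrong, but it is worth checking whether one of the suggested names already covers the same quantity.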
cc-backend
In the schema, a tree of file system metrics is suggested. This makes it possible to provide a similar set of metrics for each of the different file systems used in a cluster. The file system type names suggested are:
read_bw
write_bw
read_req
write_req
inodes
accesses
fsync
create
open
close
seek
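The idea of a tree of file system metrics can be sketched as a mapping from each file system to the same set of entries. This is an illustration only: the file system names fs_a and fs_b are hypothetical placeholders, the helper build_fs_metric_tree is not part of ClusterCockpit, and the nested-dictionary layout is an assumption rather than the structure defined in the JSON schema. The entry names are copied from the list above.

```python
# Entries copied from the list above.
FS_METRICS = [
    "read_bw", "write_bw", "read_req", "write_req", "inodes",
    "accesses", "fsync", "create", "open", "close", "seek",
]


def build_fs_metric_tree(fs_names):
    """Give every file system its own copy of the same metric set.

    Hypothetical helper: returns {file_system_name: [metric, ...]}.
    """
    return {fs: list(FS_METRICS) for fs in fs_names}


if __name__ == "__main__":
    # "fs_a" and "fs_b" are placeholders, not names from the schema.
    tree = build_fs_metric_tree(["fs_a", "fs_b"])
    for fs, metrics in tree.items():
        print(fs, "->", ", ".join(metrics))
```

Each file system gets an independent copy of the list, so a site can drop entries that one particular file system does not provide without affecting the others.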