Collectors

Available metric collectors for cc-metric-collector

Overview

Collectors read data from various sources on the local system, parse it into metrics, and submit these metrics to the router. Each collector is a modular plugin that can be enabled or disabled independently.

Configuration Format

File: collectors.json

The collectors configuration is a single JSON object (not a list). Each key is a collector type, and each value is an object holding that collector's options:

{
  "collector_type": {
    "collector_specific_option": "value"
  }
}
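As a rough illustration of this layout, the following sketch decodes such a file into a map keyed by collector type and keeps each collector's options as raw JSON. The helper names parseCollectorConfig and collectorNames and the inline sample are assumptions made for this example; they are not taken from cc-metric-collector.

```go
package main

import (
	"encoding/json"
	"fmt"
	"sort"
)

// parseCollectorConfig is a hypothetical helper: it decodes the top-level
// object and keeps each collector's options as raw JSON for later parsing.
func parseCollectorConfig(data []byte) (map[string]json.RawMessage, error) {
	var cfg map[string]json.RawMessage
	if err := json.Unmarshal(data, &cfg); err != nil {
		return nil, fmt.Errorf("collectors config must be a JSON object: %w", err)
	}
	return cfg, nil
}

// collectorNames returns the configured collector types in sorted order.
func collectorNames(cfg map[string]json.RawMessage) []string {
	names := make([]string, 0, len(cfg))
	for name := range cfg {
		names = append(names, name)
	}
	sort.Strings(names)
	return names
}

func main() {
	sample := []byte(`{"cpustat": {"exclude_metrics": ["cpu_idle"]}, "memstat": {}}`)
	cfg, err := parseCollectorConfig(sample)
	if err != nil {
		fmt.Println("error:", err)
		return
	}
	for _, name := range collectorNames(cfg) {
		fmt.Printf("%s -> %s\n", name, cfg[name])
	}
}
```

Because the target type is a map, a configuration written as a JSON list fails to decode, which matches the "object, not a list" rule.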

Common Configuration Options

Most collectors support these common options:

| Option | Type | Default | Description |
| --- | --- | --- | --- |
| exclude_metrics | []string | [] | List of metric names to exclude from forwarding to sinks |
| send_meta | bool | varies | Send metadata information along with metrics |

Example:

{
  "cpustat": {
    "exclude_metrics": ["cpu_idle", "cpu_guest"]
  },
  "memstat": {}
}

Available Collectors

System Metrics

| Collector | Description | Source |
| --- | --- | --- |
| cpustat | CPU usage statistics | /proc/stat |
| memstat | Memory usage statistics | /proc/meminfo |
| loadavg | System load average | /proc/loadavg |
| netstat | Network interface statistics | /proc/net/dev |
| diskstat | Disk I/O statistics | /sys/block/*/stat |
| iostat | Block device I/O statistics | /proc/diskstats |

Hardware Monitoring

| Collector | Description | Requirements |
| --- | --- | --- |
| tempstat | Temperature sensors | /sys/class/hwmon |
| cpufreq | CPU frequency | /sys/devices/system |
| cpufreq_cpuinfo | CPU frequency from cpuinfo | /proc/cpuinfo |
| ipmistat | IPMI sensor data | ipmitool command |

Performance Monitoring

| Collector | Description | Requirements |
| --- | --- | --- |
| likwid | Hardware performance counters via LIKWID | liblikwid.so |
| rapl | CPU energy consumption (RAPL) | /sys/class/powercap |
| schedstat | CPU scheduler statistics | /proc/schedstat |
| numastats | NUMA node statistics | /sys/devices/system/node |

GPU Monitoring

| Collector | Description | Requirements |
| --- | --- | --- |
| nvidia | NVIDIA GPU metrics | libnvidia-ml.so (NVML) |
| rocm_smi | AMD ROCm GPU metrics | librocm_smi64.so |

Network & Storage

| Collector | Description | Requirements |
| --- | --- | --- |
| ibstat | InfiniBand statistics | /sys/class/infiniband |
| lustrestat | Lustre filesystem statistics | Lustre client |
| gpfs | GPFS filesystem statistics | GPFS utilities |
| beegfs_meta | BeeGFS metadata statistics | BeeGFS metadata client |
| beegfs_storage | BeeGFS storage statistics | BeeGFS storage client |
| nfs3stat | NFS v3 statistics | /proc/net/rpc/nfs |
| nfs4stat | NFS v4 statistics | /proc/net/rpc/nfs |
| nfsiostat | NFS I/O statistics | nfsiostat command |

Process & Job Monitoring

| Collector | Description | Requirements |
| --- | --- | --- |
| topprocs | Top processes by resource usage | /proc filesystem |
| slurm_cgroup | Slurm cgroup statistics | Slurm cgroups |
| self | Collector’s own resource usage | /proc/self |

Custom Collectors

| Collector | Description | Requirements |
| --- | --- | --- |
| customcmd | Execute custom commands to collect metrics | Any command/script |

Collector Lifecycle

Each collector implements these functions:

  • Init(config): Initializes the collector with configuration
  • Initialized(): Returns whether initialization was successful
  • Read(duration, output): Reads metrics and sends them to the output channel
  • Close(): Cleanup and shutdown

Example Configurations

Minimal System Monitoring

{
  "cpustat": {},
  "memstat": {},
  "loadavg": {}
}

HPC Node Monitoring

{
  "cpustat": {},
  "memstat": {},
  "diskstat": {},
  "netstat": {},
  "loadavg": {},
  "tempstat": {},
  "likwid": {
    "access_mode": "direct",
    "liblikwid_path": "/usr/local/lib/liblikwid.so",
    "eventsets": [
      {
        "events": {
          "cpu": ["FLOPS_DP", "CLOCK"]
        }
      }
    ]
  },
  "nvidia": {},
  "ibstat": {}
}

Filesystem-Heavy Workload

{
  "cpustat": {},
  "memstat": {},
  "diskstat": {},
  "lustrestat": {},
  "nfs4stat": {},
  "iostat": {}
}

Minimal Overhead

{
  "cpustat": {
    "exclude_metrics": ["cpu_guest", "cpu_guest_nice", "cpu_steal"]
  },
  "memstat": {
    "exclude_metrics": ["mem_slab", "mem_sreclaimable"]
  }
}

Collector Development

Creating a Custom Collector

Collectors implement the MetricCollector interface. See the collectors README for details.

Basic structure:

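type SampleCollector struct {
    metricCollector              // embedded base type shared by collectors
    config SampleCollectorConfig // options parsed from collectors.json
}

// Init parses the collector's JSON options (needs "encoding/json").
func (m *SampleCollector) Init(config json.RawMessage) error {
    return json.Unmarshal(config, &m.config)
}

// Read collects metrics and sends them to the output channel (needs "time").
func (m *SampleCollector) Read(interval time.Duration, output chan lp.CCMetric) {}

// Close releases any resources acquired in Init.
func (m *SampleCollector) Close() {}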

Registration

Add your collector to collectorManager.go:

var AvailableCollectors = map[string]MetricCollector{
    "sample": &SampleCollector{},
}
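One plausible reason for the registration step is that the manager looks up each configured collector type in this map. The sketch below shows that lookup under simplified assumptions: the collector interface with a Name method, the available map, and the resolve helper are hypothetical stand-ins rather than code from collectorManager.go.

```go
package main

import (
	"fmt"
	"sort"
)

// collector is a reduced stand-in for the MetricCollector interface.
type collector interface {
	Name() string
}

type sampleCollector struct{}

func (sampleCollector) Name() string { return "sample" }

// available plays the role of the AvailableCollectors map shown above.
var available = map[string]collector{
	"sample": sampleCollector{},
}

// resolve splits configured collector types into registered collectors
// and names that have no registration.
func resolve(configured []string) (found []collector, unknown []string) {
	for _, name := range configured {
		if c, ok := available[name]; ok {
			found = append(found, c)
		} else {
			unknown = append(unknown, name)
		}
	}
	sort.Strings(unknown)
	return found, unknown
}

func main() {
	found, unknown := resolve([]string{"sample", "not_registered"})
	fmt.Println(len(found), unknown)
}
```

In this sketch, a collector type that appears in the configuration but not in the map ends up in the unknown list, so it can be reported instead of silently ignored.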

Metric Format

All collectors submit metrics in InfluxDB line protocol format via the CCMetric type.

Metric components:

  • Name: Metric identifier (e.g., cpu_used)
  • Tags: Index-like key-value pairs (e.g., type=node, hostname=node01)
  • Fields: Data values (typically just value)
  • Metadata: Source, group, unit information
  • Timestamp: When the metric was collected

Performance Considerations

  • Collector overhead: Each enabled collector adds CPU overhead
  • I/O impact: Some collectors read many files (e.g., per-core statistics)
  • Library overhead: GPU and hardware performance collectors can be expensive
  • Selective metrics: Use exclude_metrics to keep metrics you do not need from being forwarded to sinks
