Collectors

Available metric collectors for cc-metric-collector

    Overview

    Collectors read data from various sources on the local system, parse it into metrics, and submit these metrics to the router. Each collector is a modular plugin that can be enabled or disabled independently.

    Configuration Format

    File: collectors.json

    The collectors configuration is a JSON object (not a list): each key names a collector type, and its value holds that collector's specific options:

    {
      "collector_type": {
        "collector_specific_option": "value"
      }
    }
    

    Common Configuration Options

    Most collectors support these common options:

    Option          | Type     | Default | Description
    ----------------|----------|---------|------------------------------------------------------
    exclude_metrics | []string | []      | List of metric names to exclude from forwarding to sinks
    send_meta       | bool     | varies  | Send metadata information along with metrics

    Example:

    {
      "cpustat": {
        "exclude_metrics": ["cpu_idle", "cpu_guest"]
      },
      "memstat": {}
    }
    

    Available Collectors

    System Metrics

    Collector | Description                  | Source
    ----------|------------------------------|-------------------
    cpustat   | CPU usage statistics         | /proc/stat
    memstat   | Memory usage statistics      | /proc/meminfo
    loadavg   | System load average          | /proc/loadavg
    netstat   | Network interface statistics | /proc/net/dev
    diskstat  | Disk I/O statistics          | /sys/block/*/stat
    iostat    | Block device I/O statistics  | /proc/diskstats

    Hardware Monitoring

    Collector       | Description                | Requirements
    ----------------|----------------------------|---------------------
    tempstat        | Temperature sensors        | /sys/class/hwmon
    cpufreq         | CPU frequency              | /sys/devices/system
    cpufreq_cpuinfo | CPU frequency from cpuinfo | /proc/cpuinfo
    ipmistat        | IPMI sensor data           | ipmitool command

    Performance Monitoring

    Collector | Description                              | Requirements
    ----------|------------------------------------------|--------------------------
    likwid    | Hardware performance counters via LIKWID | liblikwid.so
    rapl      | CPU energy consumption (RAPL)            | /sys/class/powercap
    schedstat | CPU scheduler statistics                 | /proc/schedstat
    numastats | NUMA node statistics                     | /sys/devices/system/node

    GPU Monitoring

    Collector | Description          | Requirements
    ----------|----------------------|------------------------
    nvidia    | NVIDIA GPU metrics   | libnvidia-ml.so (NVML)
    rocm_smi  | AMD ROCm GPU metrics | librocm_smi64.so

    Network & Storage

    Collector      | Description                  | Requirements
    ---------------|------------------------------|------------------------
    ibstat         | InfiniBand statistics        | /sys/class/infiniband
    lustrestat     | Lustre filesystem statistics | Lustre client
    gpfs           | GPFS filesystem statistics   | GPFS utilities
    beegfs_meta    | BeeGFS metadata statistics   | BeeGFS metadata client
    beegfs_storage | BeeGFS storage statistics    | BeeGFS storage client
    nfs3stat       | NFS v3 statistics            | /proc/net/rpc/nfs
    nfs4stat       | NFS v4 statistics            | /proc/net/rpc/nfs
    nfsiostat      | NFS I/O statistics           | nfsiostat command

    Process & Job Monitoring

    Collector    | Description                     | Requirements
    -------------|---------------------------------|------------------
    topprocs     | Top processes by resource usage | /proc filesystem
    slurm_cgroup | Slurm cgroup statistics         | Slurm cgroups
    self         | Collector’s own resource usage  | /proc/self

    Custom Collectors

    Collector | Description                                | Requirements
    ----------|--------------------------------------------|--------------------
    customcmd | Execute custom commands to collect metrics | Any command/script
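    A sketch of a customcmd configuration, assuming the collector accepts lists of commands to run and files to read (the paths here are purely illustrative; both the command output and the file contents are expected to be in InfluxDB line protocol):

    ```json
    {
      "customcmd": {
        "exclude_metrics": [],
        "commands": ["/usr/local/bin/read_sensor.sh"],
        "files": ["/var/run/myapp/metrics.lp"]
      }
    }
    ```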

    Collector Lifecycle

    Each collector implements these functions:

    • Init(config): Initializes the collector with configuration
    • Initialized(): Returns whether initialization was successful
    • Read(duration, output): Reads metrics and sends to output channel
    • Close(): Cleanup and shutdown
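    The lifecycle above can be sketched as follows, using a simplified stand-in Metric type instead of the real lp.CCMetric and a hypothetical DummyCollector (both are assumptions for illustration, not the project's actual types):

    ```go
    package main

    import (
    	"encoding/json"
    	"fmt"
    	"time"
    )

    // Metric is a simplified stand-in for lp.CCMetric.
    type Metric struct {
    	Name  string
    	Value float64
    }

    // MetricCollector mirrors the lifecycle functions listed above.
    type MetricCollector interface {
    	Init(config json.RawMessage) error
    	Initialized() bool
    	Read(interval time.Duration, output chan Metric)
    	Close()
    }

    // DummyCollector is a hypothetical collector that emits one fixed metric.
    type DummyCollector struct {
    	init bool
    }

    func (c *DummyCollector) Init(config json.RawMessage) error { c.init = true; return nil }
    func (c *DummyCollector) Initialized() bool                 { return c.init }
    func (c *DummyCollector) Read(interval time.Duration, output chan Metric) {
    	output <- Metric{Name: "dummy_metric", Value: 42}
    }
    func (c *DummyCollector) Close() { c.init = false }

    // collectOne drives one Init/Read/Close cycle, as the collector
    // manager would, and returns the metric produced.
    func collectOne(c MetricCollector) Metric {
    	out := make(chan Metric, 1)
    	if err := c.Init(nil); err != nil || !c.Initialized() {
    		panic("init failed")
    	}
    	c.Read(10*time.Second, out) // interval tells the collector how often it is invoked
    	c.Close()
    	return <-out
    }

    func main() {
    	m := collectOne(&DummyCollector{})
    	fmt.Printf("%s=%v\n", m.Name, m.Value)
    }
    ```

    In the real daemon, Read is called periodically by the collector manager and the output channel feeds the router; the buffered channel here only keeps the sketch self-contained.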

    Example Configurations

    Minimal System Monitoring

    {
      "cpustat": {},
      "memstat": {},
      "loadavg": {}
    }
    

    HPC Node Monitoring

    {
      "cpustat": {},
      "memstat": {},
      "diskstat": {},
      "netstat": {},
      "loadavg": {},
      "tempstat": {},
      "likwid": {
        "access_mode": "direct",
        "liblikwid_path": "/usr/local/lib/liblikwid.so",
        "eventsets": [
          {
            "events": {
              "cpu": ["FLOPS_DP", "CLOCK"]
            }
          }
        ]
      },
      "nvidia": {},
      "ibstat": {}
    }
    

    Filesystem-Heavy Workload

    {
      "cpustat": {},
      "memstat": {},
      "diskstat": {},
      "lustrestat": {},
      "nfs4stat": {},
      "iostat": {}
    }
    

    Minimal Overhead

    {
      "cpustat": {
        "exclude_metrics": ["cpu_guest", "cpu_guest_nice", "cpu_steal"]
      },
      "memstat": {
        "exclude_metrics": ["mem_slab", "mem_sreclaimable"]
      }
    }
    

    Collector Development

    Creating a Custom Collector

    Collectors implement the MetricCollector interface; see the collectors README for details.

    Basic structure:

    type SampleCollector struct {
        metricCollector
        config SampleCollectorConfig
    }
    
    func (m *SampleCollector) Init(config json.RawMessage) error
    func (m *SampleCollector) Read(interval time.Duration, output chan lp.CCMetric)
    func (m *SampleCollector) Close()
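    Init typically starts by unmarshalling the raw JSON blob for this collector from collectors.json into its config struct. A minimal sketch of that step (the field names are illustrative assumptions, mirroring the common exclude_metrics option):

    ```go
    package main

    import (
    	"encoding/json"
    	"fmt"
    )

    // SampleCollectorConfig holds the collector-specific options.
    type SampleCollectorConfig struct {
    	ExcludeMetrics []string `json:"exclude_metrics,omitempty"`
    }

    // parseConfig decodes the raw JSON passed to Init into the
    // collector's config struct; an empty blob leaves the defaults.
    func parseConfig(raw json.RawMessage) (SampleCollectorConfig, error) {
    	var cfg SampleCollectorConfig
    	if len(raw) > 0 {
    		if err := json.Unmarshal(raw, &cfg); err != nil {
    			return cfg, err
    		}
    	}
    	return cfg, nil
    }

    func main() {
    	raw := json.RawMessage(`{"exclude_metrics": ["cpu_idle", "cpu_guest"]}`)
    	cfg, err := parseConfig(raw)
    	if err != nil {
    		panic(err)
    	}
    	fmt.Println(cfg.ExcludeMetrics)
    }
    ```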
    

    Registration

    Add your collector to collectorManager.go:

    var AvailableCollectors = map[string]MetricCollector{
        "sample": &SampleCollector{},
    }
    

    Metric Format

    All collectors submit metrics in InfluxDB line protocol format via the CCMetric type.

    Metric components:

    • Name: Metric identifier (e.g., cpu_used)
    • Tags: Index-like key-value pairs (e.g., type=node, hostname=node01)
    • Fields: Data values (typically just value)
    • Metadata: Source, group, unit information
    • Timestamp: When the metric was collected
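    Put together, a cpu_used metric serialized in line protocol would look roughly like this (the tag values and timestamp are made up for illustration):

    ```
    cpu_used,type=hwthread,type-id=0,hostname=node01 value=12.5 1699999999000000000
    ```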

    Performance Considerations

    • Collector overhead: Each enabled collector adds CPU overhead
    • I/O impact: Some collectors read many files (e.g., per-core statistics)
    • Library overhead: GPU and hardware performance collectors can be expensive
    • Selective metrics: Use exclude_metrics to reduce unnecessary data

    See Also