Troubleshooting

Debugging and common issues

Check Service Status

Verify the daemon is running:

sudo systemctl status cc-slurm-adapter

You should see output indicating the service is active (running).
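
For scripted checks (for example from a monitoring job), the same status can be queried without parsing the human-readable output. The sketch below relies on `systemctl is-active --quiet`, which exits with 0 only when the unit is active. The function name `check_adapter_service` is made up for illustration.

```shell
# Hypothetical helper: report whether a systemd unit is active.
# `systemctl is-active --quiet` prints nothing and exits 0 only if the unit is active.
check_adapter_service() {
    unit="${1:-cc-slurm-adapter}"
    if systemctl is-active --quiet "$unit"; then
        echo "$unit is running"
    else
        echo "$unit is not running"
        return 1
    fi
}
```

Called without an argument, `check_adapter_service` checks the cc-slurm-adapter unit. A non-zero return value can be used to trigger the fuller `systemctl status` output.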

View Logs

cc-slurm-adapter logs to stderr (captured by systemd):

sudo journalctl -u cc-slurm-adapter -f

Use -f to follow logs in real time, or omit it to view historical logs.
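
When hunting for a specific failure, it can help to narrow the journal to a time window and to lines that look like problems. A minimal sketch follows. The adapter's exact log wording is not documented here, so the search pattern (error, warn, panic) is an assumption, and `filter_problem_lines` is a hypothetical helper name.

```shell
# Hypothetical helper: keep only lines that look like problems.
# The pattern is a guess at typical wording, not the adapter's documented log format.
filter_problem_lines() {
    grep -i -E 'error|warn|panic'
}

# Example (not run here): last hour of adapter logs, problems only.
# sudo journalctl -u cc-slurm-adapter --since "1 hour ago" --no-pager | filter_problem_lines
```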

Enable Debug Logging

Edit the systemd service file and add -debug 5 to the ExecStart line:

ExecStart=/opt/cc-slurm-adapter/cc-slurm-adapter -daemon -debug 5 -config /opt/cc-slurm-adapter/config.json

Then reload and restart:

sudo systemctl daemon-reload
sudo systemctl restart cc-slurm-adapter
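
As an alternative to editing the unit file in place, a systemd drop-in override keeps the change separate from the installed unit. The sketch below writes such a drop-in. The file name `debug.conf` and the helper `write_debug_dropin` are illustrative, and the ExecStart line assumes the install paths shown above. The empty `ExecStart=` line clears the original command before the new one is set.

```shell
# Hypothetical helper: write a drop-in that overrides ExecStart with -debug 5.
# $1 is the drop-in directory, normally
# /etc/systemd/system/cc-slurm-adapter.service.d (requires root).
write_debug_dropin() {
    dir="$1"
    mkdir -p "$dir" || return 1
    cat > "$dir/debug.conf" <<'EOF'
[Service]
# An empty ExecStart= resets the command defined in the original unit.
ExecStart=
ExecStart=/opt/cc-slurm-adapter/cc-slurm-adapter -daemon -debug 5 -config /opt/cc-slurm-adapter/config.json
EOF
}
```

After writing the drop-in, the daemon-reload and restart commands above still apply. Deleting debug.conf and reloading again restores the default log level.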

Log Levels:

  • 2 (default): Errors and warnings
  • 5 (max): Verbose debug output

Common Issues

| Issue | Possible Cause | Solution |
|-------|----------------|----------|
| No jobs reported | Missing Slurm permissions | Run sacctmgr add user cc-slurm-adapter Account=root AdminLevel=operator |
| Socket connection errors | Wrong socket path or permissions | Check prepSockListenPath/prepSockConnectPath and RuntimeDirectoryMode |
| Prolog/Epilog failures | Non-zero exit code in hook script | Ensure hook script exits with exit 0 |
| Missing resource info | Daemon stopped too long | Keep daemon running; resource info expires minutes after job completion |
| Job allocation failures | Prolog/Epilog exit code ≠ 0 | Check hook script and ensure cc-slurm-adapter is running |
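
Because a non-zero Prolog/Epilog exit code leads to the failures listed above, a hook script can be written so that errors are logged but never propagated to Slurm. The following is a minimal sketch of that pattern, not the project's shipped hook: `run_hook_safely` is a hypothetical wrapper, and the command it wraps is whatever your hook currently invokes.

```shell
# Hypothetical wrapper: run a command, log a failure to stderr, always succeed.
run_hook_safely() {
    status=0
    "$@" || status=$?
    if [ "$status" -ne 0 ]; then
        echo "hook command failed with exit code $status: $*" >&2
    fi
    return 0
}

# In a Prolog/Epilog script (placeholder command, adjust to your setup):
# run_hook_safely /path/to/your/hook-command
# exit 0
```

The trade-off is that failures become visible only in the hook's stderr output, so that output should be captured somewhere you will look.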

Debugging Slurm Compatibility Issues

If you encounter nil pointer dereferences or unexpected errors, the JSON output of your Slurm version may differ from what the adapter expects:

  1. Get a job ID via squeue or sacct:

    squeue
    # or
    sacct
    
  2. Inspect the JSON output of sacct and scontrol for that job (the two layouts differ):

    sacct -j 12345 --json
    scontrol show job 12345 --json
    
  3. Compare the output with what the adapter expects in slurm.go

  4. Report issues to the GitHub repository with:

    • Slurm version
    • JSON output samples
    • Error messages from logs
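
The items for a report can be gathered in one step. The sketch below is one possible way to do it, assuming sacct and scontrol are on the PATH and your Slurm version supports --json. The helper name `collect_slurm_report` and the output file names are made up for illustration.

```shell
# Hypothetical helper: gather Slurm version and JSON samples for a bug report.
# Usage: collect_slurm_report JOBID [OUTPUT_DIR]
collect_slurm_report() {
    jobid="$1"
    outdir="${2:-slurm-report-$jobid}"
    mkdir -p "$outdir" || return 1
    # Version string of the installed Slurm tools.
    sacct --version > "$outdir/slurm-version.txt" 2>&1
    # JSON samples from both commands; errors go to separate files.
    sacct -j "$jobid" --json > "$outdir/sacct.json" 2> "$outdir/sacct.err"
    scontrol show job "$jobid" --json > "$outdir/scontrol.json" 2> "$outdir/scontrol.err"
    echo "report written to $outdir"
}
```

For example, `collect_slurm_report 12345` writes the files to slurm-report-12345. Review the JSON before posting it publicly, since it may contain user names and paths.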

Verifying Configuration

Check that your configuration is valid:

# Test if config file is readable
cat /opt/cc-slurm-adapter/config.json

# Verify JSON syntax
jq . /opt/cc-slurm-adapter/config.json
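
Both checks can be combined into a single function with a usable exit code. This is a minimal sketch: `validate_config` is a hypothetical helper, and it only verifies readability and JSON syntax, not whether the keys are ones the adapter understands. `jq empty` parses the file and prints nothing, exiting non-zero on a syntax error.

```shell
# Hypothetical helper: check that a config file is readable and is valid JSON.
validate_config() {
    cfg="$1"
    if [ ! -r "$cfg" ]; then
        echo "cannot read $cfg"
        return 1
    fi
    # `jq empty` parses the input without producing output.
    if jq empty "$cfg" >/dev/null 2>&1; then
        echo "valid JSON: $cfg"
    else
        echo "invalid JSON: $cfg"
        return 1
    fi
}
```

Running `validate_config /opt/cc-slurm-adapter/config.json` as the user the daemon runs under also catches file permission problems.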

Testing Connectivity

Test cc-backend Connection

# Test REST API endpoint (replace with your JWT)
curl -H "Authorization: Bearer YOUR_JWT_TOKEN" \
     https://your-cc-backend-instance.example/api/jobs/
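
When the response body is not informative, the HTTP status code alone often points to the cause. The sketch below maps a status code to a likely explanation. The interpretations follow general HTTP conventions rather than documented cc-backend behavior, and `explain_http_status` is a hypothetical helper.

```shell
# Hypothetical helper: map an HTTP status code to a likely cause.
# The interpretations are general HTTP conventions, not cc-backend specifics.
explain_http_status() {
    case "$1" in
        2??) echo "OK: endpoint reachable and token accepted" ;;
        401|403) echo "Auth problem: check the JWT and its permissions" ;;
        404) echo "Not found: check the base URL and API path" ;;
        000) echo "No HTTP response: check DNS, TLS, firewall, or the URL" ;;
        *) echo "Unexpected status $1: check the cc-backend logs" ;;
    esac
}

# Example (not run here): print only the status code, then explain it.
# code=$(curl -s -o /dev/null -w '%{http_code}' \
#     -H "Authorization: Bearer YOUR_JWT_TOKEN" \
#     https://your-cc-backend-instance.example/api/jobs/)
# explain_http_status "$code"
```

curl reports 000 as the status code when it received no HTTP response at all, which is why that case is listed separately.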

Test NATS Connection

If using NATS, verify connectivity:

# Using nats-cli (if installed)
nats server check connection -s nats://mynatsserver.example:4222

Performance Issues

If the adapter is slow or missing jobs:

  1. Check Slurm Response Times: Run sacct and squeue manually to see if Slurm is responding slowly
  2. Adjust Poll Intervals: Lower slurmPollInterval for more frequent checks, at the cost of higher load on Slurm
  3. Enable Prolog/Epilog: The hooks notify the adapter immediately about job events, which reduces its dependency on polling
  4. Check System Resources: Ensure adequate CPU/memory on the slurmctld node
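
For the first step, a rough timing of the Slurm commands is usually enough to tell whether Slurm itself is the bottleneck. The sketch below measures whole seconds with `date +%s`. `time_cmd` is a hypothetical helper, and it discards the command's output and exit status on purpose.

```shell
# Hypothetical helper: run a command, discard its output, print elapsed whole seconds.
time_cmd() {
    start=$(date +%s)
    "$@" >/dev/null 2>&1 || true
    end=$(date +%s)
    echo "$1 took $((end - start))s"
}

# Example (not run here):
# time_cmd sacct
# time_cmd squeue
```

If either command regularly takes longer than the configured slurmPollInterval, lowering the interval further is unlikely to help.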