Job Archive

Description of the locally stored, JSON-based job archive used with cc-backend

The job archive specifies an exchange format for job meta and performance metric data. It consists of two parts:

a SQLite database schema for job meta data and performance statistics
a JSON file format together with a directory hierarchy specification

Because the specification is open, portable, simple and based on plain files, job performance data can be exchanged for research and analysis purposes, and the same format serves as a robust way to archive job performance data to disk.

SQLite database schema

Introduction

A SQLite 3 database schema is provided to standardize the job meta data in a portable way. The schema also includes optional columns for job performance statistics (called a job performance footprint). The database acts as a front end to filter and select subsets of job IDs, which are the keys for locating the full job performance data in the directory hierarchy.

Database schema

The schema includes three tables: the job table, a tag table, and a jobtag table representing the many-to-many relation between jobs and tags. The SQL schema is specified here. An explanation of the various columns, including the JSON datatypes, is documented here.
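The role of the jobtag table can be illustrated with a small query. The following is a minimal sketch using Python's built-in sqlite3 module. Only the three table names (job, tag, jobtag) come from the text above; all column names (`job_id`, `cluster`, `tag_name`, `tag_id`) and the demo rows are hypothetical placeholders and do not reflect the actual schema.

```python
import sqlite3

# Simplified stand-in schema. Only the three table names come from the
# documentation; every column name below is a hypothetical placeholder.
SCHEMA = """
CREATE TABLE job (id INTEGER PRIMARY KEY, job_id INTEGER, cluster TEXT);
CREATE TABLE tag (id INTEGER PRIMARY KEY, tag_name TEXT);
CREATE TABLE jobtag (
    job_id INTEGER REFERENCES job(id),
    tag_id INTEGER REFERENCES tag(id),
    PRIMARY KEY (job_id, tag_id)
);
"""


def build_demo_db():
    """Create an in-memory database with a few made-up jobs and tags."""
    conn = sqlite3.connect(":memory:")
    conn.executescript(SCHEMA)
    conn.executemany(
        "INSERT INTO job VALUES (?, ?, ?)",
        [(1, 1034871, "clusterA"), (2, 1034872, "clusterA"), (3, 2000001, "clusterB")],
    )
    conn.executemany("INSERT INTO tag VALUES (?, ?)", [(1, "debug"), (2, "benchmark")])
    # One row per (job, tag) pair: this is the many-to-many relation.
    conn.executemany("INSERT INTO jobtag VALUES (?, ?)", [(1, 1), (2, 1), (2, 2), (3, 2)])
    conn.commit()
    return conn


def job_ids_with_tag(conn, tag_name):
    """Return the sorted job IDs that carry the given tag."""
    rows = conn.execute(
        """
        SELECT job.job_id
        FROM job
        JOIN jobtag ON jobtag.job_id = job.id
        JOIN tag ON tag.id = jobtag.tag_id
        WHERE tag.tag_name = ?
        ORDER BY job.job_id
        """,
        (tag_name,),
    ).fetchall()
    return [row[0] for row in rows]
```

The job IDs returned by such a query are then used as keys to locate the job data in the directory hierarchy.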

Directory hierarchy specification

Specification

To limit the number of entries within a single directory, a tree approach is used that splits the integer job ID. The job ID is split into chunks of 1000 each. Usually two layers of directories are sufficient, but the concept can be used for an arbitrary number of layers.

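For a two-layer schema this can be achieved with (code example in Perl):

# Integer quotient selects the first level, the remainder the second level.
$level1 = int($jobID / 1000);
$level2 = $jobID % 1000;
# The second level is zero-padded to three digits.
$dstPath = sprintf("%s/%s/%d/%03d", $trunk, $destdir, $level1, $level2);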

Example

For the job ID 1034871 the directory path is ./1034/871/.
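The same split can be generalized to more than two layers. The sketch below is one possible reading, not the reference implementation: the function name `job_dir` and its parameters are hypothetical, and the choice to zero-pad every layer except the top one follows the two-layer Perl example and is an assumption for deeper trees.

```python
def job_dir(job_id, levels=2, chunk=1000):
    """Return the relative directory path for a job ID.

    Hypothetical helper: the ID is split into `levels` parts, each lower
    part holding the remainder modulo `chunk`, zero-padded to full width.
    """
    if job_id < 0:
        raise ValueError("job_id must be non-negative")
    if levels < 1:
        raise ValueError("levels must be at least 1")

    width = len(str(chunk - 1))  # 3 digits for chunks of 1000
    parts = []
    remaining = job_id
    # Peel off the lowest chunk for every layer except the top one.
    for _ in range(levels - 1):
        parts.append(f"{remaining % chunk:0{width}d}")
        remaining //= chunk
    # Whatever is left forms the top-level directory, without padding.
    parts.append(str(remaining))
    return "/".join(reversed(parts))


print(job_dir(1034871))            # 1034/871
print(job_dir(1034871, levels=3))  # 1/034/871
```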

JSON file format

Overview

Every cluster must be configured in a cluster.json file.

The job data consists of two files:

meta.json: Contains job meta information and job statistics.
data.json: Contains the complete job data with time series.

The format specification is available as a [[JSON Schema|https://json-schema.org/]] file. The latest version of the JSON schema is part of the cc-backend source tree. For external reference, it is also available in a separate repository.
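Because both files are plain JSON, they can be read with any standard JSON parser. The following sketch assumes that meta.json and data.json sit directly inside the job directory; the helper name `load_job` is hypothetical, and no particular fields are assumed since those are defined by the schema files.

```python
import json
from pathlib import Path


def load_job(job_dir):
    """Load meta.json and data.json from one job directory.

    Assumption: both files are located directly in `job_dir`.
    Returns a (meta, data) tuple of parsed JSON documents.
    """
    job_dir = Path(job_dir)
    with open(job_dir / "meta.json", encoding="utf-8") as f:
        meta = json.load(f)
    with open(job_dir / "data.json", encoding="utf-8") as f:
        data = json.load(f)
    return meta, data
```

Reading meta.json first is usually cheaper than parsing data.json, since only the latter carries the full time series.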

Specification `cluster.json`

The JSON schema specification in its raw format is available at the GitHub repository. A variant rendered for better readability is found in the references.

Specification `meta.json`

The JSON schema specification in its raw format is available at the GitHub repository. A variant rendered for better readability is found in the references.

Specification `data.json`

The JSON schema specification in its raw format is available at the GitHub repository. A variant rendered for better readability is found in the references.

Metric time series data is stored with a fixed time step. The time step is set per metric. If no value is available for a timestamp in a metric time series, null is entered.
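Since the time step is fixed, timestamps do not need to be stored per value and can be reconstructed from a start time, while null entries mark gaps. The sketch below illustrates this with a made-up series; the field names `start`, `timestep` and `data` are hypothetical placeholders and are not taken from the data.json schema.

```python
import json

# Made-up example series; the field names are hypothetical placeholders.
SERIES_JSON = '{"start": 1700000000, "timestep": 60, "data": [1.0, null, 2.0, 3.0]}'


def timestamps(start, timestep, count):
    """Reconstruct the timestamp of every sample from the fixed time step."""
    return [start + i * timestep for i in range(count)]


def mean_ignoring_gaps(values):
    """Average the available samples, skipping null (None) entries."""
    present = [v for v in values if v is not None]
    if not present:
        return None  # the whole series is missing
    return sum(present) / len(present)


series = json.loads(SERIES_JSON)  # JSON null becomes Python None
times = timestamps(series["start"], series["timestep"], len(series["data"]))
print(list(zip(times, series["data"])))
print(mean_ignoring_gaps(series["data"]))  # 2.0
```

Keeping null in place, rather than dropping the sample, preserves the position of every value so that the index alone determines its timestamp.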
