Explanation
1 - Authentication
Overview
The authentication is implemented in internal/auth/. In auth.go an interface is defined that any authentication provider must fulfill. It also acts as a dispatcher to delegate the calls to the available authentication providers.
Two authentication types are available:
- JWT authentication for the REST API that does not create a session cookie
- Session based authentication using a session cookie
The most important routines in auth are:
- Login(): Handle the POST request to log in a user and start a new session
- Auth(): Authenticate the user and put the User object in the context of the request
The http router calls auth in the following cases:
- r.Handle("/login", authentication.Login( ... )).Methods(http.MethodPost): The POST request on the /login route will call the Login callback.
- r.Handle("/jwt-login", authentication.Login( ... )): Any request on the /jwt-login route will call the Login callback. Intended for use with the JWT token based authenticators.
- Any route in the secured subrouter will always call Auth(); on success it will call the next handler in the chain, on failure it will render the login template.
secured.Use(func(next http.Handler) http.Handler {
    return authentication.Auth(
        // On success:
        next,
        // On failure:
        func(rw http.ResponseWriter, r *http.Request, err error) {
            // Render login form
        })
})
A JWT token can be used to initiate an authenticated user session. This can either happen by calling the login route with a token provided in a header or via a special cookie containing the JWT token. For API routes the access is authenticated on every request using the JWT token and no session is initiated.
Login
The Login function (located in auth.go):
- Extracts the user name and gets the user from the user database table. In case the user is not found the user object is set to nil.
- Iterates over all authenticators and:
  - Calls its CanLogin function which checks if the authentication method is supported for this user.
  - Calls its Login function to authenticate the user. On success a valid user object is returned.
- Creates a new session object, stores the user attributes in the session and saves the session.
- Starts the onSuccess http handler
Local authenticator
This authenticator is applied if:
return user != nil && user.AuthSource == AuthViaLocalPassword
Compares the password provided by the login form to the password hash stored in the user database table:
if e := bcrypt.CompareHashAndPassword([]byte(user.Password), []byte(r.FormValue("password"))); e != nil {
    log.Errorf("AUTH/LOCAL > Authentication for user %s failed!", user.Username)
    return nil, fmt.Errorf("Authentication failed")
}
LDAP authenticator
This authenticator is applied if the user was found in the database and its AuthSource is LDAP:
if user != nil {
    if user.AuthSource == schema.AuthViaLDAP {
        return user, true
    }
}
If the option SyncUserOnLogin is set, it tries to sync the user from the LDAP directory. In case this succeeds the user is persisted to the database and can log in.
Gets the LDAP connection and tries a bind with the provided credentials:
if err := l.Bind(userDn, r.FormValue("password")); err != nil {
    log.Errorf("AUTH/LDAP > Authentication for user %s failed: %v", user.Username, err)
    return nil, fmt.Errorf("Authentication failed")
}
JWT Session authenticator
Login via JWT token will create a session without password.
For login the X-Auth-Token header is not supported. This authenticator is applied if the Authorization header or the query parameter login-token is present:
return user, r.Header.Get("Authorization") != "" ||
    r.URL.Query().Get("login-token") != ""
The Login function:
- Parses the token and checks if it is expired
- Checks if the signing method is EdDSA or HS256 or HS512
- Checks if the claims are valid and extracts the claims
- The following claims have to be present:
  - sub: The subject, in this case this is the username
  - exp: Expiration in Unix epoch time
  - roles: String array with roles of user
- In case the user does not exist in the database and the option SyncUserOnLogin is set, adds the user to the user database table with AuthViaToken AuthSource.
- Returns a valid user object (see the sketch below)
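A condensed sketch of these steps, assuming the github.com/golang-jwt/jwt/v5 library; the helper name parseLoginToken and the verifyKey parameter are illustrative, not the actual cc-backend API:
import (
    "fmt"

    "github.com/golang-jwt/jwt/v5"
)

func parseLoginToken(rawToken string, verifyKey any) (username string, roles []string, err error) {
    token, err := jwt.Parse(rawToken, func(t *jwt.Token) (any, error) {
        // Accept only the supported signing methods.
        switch t.Method.Alg() {
        case "EdDSA", "HS256", "HS512":
            return verifyKey, nil
        default:
            return nil, fmt.Errorf("unexpected signing method: %s", t.Method.Alg())
        }
    })
    if err != nil {
        // Parse also fails for expired tokens.
        return "", nil, err
    }
    claims, ok := token.Claims.(jwt.MapClaims)
    if !ok || !token.Valid {
        return "", nil, fmt.Errorf("invalid token claims")
    }
    // The sub claim holds the username; roles is a string array.
    username, _ = claims["sub"].(string)
    if rawRoles, ok := claims["roles"].([]any); ok {
        for _, role := range rawRoles {
            if s, ok := role.(string); ok {
                roles = append(roles, s)
            }
        }
    }
    return username, roles, nil
}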
JWT Cookie Session authenticator
Login via JWT cookie token will create a session without password. It is first checked if the required configuration options are set:
- trustedIssuer
- CookieName
and optionally if the environment variable CROSS_LOGIN_JWT_PUBLIC_KEY is set.
This authenticator is applied if the configured cookie is present:
jwtCookie, err := r.Cookie(cookieName)
if err == nil && jwtCookie.Value != "" {
    return true
}
The Login function:
- Extracts and parses the token
- Checks if the signing method is Ed25519/EdDSA
- In case publicKeyCrossLogin is configured (see the key selection sketch after this list):
  - Checks if the iss issuer claim matches the trusted issuer from the configuration
  - Returns the public cross login key
- Otherwise returns the standard public key
- Checks if the claims are valid
- Depending on the option validateUser, the roles are either extracted from the JWT token or taken from the user object fetched from the database
- Asks the browser to delete the JWT cookie
- In case the user does not exist in the database and the option SyncUserOnLogin is set, adds the user to the user database table with AuthViaToken AuthSource.
- Returns a valid user object
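The key selection step could look roughly like this (a sketch, not the verbatim cc-backend code; all parameter names are illustrative):
import (
    "crypto/ed25519"
    "fmt"

    "github.com/golang-jwt/jwt/v5"
)

func chooseKey(t *jwt.Token, trustedIssuer string, crossKey, standardKey ed25519.PublicKey) (any, error) {
    // Only Ed25519/EdDSA signatures are accepted.
    if t.Method != jwt.SigningMethodEdDSA {
        return nil, fmt.Errorf("unexpected signing method: %v", t.Header["alg"])
    }
    if crossKey != nil {
        // The iss claim must match the trusted issuer from the configuration.
        claims, ok := t.Claims.(jwt.MapClaims)
        if !ok {
            return nil, fmt.Errorf("unexpected claims type")
        }
        if iss, _ := claims["iss"].(string); iss != trustedIssuer {
            return nil, fmt.Errorf("untrusted issuer: %q", iss)
        }
        return crossKey, nil
    }
    return standardKey, nil
}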
Auth
The Auth function (located in auth.go):
- Returns a new http handler function that is defined right away
- This handler tries two methods to authenticate a user:
  - Via a JWT API token in AuthViaJWT()
  - Via a valid session in AuthViaSession()
- If no error occurred and the user object is valid, it puts the user object in the request context and starts the onSuccess http handler
- Otherwise it calls the onFailure handler
AuthViaJWT
Implemented in JWTAuthenticator:
- Extracts the token either from the header X-Auth-Token or Authorization with Bearer prefix (sketched below)
- Parses the token and checks if it is valid. The Parse routine will also check if the token is expired.
- If the option validateUser is set it will ensure the user object exists in the database and takes the roles from the database user
- Otherwise the roles are extracted from the roles claim
- Returns a valid user object with AuthType set to AuthToken
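The extraction step could be sketched as follows (the helper name extractToken is ours, not cc-backend's):
import (
    "net/http"
    "strings"
)

func extractToken(r *http.Request) string {
    if rawToken := r.Header.Get("X-Auth-Token"); rawToken != "" {
        return rawToken
    }
    // Fall back to the Authorization header with Bearer prefix.
    if auth := r.Header.Get("Authorization"); strings.HasPrefix(auth, "Bearer ") {
        return strings.TrimPrefix(auth, "Bearer ")
    }
    return ""
}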
AuthViaSession
- Extracts the session
- Gets the values username, projects, and roles from the session
- Returns a valid user object with AuthType set to AuthSession (see the sketch below)
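A sketch of these steps, assuming a gorilla/sessions cookie store (which cc-backend's session handling builds on); the session name and the helper are illustrative:
import (
    "net/http"

    "github.com/gorilla/sessions"
)

func sessionAttributes(store *sessions.CookieStore, r *http.Request) (username string, projects, roles []string, ok bool) {
    session, err := store.Get(r, "session")
    if err != nil || session.IsNew {
        return "", nil, nil, false // no valid session cookie present
    }
    username, _ = session.Values["username"].(string)
    projects, _ = session.Values["projects"].([]string)
    roles, _ = session.Values["roles"].([]string)
    return username, projects, roles, true
}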
2 - Configuration Management
Release versions
Versions are marked according to semantic versioning. Each version embeds the following static assets in the binary:
- Web frontend with javascript files and all static assets
- Golang template files for server-side rendering
- JSON schema files for validation
- Database migration files
The remaining external assets are:
- The SQL database used
- The job archive
- The configuration files config.json and .env
The external assets are versioned with integer IDs.
This means that each release binary is bound to specific versions of the SQL
database and the job archive.
The configuration file is checked against the current schema at startup.
The -migrate-db command line switch can be used to migrate the SQL database from a previous version to the latest one.
We offer a separate tool archive-migration to migrate an existing job archive from the previous to the latest version.
Versioning of APIs
cc-backend provides two API backends:
- A REST API for querying jobs.
- A GraphQL API for data exchange between web frontend and cc-backend.
The REST API will also be versioned. We still have to decide whether we will also support older REST API versions by versioning the endpoint URLs. The GraphQL API is for internal use and will not be versioned.
How to build
In general it is recommended to use the provided release binary.
In case you want to build cc-backend yourself, please always use the provided makefile. This will ensure that the frontend is also built correctly and that the version information is encoded in the binary.
3 - Job Archive
The job archive specifies an exchange format for job meta and performance metric data. It consists of two parts:
- a SQLite database schema for job meta data and performance statistics
- a JSON file format together with a directory hierarchy specification
By using an open, portable and simple specification based on files it is possible to exchange job performance data for research and analysis purposes as well as use it as a robust way for archiving job performance data to disk.
SQLite database schema
Introduction
A SQLite 3 database schema is provided to standardize the job meta data information in a portable way. The schema also includes optional columns for job performance statistics (called a job performance footprint). The database acts as a front end to filter and select subsets of job IDs that are the keys to get the full job performance data in the job performance tree hierarchy.
Database schema
The schema includes 3 tables: the job table, a tag table and a jobtag table representing the MANY-TO-MANY relation between jobs and tags. The SQL schema is specified here. Explanation of the various columns including the JSON datatypes is documented here.
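To illustrate this front-end role, here is a minimal Go sketch that selects a subset of job IDs from the job table (the column names are assumptions; consult the linked schema for the authoritative definition):
import (
    "database/sql"

    _ "github.com/mattn/go-sqlite3" // SQLite driver
)

func selectJobIDs(db *sql.DB, cluster string, minDuration int) ([]int64, error) {
    rows, err := db.Query(
        `SELECT id FROM job WHERE cluster = ? AND duration > ?`,
        cluster, minDuration)
    if err != nil {
        return nil, err
    }
    defer rows.Close()
    var ids []int64
    for rows.Next() {
        var id int64
        if err := rows.Scan(&id); err != nil {
            return nil, err
        }
        ids = append(ids, id)
    }
    return ids, rows.Err()
}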
Directory hierarchy specification
Specification
To manage the number of directories within a single directory, a tree approach is used that splits the integer job ID. The job ID is split into chunks of 1000 each. Usually 2 layers of directories are sufficient, but the concept can be used for an arbitrary number of layers.
For a 2 layer schema this can be achieved with (code example in Perl):
$level1 = int($jobID/1000);
$level2 = $jobID%1000;
$dstPath = sprintf("%s/%s/%d/%03d", $trunk, $destdir, $level1, $level2);
Example
For the job ID 1034871 the directory path is ./1034/871/.
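The same split in Go, cc-backend's implementation language (jobPath is an illustrative helper, not part of the source tree):
import "fmt"

func jobPath(trunk, destdir string, jobID int64) string {
    level1 := jobID / 1000 // integer division, e.g. 1034871 -> 1034
    level2 := jobID % 1000 // remainder, e.g. 1034871 -> 871
    return fmt.Sprintf("%s/%s/%d/%03d", trunk, destdir, level1, level2)
}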
JSON file format
Overview
Every cluster must be configured in a cluster.json file.
The job data consists of two files:
- meta.json: Contains job meta information and job statistics.
- data.json: Contains complete job data with time series
The description of the JSON format specification is available as a JSON Schema (https://json-schema.org/) format file. The latest version of the JSON schema is part of the cc-backend source tree. For external reference it is also available in a separate repository.
Specification cluster.json
The json schema specification in its raw format is available at the GitHub repository. A variant rendered for better readability is found in the references.
Specification meta.json
The json schema specification in its raw format is available at the GitHub repository. A variant rendered for better readability is found in the references.
Specification data.json
The json schema specification in its raw format is available at the GitHub repository. A variant rendered for better readability is found in the references.
Metric time series data is stored for a fixed time step. The time step is set per metric. If no value is available for a metric time series data timestamp, null is entered.
4 - JSON Web Token
Introduction
ClusterCockpit uses JSON Web Tokens (JWT) for authorization of its APIs.
JSON Web Token (JWT) is an open standard (RFC 7519) that defines a compact and self-contained way for securely transmitting information between parties as a JSON object.
This information can be verified and trusted because it is digitally signed.
In ClusterCockpit JWTs are signed using a public/private key pair using Ed25519.
Because tokens are signed using public/private key pairs, the signature also certifies that only the party holding the private key is the one that signed it.
Expiration of the generated tokens as well as the maximum length of a browser session can be configured in the config.json file described here.
The Ed25519 algorithm for signatures was used because it is compatible with other tools that require authentication, such as NATS.io, and because these elliptic-curve methods provide similar security with smaller keys compared to something like RSA. They are slightly more expensive to validate, but that effect is negligible.
JWT Payload
You may view the payload of a JWT token at https://jwt.io/#debugger-io. Currently ClusterCockpit sets the following claims:
- iat: Issued at claim. The “iat” claim is used to identify the time at which the JWT was issued. This claim can be used to determine the age of the JWT.
- sub: Subject claim. Identifies the subject of the JWT, in our case this is the username.
- roles: An array of strings specifying the roles set for the subject.
- exp: Expiration date of the token (only if explicitly configured)
It is important to know that JWTs are not encrypted, only signed. This means that outsiders cannot create new JWTs or modify existing ones, but they are able to read out the username.
Accept externally generated JWTs provided via cookie
If there is an external service like an AuthAPI that can generate JWTs and hand them over to ClusterCockpit via cookies, CC can be configured to accept them:
- .env: CC needs a public ed25519 key to verify foreign JWT signatures. Public keys in PEM format can be converted with the instructions in /tools/convert-pem-pubkey-for-cc.
CROSS_LOGIN_JWT_PUBLIC_KEY="+51iXX8BdLFocrppRxIw52xCOf8xFSH/eNilN5IHVGc="
- config.json: Insert a name for the cookie (set by the external service) containing the JWT so that CC knows where to look. Define a trusted issuer (JWT claim ‘iss’), otherwise it will be rejected. If you want usernames and user roles from JWTs (‘sub’ and ‘roles’ claims) to be validated against CC’s internal database, you need to enable it here. Unknown users will then be rejected and roles set via JWT will be ignored.
"jwts": {
"cookieName": "access_cc",
"forceJWTValidationViaDatabase": true,
"trustedExternalIssuer": "auth.example.com"
}
- Make sure your external service includes the same issuer (iss) in its JWTs. Example JWT payload:
{
    "iat": 1668161471,
    "nbf": 1668161471,
    "exp": 1668161531,
    "sub": "alice",
    "roles": [
        "user"
    ],
    "jti": "a1b2c3d4-1234-5678-abcd-a1b2c3d4e5f6",
    "iss": "auth.example.com"
}
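For illustration, a sketch of how such an external service could mint this payload with an Ed25519 private key, assuming the github.com/golang-jwt/jwt/v5 library (function and variable names are ours):
import (
    "crypto/ed25519"
    "time"

    "github.com/golang-jwt/jwt/v5"
)

func mintCrossLoginToken(priv ed25519.PrivateKey, username string) (string, error) {
    now := time.Now()
    claims := jwt.MapClaims{
        "iat":   now.Unix(),
        "nbf":   now.Unix(),
        "exp":   now.Add(time.Minute).Unix(), // short-lived login token
        "sub":   username,
        "roles": []string{"user"},
        "iss":   "auth.example.com", // must match trustedExternalIssuer
    }
    return jwt.NewWithClaims(jwt.SigningMethodEdDSA, claims).SignedString(priv)
}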
5 - Metric Store
Introduction
CCMS (Cluster Cockpit Metric Store) is a simple in-memory time series database. It stores the data about the nodes in your cluster for a specific interval of days. Data about your nodes can be collected with various instrumentation tools like RAPL, LIKWID, PAPI etc. Instrumentation tools can collect data like memory bandwidth, flops, clock frequency, CPU usage etc. After a specified number of days, the data from the in-memory database will be written to disk, archived and released from the in-memory database. In this documentation, we explain the working of the CCMS components in detail; the outline is as follows:
- Present the structure of the metric store.
- Explain background workers.
Let us get started with the very basic understanding of how CCMS is structured and how it manages data over time.
The general tree structure is as follows:
root
|-----cluster
| |------node -> [node-metrics]
| | |--components -> [node-level-metrics]
| | |--components -> [node-level-metrics]
| |
| |------node -> [node-metrics]
| |--components -> [node-level-metrics]
| |--components -> [node-level-metrics]
|
|-----cluster
|-----node -> [node-metrics]
| |--components -> [node-level-metrics]
| |--components -> [node-level-metrics]
|
|-----node -> [node-metrics]
|--components -> [node-level-metrics]
|--components -> [node-level-metrics]
A simple tree representation with example:
root
|-----alex
| |------a903 -> [mem_cached,cpu_idle,nfs4_read]
| | |--hwthread01 -> [cpu_load,cpu_user,flops_any]
| | |--accelerator01 -> [mem_bw,mem_used,flops_any]
| |
| |------a322 -> [mem_cached,cpu_idle,nfs4_read]
| |--hwthread42 -> [cpu_load,cpu_user,flops_any]
| |--accelerator05 -> [mem_bw,mem_used,flops_any]
|
|-----fritz
|-----f104 -> [mem_cached,cpu_idle,nfs4_read]
| |--hwthread35 -> [cpu_load,cpu_user,flops_any]
| |--socket02 -> [cpu_load,cpu_user,flops_any]
|
|-----f576 -> [mem_cached,cpu_idle,nfs4_read]
|--hwthread47 -> [cpu_load,cpu_user,flops_any]
|--cpu01 -> [cpu_load,cpu_user,flops_any]
Example tree structure of CCMS containing the 2 clusters ‘alex’ and ‘fritz’, each of which contains its own nodes, and each node contains its components. Each node and each component holds metrics. a903 is an example of a node, and hwthread01 & accelerator01 are node-level components. Each node has its own metrics (node-metrics), and node-level components have their own metrics as well, i.e. node-level-metrics.
Internal data structures used in cc-metric-store
A representation of the Level and Buffer data structure with the buffer chain.
From our previous example, we move from a simplistic view to a more realistic view. Each buffer for a given metric holds up to BUFFER_CAP elements in its data array. Usually BUFFER_CAP is 512 elements, so for float64 elements the buffer size is 4 KB, which is also the typical page size. Below you can find all the data structures and their associated member variables (followed by a small write-path sketch). In our example, the start times of consecutive buffers are exactly 512 epoch seconds apart. Older buffers are linked as the previous buffer of the newest one; this creates a chain of buffers for every level.
Data structure used to hold the data in memory:
- MemoryStore
MemoryStore struct {
    // Stores the metric configurations parsed from config.json,
    // keyed by metric name.
    Metrics map[string]MetricConfig
    // Initial root level.
    root Level
}
- Level
// From our example, alex, fritz, a903, a322, hwthread01 are all of the Level data structure.
Level struct {
    // Stores the metric buffers for this level.
    // From our example, mem_cached, flops_any are of the Buffer data structure.
    metrics []*buffer
    // Stores the children of this level, e.g. the nodes of a cluster
    // or the components of a node.
    children map[string]*Level
}
- Buffer
buffer struct {
    // Pointer to the previous buffer in the chain
    prev *buffer
    // Pointer to the next buffer in the chain
    next *buffer
    // Array of floats to store the metric values
    data []Float
    // Interval in seconds at which measurements will arrive
    frequency int64
    // Buffer's start time stored in epoch seconds
    start int64
    // If true, this buffer will be skipped for file checkpointing
    archived bool
    closed bool
}
- MetricConfig
MetricConfig struct {
    // Interval in seconds at which measurements will arrive.
    // A frequency of 60 means the timestep/resolution is 60 seconds.
    Frequency int
    // Can be 'sum', 'avg' or null. Describes how to aggregate
    // metrics from the same timestep over the hierarchy.
    Aggregation string
    // Private, used internally...
    Offset int
}
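To make the buffer chain concrete, here is an illustrative write path (not the actual CCMS code; the Float alias, BUFFER_CAP and the NaN gap-filling are assumptions for this sketch):
import "math"

type Float float64

const BUFFER_CAP = 512

func (b *buffer) write(ts int64, value Float) *buffer {
    idx := int((ts - b.start) / b.frequency)
    if idx < 0 {
        return b // older than this buffer's start; dropped in this sketch
    }
    if idx >= BUFFER_CAP {
        // Buffer full: chain a new one, the current buffer becomes its prev.
        n := &buffer{
            start:     ts,
            frequency: b.frequency,
            prev:      b,
            data:      make([]Float, 0, BUFFER_CAP),
        }
        b.next = n
        return n.write(ts, value)
    }
    for len(b.data) <= idx {
        b.data = append(b.data, Float(math.NaN())) // fill gaps for missed timestamps
    }
    b.data[idx] = value
    return b
}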
Background workers
Background workers are separate threads spawned for each background task like:
- Data retention -> This background worker uses the retention-on-memory parameter in the config.json and sets a looping interval for the user-given time. It ticks until the given interval is reached and then releases all the buffers in CCMS that are older than the user-given time.
In this example, we assume that we insert data continuously into CCMS with a retention period of 48 hrs. The background worker always checks with an interval of retention-period/2; in the example, it must check every 24 hrs so that CCMS can retain 48 hrs of data overall. Once 72 hrs are reached, the background worker releases the first 24 hours of data from the in-memory database.
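For illustration, a minimal sketch of such a looping worker built around a ticker (the names retentionWorker and releaseBefore are illustrative, not the actual CCMS API):
import (
    "context"
    "time"
)

func retentionWorker(ctx context.Context, retention time.Duration, releaseBefore func(cutoff time.Time)) {
    ticker := time.NewTicker(retention / 2) // check at half the retention period
    defer ticker.Stop()
    for {
        select {
        case <-ctx.Done(): // graceful shutdown
            return
        case now := <-ticker.C:
            releaseBefore(now.Add(-retention)) // release buffers older than the period
        }
    }
}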
- Data checkpointing -> This background worker uses the interval sub-parameter of the checkpoints parameter in the config.json and sets a looping interval for the user-given time. It ticks until the given interval is reached and creates local backups of the data from CCMS on disk. The checkpointed files can be found at the user-defined directory sub-parameter of the checkpoints parameter in the config.json file. Checkpointing does not mean removing the data from the in-memory database; the data is only released from memory once the retention period is reached.
- Data archiving -> This background worker uses the interval sub-parameter of the archive parameter in the config.json and sets a looping interval for the user-given time. It ticks until the given interval is reached and zips all the checkpointed files that are older than the user-given time in the interval sub-parameter. Once the checkpointed files are zipped, they are deleted from the checkpointing directory.
- Graceful shutdown handler -> This is a special background worker that detects system or keyboard interrupts like Ctrl+C or Ctrl+Z. In case of an interrupt, it is essential to save the data from the in-memory database. There can be a case where CCMS contains data only in memory that has not been checkpointed yet, so this background worker scans for the buffers that have not been checkpointed and writes them to the checkpoint files before shutting down CCMS.
Reusing the buffers in cc-metric-store
This section explains how CCMS handles buffer reusability once buffers are released by the retention background worker.
In this example, we extend the previous example and assume that the retention background worker releases the last buffer from each level, i.e. node and node-level metrics. Each buffer that is about to be unlinked from the buffer chain is not freed from memory, but instead unlinked and stored in the memory pool as shown. This allows buffers to be reused whenever a buffer reaches the BUFFER_CAP limit and a metric requests a new buffer.
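A condensed sketch of this unlink-and-reuse step, reusing the buffer struct and the Float/BUFFER_CAP definitions from the sketches above (illustrative; the real CCMS pool handling differs in detail):
var bufferPool []*buffer // released buffers waiting for reuse

func unlinkAndPool(oldest *buffer) {
    if oldest.next != nil {
        oldest.next.prev = nil // detach the oldest buffer from the chain
    }
    oldest.prev, oldest.next = nil, nil
    oldest.data = oldest.data[:0] // keep the allocation, drop the contents
    bufferPool = append(bufferPool, oldest)
}

func bufferFromPool() *buffer {
    if n := len(bufferPool); n > 0 {
        b := bufferPool[n-1]
        bufferPool = bufferPool[:n-1]
        return b
    }
    return &buffer{data: make([]Float, 0, BUFFER_CAP)} // pool empty: allocate fresh
}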
6 - Roles
ClusterCockpit uses a specified set of user roles to steer data access and discriminate authorizations. The roles are primarily used in the web interface for different display of views, but also limit data access when requests return from the server backend.
The roles currently implemented are:
User Role
The standard role for all users. By default, granted to all users imported from LDAP. It is also the default selection for the administrative “Create User” form.
Use Case: View and list personal jobs, view personal job detail, inspect metrics of personal jobs.
Access: Jobs started from the user's account only.
Manager Role
A privileged role for project supervisors. This role has to be granted manually by administrators. If ClusterCockpit is configured to accept JWT logins from external management applications, it is possible to retain roles granted in the respective application, see JWT docs.
In addition to the role itself, one or more projects need to be assigned to the user by administrators.
Use Case: In addition to personal job access, this role is intended to view and inspect all jobs of all users of the assigned projects (usergroups), in order to self-manage and identify problems of the subordinate user group.
Access: Personally started jobs, regardless of project. Additionally, all jobs started from all users of the assigned projects (usergroups).
Support Role
A privileged role for support staff. This role has to be granted manually by administrators. If ClusterCockpit is configured to accept JWT logins from external management applications, it is possible to retain roles granted in the respective application, see JWT docs.
In regard to job view access, this role is identical to administrators. However, web interface view access differs and, most importantly, access to administrative options is prohibited.
Use Case: In addition to personal job access, this role is intended to view and inspect all jobs of all users active on the clusters, in order to identify problems and give guidance for the userbase as a whole, supporting the administrative staff in these tasks.
Access: Personally started jobs, regardless of project. Additionally, all jobs started from all users on all configured clusters.
Administrator Role
The highest available authority for administrative staff only. This role has to be granted manually by other administrators. No JWT can ever grant this role.
All jobs from all active users on all systems can be accessed, as well as all web interface views. In addition, the administrative options in the settings view are accessible.
Use Case: General access and ClusterCockpit administrative tasks from the settings page.
Access: General access.
API Role
An optional, technical role given to users in order to enable usage of the RESTful API endpoints. This role has to be granted manually by administrators. No JWT can ever grant this role.
This role can either be granted to a specialized “API User”, which does not have a password or any other roles and therefore cannot log in by itself. Such a user is only intended to be used to generate JWT access tokens for scripted API access, for example.
Still, this role can also be granted to actual users, for example administrators, to generate personal API tokens for testing.
Use Case: Interact with ClusterCockpit’s REST API.
Access: Allows usage of ClusterCockpit’s REST API.