Tutorials
1 -
2 - Hands-On Demo
Prerequisites
- perl
- go
- npm
- Optional: curl
- Script migrateTimestamps.pl
Documentation
You can find READMEs or API docs in
- ./cc-backend/configs
- ./cc-backend/init
- ./cc-backend/api
ClusterCockpit configuration files
cc-backend
./.env
Passwords and Tokens set in the environment
./config.json
Configuration options for cc-backend
cc-metric-store
./config.json
Optional to overwrite configuration options
cc-metric-collector
Not yet included in the hands-on setup.
Setup Components
Start by creating a base folder for all of the following steps.
mkdir clustercockpit
cd clustercockpit
Setup cc-backend
- Clone Repository
git clone https://github.com/ClusterCockpit/cc-backend.git
cd cc-backend
- Build
make
- Activate & configure environment for cc-backend
cp configs/env-template.txt .env
- Optional: Have a look via
vim .env
- Copy the config.json file included in this tarball into the root directory of cc-backend:
cp ../../config.json ./
- Prepare Datafolder and Database file (still inside the cc-backend folder, where the binary was built)
mkdir var
./cc-backend -migrate-db
- Back to toplevel clustercockpit
cd ..
Setup cc-metric-store
- Clone Repository
git clone https://github.com/ClusterCockpit/cc-metric-store.git
cd cc-metric-store
- Build Go Executable
go get
go build
- Prepare Datafolders
mkdir -p var/checkpoints
mkdir -p var/archive
- Update Config
vim config.json
- Exchange the existing setting in metrics with the following:
"clock": { "frequency": 60, "aggregation": null },
"cpi": { "frequency": 60, "aggregation": null },
"cpu_load": { "frequency": 60, "aggregation": null },
"flops_any": { "frequency": 60, "aggregation": null },
"flops_dp": { "frequency": 60, "aggregation": null },
"flops_sp": { "frequency": 60, "aggregation": null },
"ib_bw": { "frequency": 60, "aggregation": null },
"lustre_bw": { "frequency": 60, "aggregation": null },
"mem_bw": { "frequency": 60, "aggregation": null },
"mem_used": { "frequency": 60, "aggregation": null },
"rapl_power": { "frequency": 60, "aggregation": null }
- Back to toplevel clustercockpit
cd ..
Setup Demo Data
mkdir source-data
cd source-data
- Download JobArchive-Source:
wget https://hpc-mover.rrze.uni-erlangen.de/HPC-Data/0x7b58aefb/eig7ahyo6fo2bais0ephuf2aitohv1ai/job-archive-dev.tar.xz
tar xJf job-archive-dev.tar.xz
mv ./job-archive ./job-archive-source
rm ./job-archive-dev.tar.xz
- Download CC-Metric-Store Checkpoints:
mkdir -p cc-metric-store-source/checkpoints
cd cc-metric-store-source/checkpoints
wget https://hpc-mover.rrze.uni-erlangen.de/HPC-Data/0x7b58aefb/eig7ahyo6fo2bais0ephuf2aitohv1ai/cc-metric-store-checkpoints.tar.xz
tar xf cc-metric-store-checkpoints.tar.xz
rm cc-metric-store-checkpoints.tar.xz
- Back to source-data
cd ../..
- Run timestamp migration script. This may take tens of minutes!
cp ../migrateTimestamps.pl .
./migrateTimestamps.pl
- Expected output:
Starting to update start- and stoptimes in job-archive for emmy
Starting to update start- and stoptimes in job-archive for woody
Done for job-archive
Starting to update checkpoint filenames and data starttimes for emmy
Starting to update checkpoint filenames and data starttimes for woody
Done for checkpoints
- Copy cluster.json files from source to migrated folders
cp source-data/job-archive-source/emmy/cluster.json cc-backend/var/job-archive/emmy/
cp source-data/job-archive-source/woody/cluster.json cc-backend/var/job-archive/woody/
- Initialize Job-Archive in SQLite3 job.db and add demo user
cd cc-backend
./cc-backend -init-db -add-user demo:admin:demo
- Expected output:
<6>[INFO] new user "demo" created (roles: ["admin"], auth-source: 0)
<6>[INFO] Building job table...
<6>[INFO] A total of 3936 jobs have been registered in 1.791 seconds.
- Back to toplevel clustercockpit
cd ..
Startup both Apps
- In cc-backend root:
$ ./cc-backend -server -dev
- Starts ClusterCockpit at http://localhost:8080
- Log:
<6>[INFO] HTTP server listening at :8080...
- Use a local web browser to access the interface
- You should see and be able to browse finished jobs
- Metadata is read from the SQLite3 database
- Metric data is read from the job-archive JSON files
- Create a user in settings (top-right corner)
- Name: apiuser
- Username: apiuser
- Role: API
- Submit & refresh page
- Create a JWT for apiuser
- In the user list, press Gen. JWT for apiuser
- Save the JWT for later use
- In cc-metric-store root:
$ ./cc-metric-store
- Starts the cc-metric-store at http://localhost:8081, log:
2022/07/15 17:17:42 Loading checkpoints newer than 2022-07-13T17:17:42+02:00
2022/07/15 17:17:45 Checkpoints loaded (5621 files, 319 MB, that took 3.034652s)
2022/07/15 17:17:45 API http endpoint listening on '0.0.0.0:8081'
- Does not have a graphical interface
- Optional: Test function by executing:
$ curl -H "Authorization: Bearer eyJ0eXAiOiJKV1QiLCJhbGciOiJFZERTQSJ9.eyJ1c2VyIjoiYWRtaW4iLCJyb2xlcyI6WyJST0xFX0FETUlOIiwiUk9MRV9BTkFMWVNUIiwiUk9MRV9VU0VSIl19.d-3_3FZTsadPjDEdsWrrQ7nS0edMAR4zjl-eK7rJU3HziNBfI9PDHDIpJVHTNN5E5SlLGLFXctWyKAkwhXL-Dw" -D - "http://localhost:8081/api/query" -d "{ \"cluster\": \"emmy\", \"from\": $(expr $(date +%s) - 60), \"to\": $(date +%s), \"queries\": [{
\"metric\": \"flops_any\",
\"host\": \"e1111\"
}] }"
HTTP/1.1 200 OK
Content-Type: application/json
Date: Fri, 15 Jul 2022 13:57:22 GMT
Content-Length: 119
{"results":[[JSON-DATA-ARRAY]]}
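The escaped JSON in the curl call above is easy to get wrong when editing it by hand. A minimal sketch of building the same query body with a shell function; build_query is a hypothetical helper name, and the field names are copied from the curl call:

```shell
# Sketch: print a cc-metric-store query body covering the last 60 seconds.
# build_query is a hypothetical helper, not part of cc-metric-store.
build_query() {
    # $1 = cluster, $2 = host, $3 = metric, $4 = end timestamp (defaults to now)
    to="${4:-$(date +%s)}"
    from=$((to - 60))
    cat <<EOF
{ "cluster": "$1", "from": $from, "to": $to, "queries": [{ "metric": "$3", "host": "$2" }] }
EOF
}

build_query emmy e1111 flops_any
```

The output can be piped into the curl call by replacing the inline body with `-d @-`, which makes curl read the request body from standard input.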
Development API web interfaces
The -dev flag enables web interfaces to document and test the APIs:
- Local GQL Playground - A GraphQL playground. To use it you must have an authenticated session in the same browser.
- Local Swagger Docs - A Swagger UI. To use it you have to be logged out, so no user session in the same browser. Use the JWT token with role API generated previously to authenticate via HTTP header.
Use cc-backend API to start job
- Enter the URL http://localhost:8080/swagger/index.html in your browser.
- Enter the JWT token you generated for the API user by clicking the green Authorize button in the upper right part of the window.
- Click the /job/start_job endpoint and click the Try it out button.
- Enter the following JSON into the request body text area and fill in a recent start timestamp, which you can get by executing date +%s:
{
"jobId": 100000,
"arrayJobId": 0,
"user": "ccdemouser",
"subCluster": "main",
"cluster": "emmy",
"startTime": <date +%s>,
"project": "ccdemoproject",
"resources": [
{"hostname": "e0601"},
{"hostname": "e0823"},
{"hostname": "e0337"},
{"hostname": "e1111"}],
"numNodes": 4,
"numHwthreads": 80,
"walltime": 86400
}
- The response body should be the database id of the started job, for example:
{
"id": 3937
}
- Check in ClusterCockpit
- User ccdemouser should appear in the Users tab with one running job
- It could take up to 5 minutes until the job is displayed with some current data (5 min short-job filter)
- The job is then marked with a green running tag
- Metric data displayed is read from cc-metric-store!
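Filling in the start timestamp by hand is the step most likely to go wrong. A minimal sketch that prints the start_job request body from this section with the timestamp already substituted; start_job_body is a hypothetical helper name, and all JSON fields are copied from the tutorial body:

```shell
# Sketch: print the start_job request body with startTime filled in.
# start_job_body is a hypothetical helper, not part of cc-backend.
start_job_body() {
    # $1 = start time in Unix seconds; defaults to the current time
    start="${1:-$(date +%s)}"
    cat <<EOF
{
  "jobId": 100000,
  "arrayJobId": 0,
  "user": "ccdemouser",
  "subCluster": "main",
  "cluster": "emmy",
  "startTime": $start,
  "project": "ccdemoproject",
  "resources": [
    {"hostname": "e0601"},
    {"hostname": "e0823"},
    {"hostname": "e0337"},
    {"hostname": "e1111"}],
  "numNodes": 4,
  "numHwthreads": 80,
  "walltime": 86400
}
EOF
}

start_job_body
```

The printed document can be pasted into the Swagger request body text area as is.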
Use cc-backend API to stop job
- Enter the URL http://localhost:8080/swagger/index.html in your browser.
- Enter the JWT token you generated for the API user by clicking the green Authorize button in the upper right part of the window.
- Click the /job/stop_job/{id} endpoint and click the Try it out button.
- Enter the database id that was returned by start_job in the id field and copy the following into the request body. Replace the timestamp with a recent one:
{
"cluster": "emmy",
"jobState": "completed",
"stopTime": <RECENT TS>
}
On success a JSON document with the job metadata is returned.
Check in ClusterCockpit
- User ccdemouser should appear in the Users tab with one completed job
- The job is no longer marked with a green running tag -> Completed!
- Metric data displayed is now read from the job-archive!
Check in job-archive
cd ./cc-backend/var/job-archive/emmy/100/000
cd $STARTTIME
- Inspect meta.json and data.json
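The path used above suggests how the job archive is laid out: job ID 100000 ends up in 100/000, followed by the start time. A minimal sketch of that mapping, assuming the job ID is split into its thousands part and a zero-padded three-digit remainder. This rule is read off the example and is an assumption, and job_archive_dir is a hypothetical helper name:

```shell
# Sketch: derive the job-archive directory of a job.
# Assumption: <root>/<cluster>/<jobId / 1000>/<jobId % 1000, 3 digits>/<startTime>
job_archive_dir() {
    # $1 = archive root, $2 = cluster, $3 = job id, $4 = start time (Unix seconds)
    printf '%s/%s/%d/%03d/%s\n' "$1" "$2" "$(($3 / 1000))" "$(($3 % 1000))" "$4"
}

# 1657890000 is a made-up start time for illustration
job_archive_dir ./cc-backend/var/job-archive emmy 100000 1657890000
```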
Helper scripts
- In this tarball you can find the Perl script generate_subcluster.pl that helps to generate the subcluster section for your system. Usage:
- Log into an exclusive cluster node.
- The LIKWID tools likwid-topology and likwid-bench must be in the PATH!
- $ ./generate_subcluster.pl outputs the subcluster section on stdout
Please be aware that
- You have to enter the name and node list for the subCluster manually.
- GPU detection only works if LIKWID was built with CUDA available and you also run likwid-topology with CUDA loaded.
- Do not blindly trust the measured peakflops values.
- Because the script blindly relies on the CSV format output by likwid-topology this is a fragile undertaking!
3 - How to add notification banner
Overview
To add a notification banner you can add a file notice.txt to the ./var directory of the cc-backend server. As long as this file is present all text in this file is shown in an info banner on the homepage.
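A minimal sketch of setting and clearing the banner from a shell. VAR_DIR is an assumed variable that should point at the ./var directory of your cc-backend installation, and the banner text is made up for illustration:

```shell
# Sketch: write a notice that is shown as long as the file exists.
# VAR_DIR is an assumption; adapt it to your installation.
VAR_DIR="${VAR_DIR:-./var}"
mkdir -p "$VAR_DIR"
printf '%s\n' "Maintenance on Friday 08:00-12:00, the cluster will be unavailable." > "$VAR_DIR/notice.txt"

# Removing the file hides the banner again:
# rm "$VAR_DIR/notice.txt"
```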
Add notification banner in web interface
As an alternative, the admin role can also add and edit the notification banner from the settings view.
4 - How to customize cc-backend
Overview
Customizing cc-backend means changing the logo, legal texts, and the login template instead of the placeholders. You can also place a text file in ./var to add dynamic status or notification messages to the ClusterCockpit homepage.
Replace legal texts
To replace the imprint.tmpl and privacy.tmpl legal texts, you can place your version in ./var/. At startup cc-backend will check if ./var/imprint.tmpl and/or ./var/privacy.tmpl exist and use them instead of the built-in placeholders. You can use the placeholders in web/templates as a blueprint.
Replace login template
To replace the default login layout and styling, you can place your version in ./var/. At startup cc-backend will check if ./var/login.tmpl exists and use it instead of the built-in placeholder. You can use the default template web/templates/login.tmpl as a blueprint.
Replace logo
To change the logo displayed in the navigation bar, you can provide the file logo.png in the folder ./var/img/. On startup cc-backend will check if the folder exists and use the images provided there instead of the built-in images. You may also place additional images there that you use in a custom login template.
Add notification banner on homepage
To add a notification banner you can add a file notice.txt to ./var. As long as this file is present all text in this file is shown in an info banner on the homepage.
5 - How to deploy and update cc-backend
Workflow for deployment
Why we do not provide a docker container
The ClusterCockpit web backend binary has no external dependencies; everything is included in the binary. The external assets, SQL database and job archive, would also be external in a Docker setup. The only advantage of a Docker setup would be that the initial configuration is automated, but this only needs to be done once. We therefore think that setting up Docker, securing and maintaining it is not worth the effort.
It is recommended to install all ClusterCockpit components in a common directory, e.g. /opt/monitoring, var/monitoring or var/clustercockpit.
In the following we use /opt/monitoring.
Two systemd services run on the central monitoring server:
- clustercockpit: binary cc-backend in /opt/monitoring/cc-backend.
- cc-metric-store: binary cc-metric-store in /opt/monitoring/cc-metric-store.
ClusterCockpit is deployed as a single binary that embeds all static assets.
We recommend keeping all cc-backend binary versions in a folder archive and linking the currently active one from the cc-backend root.
This allows for easy roll-back in case something doesn’t work.
Please Note
cc-backend is started with root rights to open the privileged ports (80 and 443). It is recommended to set the configuration options user and group, in which case cc-backend will drop root permissions once the ports are taken. You have to take care that the ownership of the ./var folder and its contents is set accordingly.
Workflow to update
This example assumes the DB and job archive versions did not change. In case the new binary requires a newer database or job archive version, see the Migration section for how to migrate to newer versions.
- Stop systemd service:
sudo systemctl stop clustercockpit.service
- Back up the sqlite DB file! This is as simple as copying it.
- Copy the new cc-backend binary to /opt/monitoring/cc-backend/archive (Tip: Use a date tag like YYYYMMDD-cc-backend). Here is an example:
cp ~/cc-backend /opt/monitoring/cc-backend/archive/20231124-cc-backend
- Link from cc-backend root to current version
ln -s /opt/monitoring/cc-backend/archive/20231124-cc-backend /opt/monitoring/cc-backend/cc-backend
- Start systemd service:
sudo systemctl start clustercockpit.service
- Check if everything is ok:
sudo systemctl status clustercockpit.service
- Check log for issues:
sudo journalctl -u clustercockpit.service
- Check the ClusterCockpit web frontend and your Slurm adapters if anything is broken!
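The archive-and-link scheme can be tried out safely in a scratch directory before touching a production installation. A minimal sketch, in which ROOT stands in for /opt/monitoring/cc-backend, the "binaries" are dummy text files, and activate is a hypothetical helper. It uses ln -sfn instead of plain ln -s so that an already existing link is replaced rather than causing an error:

```shell
# Sketch: rotate and roll back the active binary via a symbolic link.
# ROOT is a scratch stand-in for /opt/monitoring/cc-backend.
ROOT="$(mktemp -d)"
mkdir -p "$ROOT/archive"

# Two dummy "binaries" tagged with a date, as suggested above
printf 'old\n' > "$ROOT/archive/20231001-cc-backend"
printf 'new\n' > "$ROOT/archive/20231124-cc-backend"

activate() {
    # -f replaces an existing link, -n treats the link itself as the target
    ln -sfn "$ROOT/archive/$1" "$ROOT/cc-backend"
}

activate 20231001-cc-backend   # version in production before the update
activate 20231124-cc-backend   # update
cat "$ROOT/cc-backend"         # prints "new"
activate 20231001-cc-backend   # roll back
cat "$ROOT/cc-backend"         # prints "old"
```

In a real update the systemd service is stopped before and started after switching the link, as in the steps above.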
6 - How to generate JWT tokens
Overview
ClusterCockpit uses JSON Web Tokens (JWT) for authorization of its APIs. JWTs are the industry standard for securing APIs and are also used, for example, in OAuth2. For details on JWTs refer to the JWT article in the Concepts section.
When a user logs in via the /login page using a browser, a session cookie (secured using the random bytes in the SESSION_KEY env variable, which you should also change in production) is used for all requests after the successful login. The JWTs make it easier to use the APIs of ClusterCockpit using scripts or other external programs. The token is specified in the Authorization HTTP header using the Bearer schema (there is an example below). Tokens can be issued to users from the configuration view in the Web-UI or the command line (using the -jwt <username> option). In order to use the token for API endpoints such as /api/jobs/start_job/, the user that executes it needs to have the api role. Regular users can only perform read-only queries and only look at data connected to jobs they started themselves.
There are two usage scenarios:
- The APIs are used during a browser session. API accesses are authorized with the active session.
- The REST API is used outside a browser session, e.g. by scripts. In this case you have to issue a token manually. This is possible from within the configuration view or on the command line. It is recommended to issue a JWT token in this case for a special user that only has the api role. By using different users for different purposes, fine-grained access control and access revocation management are possible.
The token is commonly specified in the Authorization HTTP header using the Bearer schema. ClusterCockpit uses an Ed25519 private/public keypair to sign and verify its tokens. You can use cc-backend to generate new JWT tokens.
Workflow
Create a new Ed25519 public/private key pair for signing and validating tokens
We provide a small utility tool as part of cc-backend:
go build ./cmd/gen-keypair/
./gen-keypair
Add key pair in your .env file for cc-backend
An env file template can be found in ./configs.
cc-backend requires the private key to sign newly generated JWT tokens and the public key to validate tokens used to authenticate in its REST APIs.
Generate new JWT token
Every user with the admin role can create or change a user in the configuration view of the web interface. To generate a new JWT for a user just press the GenJWT button behind the user name in the user list.
A new api user and corresponding JWT keys can also be generated from the command line.
Create new API user with admin and api role:
./cc-backend -add-user myapiuser:admin,api:<password>
Create a new JWT token for this user:
./cc-backend -jwt myapiuser
Use issued token on client side
curl -X GET "<API ENDPOINT>" -H "accept: application/json" -H "Content-Type: application/json" -H "Authorization: Bearer <JWT TOKEN>"
This token can be used for the cc-backend REST API as well as for the cc-metric-store. If you use the token for cc-metric-store you have to configure it to use the corresponding public key for validation in its config.json.
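Before sending a token it can help to look at the claims it carries. A minimal sketch that decodes the payload part of a JWT locally, assuming a POSIX shell with cut, tr and base64 available. jwt_payload is a hypothetical helper name, the signature in the sample token is a placeholder, and decoding does not verify the signature:

```shell
# Sketch: print the (unverified) payload of a JWT.
# jwt_payload is a hypothetical helper, not part of cc-backend.
jwt_payload() {
    # A JWT consists of three dot-separated parts: header.payload.signature
    # Convert the base64url alphabet to the standard base64 alphabet
    part=$(printf '%s' "$1" | cut -d. -f2 | tr '_-' '/+')
    # base64url omits the padding; restore it so that base64 -d accepts the input
    case $(( ${#part} % 4 )) in
        2) part="$part==" ;;
        3) part="$part=" ;;
    esac
    printf '%s' "$part" | base64 -d
}

# Header and payload taken from the demo token in the hands-on tutorial; placeholder signature
TOKEN="eyJ0eXAiOiJKV1QiLCJhbGciOiJFZERTQSJ9.eyJ1c2VyIjoiYWRtaW4iLCJyb2xlcyI6WyJST0xFX0FETUlOIiwiUk9MRV9BTkFMWVNUIiwiUk9MRV9VU0VSIl19.signature"
jwt_payload "$TOKEN"
```

For the sample token this shows the user and roles claims. Whether a token is accepted is still decided by the server, which checks the signature against its configured public key.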
Note
By default the JWT tokens generated by cc-backend will not expire! To set an expiration date you have to configure an expiration duration in config.json, using the keys jwts:max-age.
Of course the JWT token can also be generated by other means as long as it is signed with an Ed25519 private key and the corresponding public key is configured in cc-backend or cc-metric-store. For the claims that are set and used by ClusterCockpit refer to the JWT article.
cc-metric-store
The cc-metric-store also uses JWTs for authentication. As it does not issue new tokens, it does not need to know the private key. The public key of the keypair that is used to generate the JWTs that grant access to the cc-metric-store can be specified in its config.json. When configuring the metricDataRepository object in the cluster.json file of the job-archive, you can put a token issued by cc-backend itself.
7 - How to regenerate the Swagger UI documentation
Overview
This project integrates Swagger UI to document and test its REST API. The Swagger documentation files can be found in ./api/.
Note
Regenerating the Swagger UI files is only required if you change the file ./internal/api/rest.go. Otherwise the Swagger UI is already correctly built and ready to use.
Generate Swagger UI files
You can generate the swagger-ui configuration by running the following command from the cc-backend root directory:
go run github.com/swaggo/swag/cmd/swag init -d ./internal/api,./pkg/schema -g rest.go -o ./api
You need to move one generated file:
mv ./api/docs.go ./internal/api/docs.go
Finally rebuild cc-backend:
make
Use the Swagger UI web interface
If you start cc-backend with the -dev flag, the Swagger web interface is available at http://localhost:8080/swagger/. To use the Try it out functionality, e.g. to test the REST API, you must enter a JWT key for a user with the API role.
Info
The user who owns the JWT key must not be logged into the same browser (have a valid session), or the Swagger requests will not work. It is recommended to create a separate user that has only the API role.
8 - How to setup a systemd service
How to run as a systemd service.
The files in this directory assume that you install ClusterCockpit to /opt/monitoring/cc-backend. Of course you can choose any other location, but make sure you replace all paths starting with /opt/monitoring/cc-backend in the clustercockpit.service file!
The config.json may contain the optional fields user and group. If specified, the application will call setuid and setgid after reading the config file and binding to a TCP port (so it can take a privileged port), but before it starts accepting any connections. This is good for security, but also means that the var/ directory must be readable and writeable by this user. The .env and config.json files may contain secrets and should not be readable by this user. If these files are changed, the server must be restarted.
- Clone this repository somewhere in your home
git clone git@github.com:ClusterCockpit/cc-backend.git
- (Optional) Install dependencies and build. In general it is recommended to use the provided release binaries.
cd cc-backend && make
Copy the binary to the target folder (adapt if necessary):
sudo mkdir -p /opt/monitoring/cc-backend/
cp ./cc-backend /opt/monitoring/cc-backend/
- Modify the config.json and env-template.txt files from the configs directory to your liking and put them in the target directory
cp ./configs/config.json /opt/monitoring/cc-backend/config.json && cp ./configs/env-template.txt /opt/monitoring/cc-backend/.env
vim /opt/monitoring/cc-backend/config.json # do your thing...
vim /opt/monitoring/cc-backend/.env # do your thing...
- (Optional) Customization: Add your versions of the login view, legal texts, and logo image. You may use the templates in ./web/templates as a blueprint. Every override is separate.
cp login.tmpl /opt/monitoring/cc-backend/var/
cp imprint.tmpl /opt/monitoring/cc-backend/var/
cp privacy.tmpl /opt/monitoring/cc-backend/var/
# Ensure your logo and any images you use in your login template have a suitable size.
cp -R img /opt/monitoring/cc-backend/var/img
- Copy the systemd service unit file. You may adapt it to your needs.
sudo cp ./init/clustercockpit.service /etc/systemd/system/clustercockpit.service
- Enable and start the server
sudo systemctl enable clustercockpit.service # optional (if done, (re-)starts automatically)
sudo systemctl start clustercockpit.service
Check what's going on:
sudo systemctl status clustercockpit.service
sudo journalctl -u clustercockpit.service
9 - How to use the Swagger UI documentation
Overview
This project integrates Swagger UI to document and test its REST API. The Swagger documentation files can be found in ./api/.
Access the Swagger UI web interface
If you start cc-backend with the -dev flag, the Swagger web interface is available at http://localhost:8080/swagger/. To use the Try it out functionality, e.g. to test the REST API, you must enter a JWT key for a user with the API role.
Info
The user who owns the JWT key must not be logged into the same browser (have a valid session), or the Swagger requests will not work. It is recommended to create a separate user that has only the API role.
10 - Migration
Introduction
In general, an upgrade is nothing more than a replacement of the binary file. All the necessary files, except the database file, the configuration file and the job archive, are embedded in the binary file. It is recommended to use a directory where the file names of the binary files are named with a version indicator. This can be, for example, the date or the Unix epoch time. A symbolic link points to the version to be used. This makes it easier to switch to earlier versions.
The database and the job archive are versioned. Each release binary supports specific versions of the database and job archive. If a version mismatch is detected, the application is terminated and migration is required.
IMPORTANT NOTE
It is recommended to make a backup copy of the database before each update. This is mandatory in case the database needs to be migrated. In the case of sqlite, this means stopping cc-backend and copying the sqlite database file somewhere.
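A minimal sketch of such a backup for the sqlite case. backup_db is a hypothetical helper name, and the paths in the usage comment are assumptions that depend on your installation:

```shell
# Sketch: copy the sqlite database file to a backup directory with a timestamp prefix.
# backup_db is a hypothetical helper; stop cc-backend before calling it.
backup_db() {
    # $1 = database file, $2 = backup directory
    mkdir -p "$2"
    # -p keeps the modification time and permissions of the original file
    cp -p "$1" "$2/$(date +%Y%m%d-%H%M%S)-$(basename "$1")"
}

# Example, with assumed paths:
#   sudo systemctl stop clustercockpit.service
#   backup_db ./var/job.db ./var/backup
```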
Migrating the database
After you have backed up the database, run the following command to migrate the database to the latest version:
> ./cc-backend -migrate-db
The migration files are embedded in the binary and can also be viewed in the cc-backend source tree. There are separate migration files for both supported database backends. We use the migrate library.
If something goes wrong, you can check the status and get the current schema (here for sqlite):
> sqlite3 var/job.db
In the sqlite console execute:
.schema
to get the current database schema. You can query the current version and whether the migration failed with:
SELECT * FROM schema_migrations;
The first column indicates the current database version and the second column is a dirty flag indicating whether the migration was successful.
Migrating the job archive
Job archive migration requires a separate tool (archive-migration), which is part of the cc-backend source tree (build with go build ./tools/archive-migration) and is also provided as part of the releases.
Migration is supported only between two successive releases. The migration tool migrates the existing job archive to a new job archive. This means that there must be enough disk space for two complete job archives. If the tool is called without options:
> ./archive-migration
it is assumed that a job archive exists in ./var/job-archive. The new job archive is written to ./var/job-archive-new. Since execution is threaded, it is impossible to determine in which job the error occurred in case of a fatal error. In this case, you can run the tool in debug mode (with the -debug flag). In debug mode, threading is disabled and the job ID of each migrated job is output. Jobs with empty files will be skipped. Between multiple runs of the tool, the job-archive-new directory must be moved or deleted.
The cluster.json files in job-archive-new must be checked for errors, especially whether the aggregation attribute is set correctly for all metrics.
Migration takes several hours for relatively large job archives (several hundred GB). A versioned job archive contains a version.txt file in the root directory of the job archive. This file contains the version as an unsigned integer.
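Before starting a long migration it can be useful to check which version an archive has. A minimal sketch that reads version.txt and accepts it only if it holds an unsigned integer; archive_version is a hypothetical helper name and the path in the usage line is the default location mentioned above:

```shell
# Sketch: print the version of a job archive, or fail if it is unversioned or invalid.
# archive_version is a hypothetical helper, not part of the archive-migration tool.
archive_version() {
    # $1 = root directory of the job archive
    v=$(cat "$1/version.txt" 2>/dev/null) || return 1
    case "$v" in
        ''|*[!0-9]*) return 1 ;;      # empty or contains a non-digit
        *) printf '%s\n' "$v" ;;
    esac
}

archive_version ./var/job-archive || echo "unversioned or invalid job archive"
```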