Release/v0.3.1 #18

Merged
merged 10 commits into from
Apr 3, 2025
4 changes: 2 additions & 2 deletions CONTRIBUTING.md
@@ -48,14 +48,14 @@ If you have a suggestion for the project, we'd love to hear about it. Please inc
### Coding Standards

* Use consistent code formatting
* Write clear commit messages following [Conventional Commits](https://www.conventionalcommits.org/)
* Write clear commit messages following [Conventional Commits](https://www.conventionalcommits.org/) or at least the basic specification as in the [Commit Messages](#commit-messages) section.
* Comment your code where necessary
* Write tests for new features
* Keep the code simple and maintainable

### Commit Messages

We follow a basic specification:
Basic specification example:

```
type(scope): description
```
6 changes: 4 additions & 2 deletions Dockerfile
@@ -11,8 +11,6 @@ RUN touch /etc/apt/sources.list
# Debian stretch has moved to the archive
RUN echo "deb https://debian.mirror.garr.it/debian-archive/ stretch main" > /etc/apt/sources.list

# Update repositories
RUN apt-get -y update

# Install dependencies
RUN apt-get update && apt-get install -y \
@@ -25,6 +23,7 @@ RUN apt-get update && apt-get install -y \
&& rm -rf /var/lib/apt/lists/*

# copy app
ADD ./publishers/random_pub ${EXAMON_HOME}/publishers/random_pub
ADD ./lib/examon-common $EXAMON_HOME/lib/examon-common
ADD ./docker/examon/supervisor.conf /etc/supervisor/conf.d/supervisor.conf
ADD ./scripts/examon.conf $EXAMON_HOME/scripts/examon.conf
@@ -38,6 +37,9 @@ WORKDIR $EXAMON_HOME/lib/examon-common
RUN $PIP install .
RUN pip install .

WORKDIR $EXAMON_HOME/publishers/random_pub
RUN $PIP install -r requirements.txt

WORKDIR $EXAMON_HOME/web
RUN virtualenv flask
RUN flask/bin/pip --trusted-host pypi.python.org install --upgrade pip==20.1.1
169 changes: 159 additions & 10 deletions README.md
@@ -1,9 +1,17 @@
# Examon HPC Monitoring
<p align="center">
<img src="https://github.com/fbeneventi/panels/raw/main/logo3_trasp.png" alt="ExaMon" width="40%">
</p>


# ExaMon HPC Monitoring

[![Build Status](https://github.com/ExamonHPC/examon/actions/workflows/installation-test.yml/badge.svg?branch=develop)](https://github.com/ExamonHPC/examon/actions/workflows/installation-test.yml)

A highly scalable framework for the performance and energy monitoring of HPC servers

📖 [Documentation](https://examonhpc.github.io/examon/)


## Setup

This setup will install all server-side components of the ExaMon framework:
@@ -12,6 +20,14 @@ This setup will install all server-side components of the ExaMon framework:
- Grafana
- KairosDB
- Cassandra
- Example plugins

This Examon installation includes the following plugins:

- `random_pub`

Note: the `random_pub` plugin exists only to test the system; it publishes random metrics.
It can be disabled as described in the [Enable/disable the plugins](#enabledisable-the-plugins) section.

## Prerequisites
Cassandra is the component that requires the most resources. More details about the suggested hardware configuration of the system that will host the services are available here:
@@ -42,37 +58,170 @@ docker compose up -d

This builds the local Docker images, fetches the prebuilt ones, and then starts the services. See the `docker-compose.yml` file for the full configuration.

## Configuration

### Configure Grafana

Open the Grafana server in your browser and log in with the default credentials:

http://localhost:3000

**NOTE:** The default admin password is the value of `GF_SECURITY_ADMIN_PASSWORD` in the `docker-compose.yml` file.

Follow the normal procedure for adding a new data source:

[Add a Datasource](https://grafana.com/docs/grafana/latest/datasources/add-a-data-source/)

From the Grafana UI, add a new data source and select `KairosDB`.

Fill out the form with the following settings:

- Type: `KairosDB`
- Name: `kairosdb`
- Url: http://kairosdb:8083
- Access: `Server`
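
The same data source can also be created through Grafana's HTTP API instead of the UI. The following is a minimal sketch, not part of the ExaMon tooling: the helper names are made up for illustration, the plugin id `grafana-kairosdb-datasource` is an assumption to verify against your Grafana instance, and `"proxy"` is the API value that corresponds to the `Server` access mode.

```python
import base64
import json
import urllib.request


def build_datasource_payload(name="kairosdb", url="http://kairosdb:8083"):
    """Return the JSON body for Grafana's POST /api/datasources call."""
    return {
        "name": name,
        # Assumed plugin id of the KairosDB data source; check it in your Grafana.
        "type": "grafana-kairosdb-datasource",
        "url": url,
        # "proxy" is the API value behind the "Server" access mode in the UI.
        "access": "proxy",
    }


def create_datasource(grafana_url, user, password, payload):
    """POST the payload to Grafana using HTTP basic authentication."""
    token = base64.b64encode(f"{user}:{password}".encode()).decode()
    request = urllib.request.Request(
        f"{grafana_url}/api/datasources",
        data=json.dumps(payload).encode(),
        headers={
            "Content-Type": "application/json",
            "Authorization": f"Basic {token}",
        },
        method="POST",
    )
    with urllib.request.urlopen(request) as response:
        return json.load(response)


# Example call (requires a running Grafana; the credentials are placeholders):
# create_datasource("http://localhost:3000", "admin", "<password>",
#                   build_datasource_payload())
```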

## Usage Examples
To import the dashboards stored in the `dashboards/` folder:

[Import dashboard](https://grafana.com/docs/grafana/latest/dashboards/export-import/#import-dashboard)

To test the installation, you can import the `Examon Test - Random Sensor.json` dashboard.


### Configure the plugins

Each ExaMon plugin must be configured individually.

Set every property in the plugin's `.conf` configuration file to the values
of the server that hosts the framework. In particular, define the IP addresses
and ports of the server where the KairosDB and/or MQTT broker services run,
as well as their credentials.
The configuration files to edit are located in the respective plugin folders:

| Plugin | Path |
|-----------------|-----------------------------|
| random_pub | `/publishers/random_pub` |

Please refer to the respective plugin readme file (*Configuration* section) for further details.
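
Before starting a plugin, it can help to check that no required setting was left empty. The sketch below assumes the `.conf` files use INI syntax; the section and key names in `REQUIRED` are hypothetical placeholders, and the real names are the ones listed in each plugin's readme.

```python
import configparser

# Hypothetical section and key names, used only for illustration.
REQUIRED = {
    "MQTT": ["MQTT_BROKER", "MQTT_PORT"],
    "KairosDB": ["K_SERVERS", "K_PORT"],
}


def missing_settings(conf_text, required=REQUIRED):
    """Return (section, key) pairs that are absent or empty in the .conf text."""
    parser = configparser.ConfigParser(interpolation=None)
    parser.optionxform = str  # keep key names case-sensitive
    parser.read_string(conf_text)
    missing = []
    for section, keys in required.items():
        for key in keys:
            # fallback covers both a missing section and a missing key
            value = parser.get(section, key, fallback="")
            if not value.strip():
                missing.append((section, key))
    return missing
```

An empty list means every listed setting has a value; it does not verify that the address or port is reachable.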


### Manage the plugins

The plugins are managed by supervisord, which is the microservices manager for the examon container.

The majority of the commands follow the supervisorctl syntax:

```bash
supervisorctl <command> <plugin-name>
```

The most used commands are:

- `start`
- `stop`
- `restart`
- `status`
- `tail`

To see the full list of commands, run:

```bash
docker exec -it <examon-container-name> supervisorctl help
```

To start a plugin, run:

```bash
docker exec -it <examon-container-name> supervisorctl start <plugin-name>
```
Example:

### Collecting data using the dummy "examon_pub" plugin
Once all Docker services are running (can be started either by `docker-compose up -d` or `docker-compose start`), the MQTT broker is available at `TEST_SERVER` port `1883` where `TEST_SERVER` is the address of the server where the services run.
```bash
docker exec -it examon supervisorctl start plugins:random_pub
```

To test the installation we can use the `examon_pub.py` plugin available in the `publishers/examon_pub` folder of this project.
Or, if you want to start all the plugins, you can use the following command:

It is highly recommended to follow the tutorial described in the Jupyter notebook `README-notebook.ipynb` to understand how an Examon plugin works.
```bash
docker exec -it <examon-container-name> supervisorctl start plugins:*
```
As an alternative, you can open the supervisor shell to manage the plugins and start/stop them individually:

```bash
docker exec -it <examon-container-name> supervisorctl
```
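
The `supervisorctl` calls above can also be scripted. This is a small sketch under stated assumptions: the helper names are illustrative, and `-it` is dropped because no interactive terminal is needed when the output is captured.

```python
import subprocess


def supervisorctl_command(container, action, plugin=None):
    """Build the `docker exec` argument list for a supervisorctl action."""
    command = ["docker", "exec", container, "supervisorctl", action]
    if plugin is not None:
        command.append(plugin)
    return command


def run_supervisorctl(container, action, plugin=None):
    """Run the command and return its standard output as text."""
    result = subprocess.run(
        supervisorctl_command(container, action, plugin),
        capture_output=True,
        text=True,
        # Do not raise on a non-zero exit: supervisorctl may report one
        # for some states, for example when a process is not running.
        check=False,
    )
    return result.stdout


# Example (requires Docker and a running container named "examon"):
# print(run_supervisorctl("examon", "status"))
```

Passing the arguments as a list avoids the shell, so a pattern such as `plugins:*` reaches `supervisorctl` unchanged instead of being subject to shell globbing.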

After having installed and configured it on one or more test nodes we can start the data collection running for example:
### Check the logs

To check the logs of the plugins, you can use the following command:

```bash
docker exec -it <examon-container-name> supervisorctl tail [-f] <plugin-name>
```

### Enable/disable the plugins

Some plugins may be disabled by default and need to be started manually each time the examon container is started.

To enable and start the plugins automatically, you need to edit the supervisor configuration file for the examon service.

```bash
[root@testnode00]$ python ./examon_pub.py -b TEST_SERVER -p 1883 -s 1 run
docker exec -it <examon-container-name> bash

vi /etc/supervisor/conf.d/supervisor.conf
```
If everything went well, the data are available both through the Grafana interface and using the `examon-client`.
Then, for each plugin, set the following parameter to true:

```bash
autostart=True
```
Restart the examon container to apply the changes:

```bash
docker restart <examon-container-name>
```
Note that changes made inside the container are lost if the container is recreated.
To make the settings persistent, edit the supervisor configuration file in `docker/examon/supervisor.conf` and rebuild the image.
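
For the persistent variant, the edit to `docker/examon/supervisor.conf` can be automated. The following sketch assumes the file is plain INI with `[program:<name>]` sections, as is usual for supervisor; the function name and the sample command in the usage comment are illustrative only.

```python
import configparser
import io


def enable_autostart(conf_text, programs=None):
    """Return supervisor config text with autostart=true for selected programs.

    `programs` is an optional collection of program names; None selects all.
    """
    parser = configparser.RawConfigParser()  # no value interpolation
    parser.optionxform = str  # keep option names as written
    parser.read_string(conf_text)
    for section in parser.sections():
        if not section.startswith("program:"):
            continue  # leave [supervisord], [group:...] and others untouched
        name = section.split(":", 1)[1]
        if programs is None or name in programs:
            parser.set(section, "autostart", "true")
    output = io.StringIO()
    parser.write(output)
    return output.getvalue()


# Example:
# text = open("docker/examon/supervisor.conf").read()
# open("docker/examon/supervisor.conf", "w").write(enable_autostart(text))
```

Be aware that `configparser` drops comments when it writes the file back, so keep a copy if the comments matter.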

## Examon server configuration

The Examon server must be enabled in the supervisor configuration file and configured to use the Examon REST API.

Please refer to the `README.rst` file in the `web/examon-server` folder for more information.

**NOTE:** The Cassandra-related settings must match those in the Cassandra section of the Slurm publisher configuration.

## Data persistence

During the installation, two Docker volumes are created, which are required for data persistence.

```bash
$ docker volume ls
DRIVER    VOLUME NAME
local     examon_cassandra_volume
local     examon_grafana_volume
```

* The `examon_cassandra_volume` is used to store the collected metrics
* The `examon_grafana_volume` is used to store the following Grafana data:
  * user account data
  * dashboards

To set a custom volume path, you can use the following settings in the `docker-compose.yml` file:

```yaml
volumes:
  cassandra_volume:
    driver: local
    driver_opts:
      type: none
      device: /path/to/cassandra/volume
      o: bind
  grafana_volume:
    driver: local
    driver_opts:
      type: none
      device: /path/to/grafana/volume
      o: bind
```
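
With bind-type volumes like these, the host directories generally need to exist before the services start, otherwise mounting is likely to fail. A minimal sketch that creates them, with an illustrative helper name and the placeholder paths from the example above:

```python
from pathlib import Path


def ensure_volume_dirs(paths):
    """Create each host directory (and its parents) if missing; return Path objects."""
    created = []
    for raw in paths:
        path = Path(raw)
        path.mkdir(parents=True, exist_ok=True)
        created.append(path)
    return created


# Example, using the placeholder paths from the compose fragment:
# ensure_volume_dirs(["/path/to/cassandra/volume", "/path/to/grafana/volume"])
```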

1 change: 1 addition & 0 deletions VERSION
@@ -0,0 +1 @@
v0.3.1