IRT-SystemX/deploai

Deplo.ai

Deplo.ai is an open-source framework that aims to facilitate the deployment and evaluation of Machine Learning (ML) models on heterogeneous hardware: CPUs (Central Processing Units), GPUs (Graphics Processing Units), FPGAs (Field-Programmable Gate Arrays), MCUs (Microcontroller Units) and ASICs (Application-Specific Integrated Circuits).

Deplo.ai hides hardware complexity from the user, making it possible to benchmark multiple combinations of ML models and hardware targets quickly and easily, in a few lines of code. Built on MLOps/TinyMLOps principles that enable lightweight deployment pipelines, it provides "à la carte" benchmarking.

What for?

  • Explore deployment solutions under hardware/ML/business constraints to minimize implementation and execution failures
  • Bridge a highly fragmented ecosystem: ML frameworks and heterogeneous hardware architectures
  • Customize deployment scenarios by combining different services/components/probes/evaluators
  • Ease adoption by integrating into the development flow of an ML engineer/researcher


For whom?

  • For the ML engineer/data scientist who wants to integrate the evaluation of their model under operational conditions very early in their workflow, supporting iterative development.


Documentation

The full documentation is available in docs.

Notebook

Deplo.ai can be used directly in notebook environments such as Jupyter.
Examples are available in notebooks.

Quick start

Setup

Requirements on host side

On the host side, pull the following images:

docker pull nvcr.io/nvidia/tritonserver:22.07-py3
docker pull prom/prometheus
docker pull prom/statsd-exporter
docker pull prom/pushgateway
docker pull grafana/grafana

For Docker socket forwarding, if you run into permission problems:

sudo chmod 666 /var/run/docker.sock

(Note: this grants every local user access to the Docker daemon; adding your user to the docker group is a less permissive alternative.)

To orchestrate containers, install k3s, using Docker as the container runtime:

curl -sfL https://get.k3s.io | sh -s - --docker

k3s starts automatically; to check its status:

sudo service k3s status

To run the server manually with the Docker runtime, first stop the service:

sudo service k3s stop
sudo k3s server --docker

In order to run a remote container on a target connected via SSH, openssh-server must be running on your host:

sudo service ssh start

Requirements on target side

On the target side, the inference server image is large (≈12 GB). To deploy it directly, you should change the default Docker data directory, e.g. by editing the file /usr/lib/systemd/system/docker.service (copy your previous data to the new folder first).

Installation with virtual environment

From the mltestbench folder:

  1. Create your virtual environment:
python3.9 -m venv env
  2. Activate the environment:
source env/bin/activate

Then follow the Docker section below from step 3 onward.
Finally, initialize the Airflow database and add the Prometheus connection:

airflow db init
airflow connections add http_prometheus --conn-type http --conn-host localhost --conn-port 9090
Installation with docker
  1. Build the docker image:
docker build -t mltb --build-arg ABS_PATH=/mltestbench_parent_path -f docker/Dockerfile .
  2. Run the docker image with the following command:
docker run -it --rm --privileged --net=host -v /var/run/docker.sock:/var/run/docker.sock -v mltestbench_path:/mltestbench --workdir /mltestbench mltb bash
  3. Run the setup script:
./setup/setup.sh mltestbench_path
  4. Source the environment:
source ./setup/env.sh
  5. Fetch the models:
cd models && ./fetch_models.sh
Start the services

All services can be started manually:

  1. Start all the infrastructure services required for experimentation/monitoring/visualization:
start_infra_services
  2. Start all the mltestbench services:
start_mltb_services

Alternatively, k3s can be used: simply run the pods and services required for your use case, e.g.:

cd k3s
sudo kubectl apply -f deployment_pod.yaml
sudo kubectl apply -f proxy_pod.yaml
sudo kubectl apply -f configurator_pod_k3s.yaml

The pod configuration file deployment_pod_k3s.yaml and the associated service file deployment_service_k3s.yaml are examples showing how to deploy a specific pod connected to the host network and how to allow services to communicate with each other.

As a reminder, some useful k3s commands:

sudo kubectl get pods
sudo kubectl logs my_pod
sudo kubectl delete pod my_pod

Note: infrastructure services have not yet been tested with k3s.

Run an experiment

Two implementations are offered to the user:

  • implement your own workflow (i.e. build your own chain of services)
  • run a predefined workflow (i.e. load an existing chain of services)
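To illustrate what "building a chain of services" means, here is a minimal, generic pipeline sketch. This is NOT the Deplo.ai API: the `chain` helper and the stage names below are invented for illustration only; refer to the documentation in docs for the real Python API.

```python
# Hypothetical sketch of chaining services into a workflow.
# The helper and stage names are invented; they are NOT the Deplo.ai API.
from typing import Any, Callable

Service = Callable[[Any], Any]

def chain(*services: Service) -> Service:
    """Compose services so that the output of one feeds the next."""
    def workflow(payload: Any) -> Any:
        for service in services:
            payload = service(payload)
        return payload
    return workflow

# Invented example stages: deploy a model, run inference, collect metrics.
deploy = lambda cfg: {**cfg, "deployed": True}
infer = lambda cfg: {**cfg, "output": "ok"}
collect = lambda cfg: {**cfg, "metrics": {"latency_ms": 1.0}}

experiment = chain(deploy, infer, collect)
result = experiment({"model": "densenet_onnx"})
```

A predefined workflow corresponds to loading such a chain ready-made instead of composing it yourself.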

Two interaction modes are offered to the user:

  • Python API
  • Jupyter notebook with Grafana visualization

Two scenarios are offered to the user:

  • local (on the host)
  • remote (on the IRT SystemX testbench; available soon, network hardware migration in progress)

In both scenarios, the baseline is the following:

  • deploy the inference server
  • deploy an ONNX DenseNet model into a Triton container
  • run an inference
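As a sketch of the "run an inference" step: a Triton server exposes the KServe-v2 HTTP endpoint POST /v2/models/&lt;model&gt;/infer on port 8000 by default. The model name, input name, shape and data below are assumptions for illustration, not values from this repository's models:

```python
# Minimal sketch of building a KServe-v2 inference request for Triton.
# Model name, input name, shape and data are illustrative assumptions.
import json
import urllib.request  # needed only to actually send the request (commented below)

def build_infer_request(model: str, input_name: str, shape, data):
    """Build the URL and JSON body for POST /v2/models/<model>/infer."""
    body = {
        "inputs": [{
            "name": input_name,
            "shape": list(shape),
            "datatype": "FP32",
            "data": data,
        }]
    }
    url = f"http://localhost:8000/v2/models/{model}/infer"
    return url, json.dumps(body).encode()

url, payload = build_infer_request("densenet_onnx", "INPUT0", (1, 4), [0.1, 0.2, 0.3, 0.4])
# To actually send it (requires a running Triton server):
# req = urllib.request.Request(url, data=payload,
#                              headers={"Content-Type": "application/json"})
# response = json.load(urllib.request.urlopen(req))
```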

Metric collection is performed only in the second interaction mode, because of the dependency between the Triton container lifetime and the DAG execution (with the Python API, it can easily be handled with timers):

  • collect metrics:
    • hardware/ML metrics on the remote target, relative to the inference server and the experiment manager (Airflow)
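Metrics scraped by Prometheus (assumed at localhost:9090, the port used by the Airflow connection above) can also be pulled programmatically through its HTTP API. A minimal sketch; the PromQL expression is an assumption chosen as an example of a Triton inference counter:

```python
# Sketch: build an instant-query URL for the Prometheus HTTP API.
# The metric name queried below is an illustrative assumption.
import json           # needed only to decode the response (commented below)
import urllib.parse
import urllib.request  # needed only to actually run the query (commented below)

def prometheus_query_url(base: str, promql: str) -> str:
    """Build the URL for Prometheus' instant-query endpoint /api/v1/query."""
    return f"{base}/api/v1/query?" + urllib.parse.urlencode({"query": promql})

url = prometheus_query_url("http://localhost:9090", "nv_inference_request_success")
# To actually run the query (requires Prometheus to be up):
# result = json.load(urllib.request.urlopen(url))
# print(result["data"]["result"])
```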

Run your own workflow with the Python API

Trigger the exp_1 DAG of the example experiment:

airflow dags test -S /user/exp_example.py exp_1 now

Run a predefined workflow with Jupyter notebook

In your browser, at the Jupyter notebook address localhost:8888 (or the URL printed to stdout after the infrastructure services start), navigate into mltestbench/notebooks and run the exp_example.ipynb notebook.
Then you can explore the collected metrics with Grafana at localhost:3000 (login/pwd: admin/admin; automatic login will be released soon).

Stop all the services

Stop the mltestbench services:

stop_mltb_services

Stop the infrastructure services:

stop_infra_services

Deplo.ai is developed by IRT SystemX.
