Deplo.ai is an open-source framework that aims to facilitate the deployment and evaluation of Machine Learning (ML) models on heterogeneous hardware: CPUs (Central Processing Units), GPUs (Graphics Processing Units), FPGAs (Field-Programmable Gate Arrays), MCUs (Microcontroller Units) and ASICs (Application-Specific Integrated Circuits).
Deplo.ai hides the hardware complexity from the user, allowing multiple configurations of ML models and hardware targets to be benchmarked quickly and easily, in a few lines of code. Based on MLOps/TinyMLOps principles that enable lightweight deployment pipelines, it provides "à la carte" benchmarking.
- Explore deployment solutions under hardware/ML/business constraints to minimize implementation and execution failures
- Break down a highly fragmented environment: ML frameworks / heterogeneous hardware architectures
- Customize deployment scenarios by combining different services/components/probes/evaluators
- Facilitate use by integrating into the development flow of an ML engineer/researcher
- For the ML engineer/Data scientist who wants to integrate the evaluation of their model under operational conditions very early on in their workflow, with a view to iterative development.
The full documentation is available in docs.
Deplo.ai can be used directly in notebook environments such as Jupyter.
Examples are available in notebooks.
On the host side, install the following images:
- Nvidia Triton Inference Server (model serving):
docker pull nvcr.io/nvidia/tritonserver:22.07-py3
- prometheus (metrics collection):
docker pull prom/prometheus
- statsd-exporter (statistics aggregation exporter):
docker pull prom/statsd-exporter
- prometheus-pushgateway (metrics exposition):
docker pull prom/pushgateway
- grafana (metrics visualization):
docker pull grafana/grafana
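Optionally, you can check that all five images are present before going further (plain Docker CLI, nothing project-specific):
# each of the five images pulled above should appear in the list
docker images | grep -E 'tritonserver|prometheus|statsd-exporter|pushgateway|grafana'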
For Docker socket forwarding, if you encounter permission problems:
sudo chmod 666 /var/run/docker.sock
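Note that chmod 666 on the socket is a quick but permissive workaround; a common alternative (assuming your installation created the usual docker group, which is the Docker Engine default) is to add your user to that group and log out/in again:
# grant your user access to the Docker daemon without opening the socket to everyone
sudo usermod -aG docker $USER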
To orchestrate containers, install k3s, using Docker as the container runtime:
curl -sfL https://get.k3s.io | sh -s - --docker
The k3s service starts automatically; to check its status, run sudo service k3s status. To stop it and run the server manually with the Docker runtime:
sudo service k3s stop
sudo k3s server --docker
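Once k3s is running, you can confirm that the node is ready with the bundled kubectl (standard k3s command, nothing project-specific):
# the host should be listed with STATUS Ready
sudo kubectl get nodes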
In order to run a remote container on a target connected via SSH, openssh-server must be running on your host:
sudo service ssh start
On the target side, to deploy the inference server image directly (≈12 GB), you should change Docker's default data directory by editing the file /usr/lib/systemd/system/docker.service (copy your previous data to the new folder).
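As a minimal sketch, assuming the new data directory is /data/docker (any path with enough free space works) and a stock Docker systemd unit, the change amounts to adding --data-root to the ExecStart line and restarting the daemon:
# /usr/lib/systemd/system/docker.service (excerpt); /data/docker is an example path
ExecStart=/usr/bin/dockerd --data-root /data/docker -H fd:// --containerd=/run/containerd/containerd.sock
# reload the unit files and restart Docker so the new directory is used
sudo systemctl daemon-reload
sudo systemctl restart docker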
From the mltestbench folder:
- create your virtual environment
python3.9 -m venv env
- activate environment
source env/bin/activate
Then, follow the section below (using Docker) from step 3 onwards.
To finish, initialize the Airflow database and add the Prometheus connection:
airflow db init
airflow connections add http_prometheus --conn-type http --conn-host localhost --conn-port 9090
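You can verify that the connection is registered and that Prometheus answers on the expected port (standard Airflow CLI and Prometheus health endpoint, matching the connection added above):
# the http_prometheus connection should appear in the list
airflow connections list
# Prometheus liveness check (should report that the server is healthy)
curl -s http://localhost:9090/-/healthy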
- build the docker image
docker build -t mltb --build-arg ABS_PATH=/mltestbench_parent_path -f docker/Dockerfile .
- run the Docker image with the following command:
docker run -it --rm --privileged --net=host -v /var/run/docker.sock:/var/run/docker.sock -v mltestbench_path:/mltestbench --workdir /mltestbench mltb bash
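Inside the container, a quick way to check that the Docker socket forwarding works is to list the host containers; this assumes the docker CLI is installed in the image, which depends on the Dockerfile:
# should list the containers running on the host through the mounted socket
docker ps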
- build the environment script:
./setup/setup.sh mltestbench_path
- source the environment:
source ./setup/env.sh
- fetch the models
cd models && ./fetch_models.sh
All services can be started manually:
- start all the infrastructure services required for experimentation/monitoring/visualization:
start_infra_services
- start all the mltestbench services:
start_mltb_services
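To spot-check that the infrastructure services are reachable, assuming the default ports used in this README (9090 for Prometheus, 3000 for Grafana):
# standard health endpoints of Prometheus and Grafana
curl -s http://localhost:9090/-/healthy
curl -s http://localhost:3000/api/health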
Alternatively, k3s can be used. In that case, just run the pods and services needed for your use case.
For example:
cd k3s
sudo kubectl apply -f deployment_pod.yaml
sudo kubectl apply -f proxy_pod.yaml
sudo kubectl apply -f configurator_pod_k3s.yaml
The pod configuration file deployment_pod_k3s.yaml and the associated service file deployment_service_k3s.yaml are examples that show how to deploy a specific pod connected to the host network and how to allow inter-service communication.
Here is a reminder of some useful k3s commands:
sudo kubectl get pods
sudo kubectl logs my_pod
sudo kubectl delete pod my_pod
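Two more standard kubectl commands that often help when a pod does not start (not specific to this project):
# detailed status and events for a pod
sudo kubectl describe pod my_pod
# list the services and their cluster IPs/ports
sudo kubectl get services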
Note: infrastructure services have not yet been tested with k3s.
Two implementations are proposed to the user:
- implement your own workflow (i.e. build your own chain of services)
- run a predefined workflow (i.e. load a chain of services)
Two interaction modes are proposed to the user:
- Python API
- Jupyter notebook with Grafana visualization
Two scenarios are proposed to the user:
- local (on the host)
- remote (on the IRT SystemX testbench) (soon available -> network hardware migration in progress)
In both scenarios, the baseline is the following:
- deploy the inference server
- deploy an ONNX DenseNet model into a Triton container
- run an inference
Metrics collection is performed only in the second interaction mode, because of the dependency between the Triton container lifetime and the DAG execution (with the Python API, it can easily be handled with timers):
- collect metrics:
- hardware/ML metrics on the remote target, relative to the inference server and to the experiment manager (Airflow)
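To spot-check that metrics are actually flowing, you can query Prometheus directly over its HTTP API (standard endpoint, assuming Prometheus listens on localhost:9090 as configured above):
# the 'up' metric lists every scraped target and whether it is currently reachable
curl -s 'http://localhost:9090/api/v1/query?query=up'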
Trigger the exp_1 experiment DAG:
airflow dags test -S /user/exp_example.py exp_1 now
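After the run, if metrics are pushed through the Pushgateway, you can inspect what it currently exposes (default prom/pushgateway port 9091; whether this experiment goes through the Pushgateway or the statsd-exporter path is an assumption here):
# dump the metrics currently held by the Pushgateway
curl -s http://localhost:9091/metrics | head -n 20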
In your browser, at the Jupyter notebook address localhost:8888 (or the URL displayed in stdout after starting the infrastructure services), navigate into mltestbench/notebooks and run the exp_example.ipynb notebook.
Then, you can explore the collected metrics with Grafana at localhost:3000 (login/pwd: admin/admin -> automatic login will be released soon).
Stop the mltestbench services:
stop_mltb_services
Stop the infrastructure services:
stop_infra_services