| 1 | +<!-- |
| 2 | +Copyright (c) 2021 - present / Neuralmagic, Inc. All Rights Reserved. |
| 3 | + |
| 4 | +Licensed under the Apache License, Version 2.0 (the "License"); |
| 5 | +you may not use this file except in compliance with the License. |
| 6 | +You may obtain a copy of the License at |
| 7 | + |
| 8 | + http://www.apache.org/licenses/LICENSE-2.0 |
| 9 | + |
| 10 | +Unless required by applicable law or agreed to in writing, |
| 11 | +software distributed under the License is distributed on an "AS IS" BASIS, |
| 12 | +WITHOUT WARRANTIES OR CONDITIONS OF ANY KIND, either express or implied. |
| 13 | +See the License for the specific language governing permissions and |
| 14 | +limitations under the License. |
| 15 | +--> |
| 16 | + |
| 17 | +# How to Use SparseML With Docker |
| 18 | +SparseML provides libraries for applying sparsification recipes to neural networks with a few lines of code, enabling faster and smaller models. |
| 19 | + |
| 20 | +In addition to installing SparseML with `pip`, you can set it up quickly using Docker. |
| 21 | + |
| 22 | +## Prerequisites |
| 23 | +Before you begin, make sure you have Docker installed on your machine. You can download and install it from the [official Docker website](https://www.docker.com/products/docker-desktop). |
| 24 | + |
| 25 | +## Pull The Official SparseML Image |
| 26 | + |
| 27 | +The following lines of code will: |
| 28 | +- Pull the official SparseML image from GitHub Container Registry |
| 29 | +- Tag the image as `sparseml_docker` |
| 30 | +- Start a `sparseml_docker` container in interactive mode |
| 31 | + |
| 32 | +```bash |
| 33 | +docker pull ghcr.io/neuralmagic/sparseml:1.4.4-cu111 |
| 34 | +docker tag ghcr.io/neuralmagic/sparseml:1.4.4-cu111 sparseml_docker |
| 35 | +docker container run -it sparseml_docker |
| 36 | +``` |
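The three steps above can be wrapped in a small script so that the image version is set in one place. This is a minimal sketch, not part of SparseML: the variable names `SPARSEML_TAG`, `LOCAL_NAME`, and `DRY_RUN`, and the `run` helper, are illustrative assumptions. By default it only prints the commands it would execute.

```bash
#!/usr/bin/env bash
# Sketch: pull, tag, and start the SparseML image with the version set once.
# SPARSEML_TAG, LOCAL_NAME, and DRY_RUN are illustrative names, not SparseML settings.
SPARSEML_TAG="${SPARSEML_TAG:-1.4.4-cu111}"
LOCAL_NAME="${LOCAL_NAME:-sparseml_docker}"
DRY_RUN="${DRY_RUN:-1}"   # 1 = only print the commands; 0 = execute them
IMAGE="ghcr.io/neuralmagic/sparseml:${SPARSEML_TAG}"

run() {
  # Print the command, then execute it unless DRY_RUN is enabled.
  echo "+ $*"
  if [ "$DRY_RUN" != "1" ]; then
    "$@"
  fi
}

run docker pull "$IMAGE"
run docker tag "$IMAGE" "$LOCAL_NAME"
run docker container run -it "$LOCAL_NAME"
```

Saved under a hypothetical name such as `sparseml_pull.sh`, running it with `DRY_RUN=0 bash sparseml_pull.sh` would execute the commands instead of printing them.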
| 37 | +## NLP NER Example |
| 38 | +You can train various CV or NLP models inside the SparseML container. To use GPUs when training, add the `--gpus` flag when starting the container. |
| 39 | + |
| 40 | +The command below starts the container with all the available GPUs: |
| 41 | +```bash |
| 42 | +docker container run -it --gpus all sparseml_docker |
| 43 | +``` |
| 44 | +You can also start the container with a larger shared memory (`/dev/shm`) allocation, set with the `--shm-size` argument, to prevent memory problems: |
| 45 | +```bash |
| 46 | +docker container run --gpus all --shm-size=256m -it sparseml_docker |
| 47 | +``` |
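Because the GPU and shared memory flags are independent, it can help to assemble the `docker container run` command from two settings. The following is a rough sketch under stated assumptions: `GPUS`, `SHM_SIZE`, and `build_run_cmd` are hypothetical names introduced here, and the function only prints the command rather than running it.

```bash
#!/usr/bin/env bash
# Sketch: assemble the `docker container run` command from two settings.
# GPUS, SHM_SIZE, and build_run_cmd are illustrative names, not Docker or SparseML settings.
GPUS="${GPUS-all}"            # value for --gpus; set to an empty string for CPU-only
SHM_SIZE="${SHM_SIZE-256m}"   # value for --shm-size, e.g. 256m or 2g; empty to omit

build_run_cmd() {
  local cmd=(docker container run -it)
  # Add each optional flag only when its setting is non-empty.
  if [ -n "$GPUS" ]; then
    cmd+=(--gpus "$GPUS")
  fi
  if [ -n "$SHM_SIZE" ]; then
    cmd+=(--shm-size="$SHM_SIZE")
  fi
  cmd+=(sparseml_docker)
  echo "${cmd[*]}"
}

build_run_cmd
```

The printed command can be copied into the terminal as-is. Setting `GPUS=` (empty) drops the `--gpus` flag for machines without an NVIDIA GPU.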
| 48 | + |
| 49 | +Here's an example showing how to train a NER model inside the SparseML container: |
| 50 | + |
| 51 | +```bash |
| 52 | +sparseml.transformers.train.token_classification \ |
| 53 | + --model_name_or_path zoo:nlp/masked_language_modeling/obert-base/pytorch/huggingface/wikipedia_bookcorpus/pruned90-none \ |
| 54 | + --recipe zoo:nlp/token_classification/obert-base/pytorch/huggingface/conll2003/pruned90_quant-none \ |
| 55 | + --distill_teacher zoo:nlp/token_classification/obert-base/pytorch/huggingface/conll2003/base-none \ |
| 56 | + --dataset_name conll2003 \ |
| 57 | + --output_dir sparse_bert-token_classification_conll2003 \ |
| 58 | + --per_device_train_batch_size 32 --per_device_eval_batch_size 32 --preprocessing_num_workers 6 \ |
| 59 | + --do_train --do_eval --evaluation_strategy epoch --fp16 --seed 29204 \ |
| 60 | + --save_strategy epoch --save_total_limit 1 |
| 61 | + |
| 62 | + > ***** eval metrics ***** |
| 63 | + > epoch = 13.0 |
| 64 | + > eval_accuracy = 0.98 |
| 65 | + > eval_f1 = 0.8953 |
| 66 | + > eval_loss = 0.0878 |
| 67 | + > eval_precision = 0.8887 |
| 68 | + > eval_recall = 0.9021 |
| 69 | + > eval_runtime = 0:00:12.56 |
| 70 | + > eval_samples = 3251 |
| 71 | + > eval_samples_per_second = 258.667 |
| 72 | + > eval_steps_per_second = 2.069 |
| 73 | + |
| 74 | +``` |
| 75 | + |
| 76 | +To confirm that the GPUs are being utilized, run: |
| 77 | + |
| 78 | +```bash |
| 79 | +$ nvidia-smi |
| 80 | ++-----------------------------------------------------------------------------+ |
| 81 | +| NVIDIA-SMI 520.61.05 Driver Version: 520.61.05 CUDA Version: 11.8 | |
| 82 | +|-------------------------------+----------------------+----------------------+ |
| 83 | +| GPU Name Persistence-M| Bus-Id Disp.A | Volatile Uncorr. ECC | |
| 84 | +| Fan Temp Perf Pwr:Usage/Cap| Memory-Usage | GPU-Util Compute M. | |
| 85 | +| | | MIG M. | |
| 86 | +|===============================+======================+======================| |
| 87 | +| 0 NVIDIA RTX A4000 On | 00000000:05:00.0 Off | Off | |
| 88 | +| 44% 65C P2 106W / 140W | 9614MiB / 16376MiB | 97% Default | |
| 89 | +| | | N/A | |
| 90 | ++-------------------------------+----------------------+----------------------+ |
| 91 | +| 1 NVIDIA RTX A4000 On | 00000000:06:00.0 Off | Off | |
| 92 | +| 41% 56C P2 77W / 140W | 6322MiB / 16376MiB | 42% Default | |
| 93 | +| | | N/A | |
| 94 | ++-------------------------------+----------------------+----------------------+ |
| 95 | +| 2 NVIDIA RTX A4000 On | 00000000:07:00.0 Off | Off | |
| 96 | +| 41% 55C P2 72W / 140W | 6322MiB / 16376MiB | 37% Default | |
| 97 | +| | | N/A | |
| 98 | ++-------------------------------+----------------------+----------------------+ |
| 99 | +| 3 NVIDIA RTX A4000 On | 00000000:08:00.0 Off | Off | |
| 100 | +| 41% 54C P2 74W / 140W | 6322MiB / 16376MiB | 51% Default | |
| 101 | +| | | N/A | |
| 102 | ++-------------------------------+----------------------+----------------------+ |
| 103 | +``` |
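For a scriptable version of this check, `nvidia-smi` can emit selected fields as CSV, which is easier to parse than the table above. The sketch below flags GPUs whose utilization is under a threshold. The `idle_gpus` helper and the 10% threshold are illustrative choices, not part of SparseML or the NVIDIA tooling.

```bash
#!/usr/bin/env bash
# Sketch: flag GPUs whose utilization is below a threshold.
# idle_gpus and the 10% default threshold are illustrative choices.
idle_gpus() {
  local threshold="${1:-10}"
  # Input: lines of "index, utilization, memory_used", as produced by
  # nvidia-smi --query-gpu=index,utilization.gpu,memory.used --format=csv,noheader,nounits
  awk -F', *' -v t="$threshold" \
    '$2 + 0 < t { print "GPU " $1 " is at " $2 "% utilization" }'
}

# Only query the driver when nvidia-smi is available on this machine.
if command -v nvidia-smi >/dev/null 2>&1; then
  nvidia-smi --query-gpu=index,utilization.gpu,memory.used \
    --format=csv,noheader,nounits | idle_gpus 10
fi
```

If training is running on every GPU, the script prints nothing. Any line it does print points at a GPU that the training job is probably not using.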
| 104 | +If you get CUDA errors such as out-of-memory messages, lower the batch size, for example with `--per_device_train_batch_size` and `--per_device_eval_batch_size`. |
| 105 | + |
| 106 | +## Image Classification |
| 107 | + |
| 108 | +The `sparseml.image_classification.train` command is used to train image classification models with SparseML. |
| 109 | + |
| 110 | +Start the SparseML container with all the available GPUs: |
| 111 | + |
| 112 | +```bash |
| 113 | +docker container run --gpus all -it sparseml_docker |
| 114 | +``` |
| 115 | + |
| 116 | +Download Imagenette, a small subset of the ImageNet dataset, and use it to train a ResNet-50 model: |
| 117 | +```bash |
| 118 | +curl https://s3.amazonaws.com/fast-ai-imageclas/imagenette2-320.tgz --output imagenette2-320.tgz |
| 119 | +tar -xvf imagenette2-320.tgz |
| 120 | + |
| 121 | +sparseml.image_classification.train \ |
| 122 | + --recipe "zoo:cv/classification/resnet_v1-50/pytorch/sparseml/imagenet/pruned95_quant-none?recipe_type=transfer-classification" \ |
| 123 | + --checkpoint-path "zoo:cv/classification/resnet_v1-50/pytorch/sparseml/imagenet/pruned95_quant-none?recipe_type=transfer-classification" \ |
| 124 | + --arch-key resnet50 \ |
| 125 | + --dataset-path ./imagenette2-320 \ |
| 126 | + --batch-size 32 |
| 127 | +``` |
| 128 | +The [`recipe`](https://sparsezoo.neuralmagic.com/models/cv%2Fclassification%2Fresnet_v1-50%2Fpytorch%2Fsparseml%2Fimagenet%2Fpruned95_quant-none) |
| 129 | +instructs SparseML to maintain sparsity during training and to quantize the model over the final epochs. |
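Before starting a long training run, it may be worth confirming that the dataset extracted correctly. The sketch below assumes the archive unpacks into `train` and `val` folders with one subdirectory per class. That layout, the `count_classes` helper, and the `DATASET_DIR` variable are assumptions made for illustration.

```bash
#!/usr/bin/env bash
# Sketch: sanity-check an extracted image-folder dataset before training.
# count_classes and DATASET_DIR are illustrative names, not part of SparseML.
count_classes() {
  # Count the immediate subdirectories (one per class) of a split directory.
  find "$1" -mindepth 1 -maxdepth 1 -type d | wc -l | tr -d ' '
}

DATASET_DIR="${DATASET_DIR:-./imagenette2-320}"
for split in train val; do
  if [ -d "$DATASET_DIR/$split" ]; then
    echo "$split: $(count_classes "$DATASET_DIR/$split") classes"
  else
    echo "$split: missing ($DATASET_DIR/$split not found)" >&2
  fi
done
```

Both splits should report the same number of classes. A missing split or a mismatch suggests the download or extraction was incomplete.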
| 130 | + |
| 131 | +## Object Detection |
| 132 | +The `sparseml.yolov5.train` command is used to train YOLOv5 models with SparseML. |
| 133 | + |
| 134 | +Start the SparseML container with all the available GPUs: |
| 135 | + |
| 136 | +```bash |
| 137 | +docker container run --gpus all -it sparseml_docker |
| 138 | +``` |
| 139 | +The CLI command below trains a YOLOv5 model on the VOC dataset with a [`recipe`](https://sparsezoo.neuralmagic.com/models/cv%2Fdetection%2Fyolov5-s%2Fpytorch%2Fultralytics%2Fcoco%2Fpruned75_quant-none) |
| 140 | +that instructs SparseML to maintain sparsity during training and to quantize the model over the final epochs. |
| 141 | +```bash |
| 142 | +sparseml.yolov5.train \ |
| 143 | + --data VOC.yaml \ |
| 144 | + --cfg models_v5.0/yolov5s.yaml \ |
| 145 | + --weights "zoo:cv/detection/yolov5-s/pytorch/ultralytics/coco/pruned_quant-aggressive_94?recipe_type=transfer" \ |
| 146 | + --hyp data/hyps/hyp.finetune.yaml \ |
| 147 | + --recipe zoo:cv/detection/yolov5-s/pytorch/ultralytics/coco/pruned-aggressive_96 |
| 148 | + |
| 149 | +``` |
| 150 | +## How to Build Your Own SparseML Image |
| 151 | +To build your own SparseML image, [follow these instructions](https://github.com/neuralmagic/sparseml/blob/main/docker/README.md). |