Below you'll find the instructions needed to run the provided code.
The API exposes six endpoints and serves four SoTA vision-language models.
- CUDA-compatible GPU
- The more VRAM the better, but at least 12 GB of VRAM is recommended
- For Linux (tested on Ubuntu 20.04):
  - The system was tested on NVIDIA proprietary drivers 515 and 525
  - Make sure Docker is installed on your system. For instructions you can refer to the official Docker guide.
  - Make sure you have the NVIDIA Container Toolkit installed. More info and instructions can be found in the official installation guide (a quick verification snippet follows this list).
- For Windows:
  - Windows 10 (or above) with support for CUDA utilization through Docker (if you wish to run the Docker images)
- Python 3.10.9 was used for all the provided code
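Before building anything, you can confirm that Docker can reach the GPU. This is a minimal sanity check following NVIDIA's standard verification step; the CUDA image tag below is only an example, not something this repository prescribes:

```bash
# Check that the host driver sees the GPU.
nvidia-smi

# Check that Docker + the NVIDIA Container Toolkit can pass the GPU through.
# The image tag is an example; any CUDA base image should work.
sudo docker run --rm --gpus all nvidia/cuda:11.8.0-base-ubuntu20.04 nvidia-smi
```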
Once you have Docker up and running, you can move on to cloning the repository.
- Start by cloning the repository with the following command:

```bash
git clone https://github.com/stefbil/inference-endpoints-for-llms.git
```
In order to build the Docker images from scratch you should follow the instructions below:
- Change into the `inference-endpoints-for-llms` directory.
- You should have the following directory structure:

```
inference-endpoints-for-llms
├── code
│   ├── demo.py
│   ├── extracting_data.py
│   ├── image_caption.py
│   ├── main.py
│   ├── modeling_frcnn.py
│   ├── pnpv2.py
│   ├── processing_image.py
│   ├── requirements.txt
│   ├── utils.py
│   ├── vilt_vqa.py
│   └── visualizing_image.py
├── huggingface (may not exist)
├── torch (may not exist)
├── dockerfile
├── LINUX_Start_Contrainer.sh
├── LINUX_Stop_Container.sh
├── WIN_GPU_Start_Container.bat
├── WIN_GPU_Stop_Container.bat
└── README.md
```
- The `huggingface` and `torch` directories contain the served models. If they don't exist, the models will be downloaded automatically when you run the Docker container.
- Build the Docker image by running the following command from the base directory:
```bash
sudo docker build .
```
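Optionally, you can tag the image at build time so that later `docker run` and `docker stop` commands can refer to it by name. This is a suggestion rather than part of the repository's instructions, and `inference-endpoints` is only an example tag:

```bash
# Optional: build with an example tag instead of an anonymous image ID.
sudo docker build -t inference-endpoints .
```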
- For Linux:
  - You can start the Docker container by running the `LINUX_Start_Contrainer.sh` script (if you need to launch the container manually, see the sketch after this list).
  - A message in the terminal will display whether the GPU is available or not (the container won't run if there's no GPU available).
  - Once the container has started, open `https://localhost:5035/docs` or `https://YourExternalIP:5035/docs` in your browser and start using the API.
  - To stop the container, run the `LINUX_Stop_Container.sh` script. This script will create a `Dumps` folder and copy over all the tests you have run, e.g. uploaded photos and questions.
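The start script itself is not reproduced here. If you ever need to launch the container by hand, a typical invocation might look like the sketch below; this is an assumption based on the port used above, not the script's verbatim contents, and `inference-endpoints` is the example tag from the build step:

```bash
# Hypothetical manual launch, not the repository's script:
# pass the GPU through and expose the API port used above.
sudo docker run --rm --gpus all -p 5035:5035 inference-endpoints
```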
- For Windows:
  - You can start the container by running the `WIN_GPU_Start_Container.bat` script.
  - A message in the terminal will display whether the GPU is available or not (the container won't run if there's no GPU available).
  - Once the container has started, open `https://localhost:5035/docs` or `https://YourExternalIP:5035/docs` in your browser and start using the API (an example request follows this section).
  - To stop the container, run the `WIN_GPU_Stop_Container.bat` script. This script will create a `Dumps` folder and copy over all the tests you have run, e.g. uploaded photos and questions.
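Once the container is up on either platform, the interactive page at `/docs` lists the actual endpoint paths. As a rough illustration of calling the API from the command line, here is a hedged sketch: the endpoint path `/vqa`, the fields `image` and `question`, and the file `photo.jpg` are all hypothetical placeholders, so substitute the real paths and parameters shown on the `/docs` page:

```bash
# Hypothetical example: /vqa and the image/question fields are placeholders,
# not endpoints confirmed by the repository. Check /docs for the real paths.
curl -X POST "https://localhost:5035/vqa" \
  -F "image=@photo.jpg" \
  -F "question=What is in the picture?"
```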
Further instructions on running the Docker images can be found here.
Disclaimer:
- The API serves the models developed in four SoTA projects:
- Here's the code for the PNP-VQA paper
- Here's the code for the LXMERT paper
- Here's the ViT_GPT2 model provided by NLP Connect
- Here's the BLIP model from the paper