Below you'll find the instructions needed to run the provided code.
The API exposes six endpoints and serves four SoTA vision-language models.
- CUDA-compatible GPU
- The more VRAM the better, but at least 12 GB of VRAM is recommended
- For Linux (tested on Ubuntu 20.04):
  - The system was tested on NVIDIA proprietary drivers 515 and 525
  - Make sure Docker is installed on your system. For instructions you can refer to the official Docker guide.
  - Make sure you have the NVIDIA Container Toolkit installed. More info and instructions can be found in the official installation guide (a quick verification snippet follows this list).
- For Windows:
  - Windows 10 (or above) with support for CUDA utilization through Docker (if you wish to run the Docker images)
- Python 3.10.9 was used for all the provided code
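Before building anything, you can confirm that Docker can reach the GPU. This is a minimal sanity check following NVIDIA's standard verification step; the CUDA image tag below is only an example, not something this repository prescribes:

```bash
# Check that the host driver sees the GPU.
nvidia-smi

# Check that Docker + the NVIDIA Container Toolkit can pass the GPU through.
# The image tag is an example; any CUDA base image should work.
sudo docker run --rm --gpus all nvidia/cuda:11.8.0-base-ubuntu20.04 nvidia-smi
```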
Once you have Docker up and running, you can move on to cloning the repository.
- Start by cloning the repository with the following command:

```bash
git clone https://github.com/stefbil/inference-endpoints-for-llms.git
```
In order to build the Docker images from scratch you should follow the instructions below:
- Change into the `inference-endpoints-for-llms` directory.
- You should have the following directory structure:

```
inference-endpoints-for-llms
├── code
│   ├── demo.py
│   ├── extracting_data.py
│   ├── image_caption.py
│   ├── main.py
│   ├── modeling_frcnn.py
│   ├── pnpv2.py
│   ├── processing_image.py
│   ├── requirements.txt
│   ├── utils.py
│   ├── vilt_vqa.py
│   └── visualizing_image.py
├── huggingface (may not exist)
├── torch (may not exist)
├── dockerfile
├── LINUX_Start_Contrainer.sh
├── LINUX_Stop_Container.sh
├── WIN_GPU_Start_Container.bat
├── WIN_GPU_Stop_Container.bat
└── README.md
```
- The `huggingface` and `torch` directories contain the served models. If they don't exist, the models will be downloaded automatically when you run the Docker container.
- Build the Docker image by running the following command from the base directory:
```bash
sudo docker build .
```
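Optionally, you can tag the image at build time so that later `docker run` and `docker stop` commands can refer to it by name. This is a suggestion rather than part of the repository's instructions, and `inference-endpoints` is only an example tag:

```bash
# Optional: build with an example tag instead of an anonymous image ID.
sudo docker build -t inference-endpoints .
```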
- For Linux:
  - You can start the Docker container by running the `LINUX_Start_Contrainer.sh` script (if you need to launch the container manually, see the sketch after this list).
  - A message in the terminal will display whether the GPU is available or not (the container won't run if there's no GPU available).
  - Once the container has started, open `https://localhost:5035/docs` or `https://YourExternalIP:5035/docs` in your browser and start using the API.
  - To stop the container, run the `LINUX_Stop_Container.sh` script. This script will create a `Dumps` folder and copy over all the tests you have run, e.g. uploaded photos and questions.
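The start script itself is not reproduced here. If you ever need to launch the container by hand, a typical invocation might look like the sketch below; this is an assumption based on the port used above, not the script's verbatim contents, and `inference-endpoints` is the example tag from the build step:

```bash
# Hypothetical manual launch, not the repository's script:
# pass the GPU through and expose the API port used above.
sudo docker run --rm --gpus all -p 5035:5035 inference-endpoints
```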
- For Windows:
  - You can start the container by running the `WIN_GPU_Start_Container.bat` script.
  - A message in the terminal will display whether the GPU is available or not (the container won't run if there's no GPU available).
  - Once the container has started, open `https://localhost:5035/docs` or `https://YourExternalIP:5035/docs` in your browser and start using the API (an example request follows this section).
  - To stop the container, run the `WIN_GPU_Stop_Container.bat` script. This script will create a `Dumps` folder and copy over all the tests you have run, e.g. uploaded photos and questions.
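Once the container is up on either platform, the interactive page at `/docs` lists the actual endpoint paths. As a rough illustration of calling the API from the command line, here is a hedged sketch: the endpoint path `/vqa`, the fields `image` and `question`, and the file `photo.jpg` are all hypothetical placeholders, so substitute the real paths and parameters shown on the `/docs` page:

```bash
# Hypothetical example: /vqa and the image/question fields are placeholders,
# not endpoints confirmed by the repository. Check /docs for the real paths.
curl -X POST "https://localhost:5035/vqa" \
  -F "image=@photo.jpg" \
  -F "question=What is in the picture?"
```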
Further instructions on running the Docker images can be found here.
Disclaimer:
- The API serves the models developed in four SoTA projects:
- Here's the code for the PNP-VQA paper
- Here's the code for the LXMERT paper
- Here's the ViT_GPT2 model provided by NLP Connect
- Here's the BLIP model from the paper