This is a template repository for building applications using microservices architecture. It includes a basic frontend, backend, an LLM serving engine (e.g., vLLM), and nginx as a reverse proxy. This template is designed to help you quickly set up and deploy applications that use large language models (LLMs) along with traditional web services.
- Frontend: Responsible for the user-facing part of the application.
- Backend: Handles the business logic and communicates with the LLM server.
- LLM Server: A serving engine (such as vLLM) that provides API endpoints for large language model inference.
- Nginx: Acts as a reverse proxy to route traffic between the frontend and backend.
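
To make the topology concrete, below is a minimal sketch of how these four services could be wired together in `docker-compose.yml`. It is illustrative only: the image names, build contexts, ports, and the `LLM_SERVER_URL` variable are assumptions rather than this repository's actual configuration.

```yaml
# Illustrative sketch only -- see the repository's docker-compose.yml for the real configuration.
services:
  nginx:
    image: nginx:alpine
    ports:
      - "80:80"                        # only nginx is published to the host
    volumes:
      - ./nginx.conf:/etc/nginx/nginx.conf:ro
    depends_on: [frontend, backend]
  frontend:
    build: ./frontend                  # no published ports: reached only through nginx
  backend:
    build: ./backend
    environment:
      LLM_SERVER_URL: http://llm_server:8000/v1   # hypothetical variable; backend talks to the LLM over the internal network
    depends_on: [llm_server]
  llm_server:
    image: vllm/vllm-openai:latest     # e.g. vLLM's OpenAI-compatible server
    volumes:
      - ./models:/models               # mount the downloaded checkpoints
```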
 
```
├── frontend/           # Source code for the frontend application
├── backend/            # Source code for the backend application
├── models/             # Directory for storing models (e.g., huggingface checkpoints)
├── nginx.conf          # Nginx configuration file
├── docker-compose.yml  # Docker Compose configuration for orchestrating services
```

Ensure you have the following installed:

- Docker
- Docker Compose
- Clone the repository:

  ```bash
  git clone https://github.com/alex0dd/llm-app-microservices-template.git
  cd llm-app-microservices-template
  ```
- Configure the environment:
  - Download a Hugging Face checkpoint of the model (e.g., Meta-Llama-3.1-8B-Instruct).
  - Place the checkpoint directory in the `models/` directory (see the download sketch after this list).
  - Ensure the `nginx.conf` file is correctly configured (especially the `llm_server` section).
- Build and start the services:

  With vLLM:

  ```bash
  docker-compose up --build
  ```

  With ollama:

  ```bash
  docker compose -f docker-compose-ollama.yaml up --build
  # or, to force a clean rebuild and remove orphaned containers:
  docker compose -f docker-compose-ollama.yaml up --build --force-recreate --remove-orphans
  ```

  This will build and start the following services:
  - `nginx`: Reverse proxy server.
  - `frontend`: The web frontend, accessible via `http://localhost`.
  - `backend`: The backend API server, accessible via `http://localhost/api`.
  - `llm_server`: LLM serving engine, available for backend use, but not exposed to the public.
- Access the application:
  - Frontend: `http://localhost`
  - Backend API: `http://localhost/api`
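
As referenced in the configuration step above, one way to fetch a checkpoint into `models/` is the Hugging Face CLI. This is only a sketch: the model ID is an example, gated models (such as the Llama family) require accepting the license on huggingface.co and running `huggingface-cli login` first, and the target directory name is up to you.

```bash
pip install -U "huggingface_hub[cli]"
huggingface-cli download meta-llama/Meta-Llama-3.1-8B-Instruct \
  --local-dir models/Meta-Llama-3.1-8B-Instruct
```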
- The frontend sends requests to the backend.
- The backend exposes the LLM via an API and communicates with the llm_server for model inference.
- nginx handles routing for public requests (i.e., requests to the frontend and backend), but it does not expose the LLM server directly to the outside world.
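
To make the routing concrete, here is a hedged sketch of how requests flow once the stack is up. The exact backend routes and the LLM server port are assumptions (8000 is vLLM's default for its OpenAI-compatible server); adjust them to match your backend and compose file.

```bash
# From the host: all public traffic goes through nginx.
curl http://localhost/          # routed to the frontend container
curl http://localhost/api/      # routed to the backend container

# From inside the backend container only: the LLM server lives on the internal
# Docker network and is not published to the host (service name and port are assumptions).
curl http://llm_server:8000/v1/models
```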
 
- Frontend: Modify the source code in the `frontend/` folder according to your frontend framework.
- Backend: Implement your business logic in the `backend/` folder. Ensure it properly communicates with the LLM server.
- LLM Model: Replace the model in `models/` with the one you want to use (e.g., Meta-Llama-3.1-8B-Instruct).
- LLM Server: Replace the `llm_server` section in the `docker-compose.yaml` file with other servers such as ollama.
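
The repository already ships `docker-compose-ollama.yaml` for the ollama case; purely as an illustration of what swapping the `llm_server` service might look like, here is a hypothetical fragment (the image tag, volume path, and port are ollama defaults, not values taken from this repo; note that ollama pulls models in its own format rather than loading raw Hugging Face checkpoints).

```yaml
  # Hypothetical replacement for the llm_server service (fragment of the services: section).
  llm_server:
    image: ollama/ollama:latest
    volumes:
      - ./models:/root/.ollama     # ollama stores its own model blobs under /root/.ollama by default
    # ollama's HTTP API listens on port 11434 inside the container, so the backend
    # would reach it at http://llm_server:11434 on the compose network.
```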
This project is licensed under the MIT License.
Feel free to fork this repository and make contributions.