This directory contains the configuration for running Text Generation WebUI with TensorRT-LLM in a Docker container as part of the ClassroomCopilot project.
- Make sure you have Docker installed and running.
- GPU acceleration requires NVIDIA drivers and the NVIDIA Container Toolkit. It is not used by the Apple Silicon configuration described below.
Place your language models in the following directory:
cc-volumes/text-generation/models/
The container supports various model formats including:
- GGUF models (for CPU inference)
- HuggingFace models
- TensorRT-LLM optimized models (not usable on Apple Silicon)
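As a rough illustration of how these formats differ on disk, the sketch below guesses the format of each entry in the models directory. The function names `classify_model` and `list_models` are hypothetical, and the detection rules (a `.gguf` extension, a `config.json` inside a directory, a `*.engine` file for TensorRT-LLM builds) are assumptions about typical layouts, not something the container enforces.

```shell
# Sketch only: heuristics for guessing a model's format from its layout on disk.

classify_model() {
    # $1: path to a file or directory inside the models directory
    path=$1
    case "$path" in
        *.gguf) echo "gguf"; return 0 ;;
    esac
    if [ -d "$path" ]; then
        # Assumption: TensorRT-LLM build outputs contain at least one *.engine file.
        set -- "$path"/*.engine
        if [ -e "$1" ]; then
            echo "tensorrt-llm"; return 0
        fi
        # Assumption: HuggingFace model directories ship a config.json.
        if [ -f "$path/config.json" ]; then
            echo "huggingface"; return 0
        fi
    fi
    echo "unknown"
}

list_models() {
    # $1: models directory (defaults to the path given in this README)
    dir=${1:-cc-volumes/text-generation/models}
    for entry in "$dir"/*; do
        [ -e "$entry" ] || continue
        printf '%s\t%s\n' "$(classify_model "$entry")" "$entry"
    done
}

# Usage (run from the project root):
#   list_models
```

Entries reported as `unknown` are worth checking by hand before trying to load them in the WebUI.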
The container starts automatically with the rest of the ClassroomCopilot services. To start only this service:
docker-compose up -d text-generation-webui
To start all services:
docker-compose up -d
Once the container is running, you can access the WebUI at:
http://localhost:7861
Or through the Nginx reverse proxy at:
http://textgen.localhost
The API is available at:
http://localhost:5010
Or through the Nginx reverse proxy at:
http://textgen.localhost/api
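To confirm that the four addresses above respond, something like the following sketch may help. It assumes `curl` is installed on the host. The helper names `describe_status` and `check_endpoints` are hypothetical. Any HTTP response, even a 404, is treated as "reachable", since the root path of the API may not serve a page.

```shell
# Sketch only: probe the WebUI and API addresses listed in this README.

describe_status() {
    # $1: HTTP status code as printed by curl; 000 or empty means no response.
    case "$1" in
        000|"") echo "unreachable" ;;
        *) echo "reachable (HTTP $1)" ;;
    esac
}

check_endpoints() {
    for url in \
        http://localhost:7861 \
        http://textgen.localhost \
        http://localhost:5010 \
        http://textgen.localhost/api
    do
        # -m 5 caps each request at five seconds; -w prints only the status code.
        code=$(curl -s -o /dev/null -m 5 -w '%{http_code}' "$url" 2>/dev/null || true)
        printf '%s\t%s\n' "$url" "$(describe_status "$code")"
    done
}

# Usage:
#   check_endpoints
```

If the direct ports respond but the `textgen.localhost` addresses do not, the problem is more likely in the Nginx proxy than in this container.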
The container is configured with the following settings:
- Uses CPU-only inference optimized for Apple Silicon
- Exposes both the web interface (port 7861) and API (port 5010)
- Mounts volumes for models, LoRAs, presets, characters, and extensions
This container is specifically configured for Apple Silicon (M1/M2/M3) Macs. It uses CPU-only inference because TensorRT-LLM is not compatible with Apple Silicon. For optimal performance on Apple Silicon:
- Use GGUF models, which are optimized for CPU inference
- Prefer smaller models (7B parameters or fewer), which run faster on CPU
- Consider quantized models (such as Q4_K_M) for faster inference
Recommended models:
- Mistral 7B Instruct GGUF (Q4_K_M)
- Llama 2 7B Chat GGUF (Q4_K_M)
- Phi-2 GGUF (Q4_K_M)
You can download these models using the setup script or manually place them in the models directory.
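To get a feel for why model size and quantization matter on a CPU-only setup, the sketch below estimates the weight size of a model from its parameter count and average bits per weight. The function name `estimate_gguf_gb` is hypothetical, and the figure of about 4.85 bits per weight for Q4_K_M is an approximate, assumed value. Real memory use is higher because of context buffers and runtime overhead.

```shell
# Sketch only: back-of-the-envelope size of model weights in gigabytes.

estimate_gguf_gb() {
    # $1: parameter count in billions
    # $2: average bits per weight (assumption: ~4.85 for Q4_K_M, 16 for FP16)
    awk -v p="$1" -v b="$2" 'BEGIN { printf "%.1f\n", p * b / 8 }'
}

# Usage:
#   estimate_gguf_gb 7 4.85   # a 7B model at roughly Q4_K_M density
#   estimate_gguf_gb 7 16     # the same model unquantized in FP16
```

Under these assumptions a quantized 7B model needs roughly a third of the memory of its FP16 counterpart.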
If you encounter issues:
- Model loading errors: Ensure your models are in the correct format and location.
- GPU issues: Check that your NVIDIA drivers and CUDA are properly installed and that the NVIDIA Container Toolkit is configured.
- Container logs: Check the container logs for more detailed error messages:
  docker-compose logs text-generation-webui
- Restart the container: Sometimes simply restarting the container can resolve issues:
  docker-compose restart text-generation-webui
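When the logs are long, it can help to narrow them to lines that look like failures. The sketch below is one way to do that. The helper name `filter_log_errors` is hypothetical, and the keyword list is an assumption about what error lines typically contain, not a format the WebUI guarantees.

```shell
# Sketch only: keep log lines that mention common failure keywords.

filter_log_errors() {
    # Reads log text on stdin; case-insensitive match on a few keywords.
    # "|| true" keeps the exit status at zero when nothing matches.
    grep -iE 'error|exception|traceback' || true
}

# Usage:
#   docker-compose logs --tail=200 text-generation-webui | filter_log_errors
```

An empty result does not prove the container is healthy, so fall back to the full logs if the problem persists.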