A FastAPI server that provides local text and multi-modal embedding using LlamaIndex and Hugging Face embedding models. It supports text, document, and image embedding via API endpoints and is easily deployable locally, with Docker, or on Google Colab.
```
huggingface-local-embedding/
├── README.md                          # Project documentation
├── requirements.txt                   # Python dependencies
├── Dockerfile                         # Docker configuration
├── docker-compose.yml                 # Docker Compose configuration
├── huggingface_embedding_server.py    # Main FastAPI application
├── huggingface_embedding_server.ipynb # Jupyter notebook version
├── LICENSE                            # MIT License
└── .gitignore                         # Git ignore rules
```
- Local text and multi-modal (image, text) embedding
- FastAPI server with REST endpoints
- Uses LlamaIndex and Hugging Face models for embedding
- Docker and Colab ready
- Python 3.13.3 compatible
Prerequisites:

- Python 3.13.3 or higher
- Docker (optional, for containerized deployment)
- ngrok account (optional, for public URL exposure)
- Use Colab secrets: add `NGROK_AUTH_TOKEN` in the Secrets panel (key icon in the Colab sidebar), then read it in the notebook:

  ```python
  from google.colab import userdata

  NGROK_AUTH_TOKEN = userdata.get('NGROK_AUTH_TOKEN')
  ```

- The notebook will read the token and set it for ngrok automatically.
- Create a `.env` file in the project root by copying `.env.example`, and set `NGROK_AUTH_TOKEN` to your ngrok token:

  ```bash
  cp .env.example .env
  ```

  ```
  NGROK_AUTH_TOKEN=your-ngrok-token-here
  ```

- The Docker Compose setup will load this file automatically. For local runs, you can load it in your shell:

  ```bash
  export $(cat .env | xargs)
  ```

- Or set the token in your Python code before starting ngrok:

  ```python
  import os

  from pyngrok import ngrok

  ngrok.set_auth_token(os.getenv('NGROK_AUTH_TOKEN'))
  ```
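Once the token is set, exposing the local server only takes pyngrok's `connect` call (a minimal sketch; port 8000 matches the uvicorn commands below):

```python
from pyngrok import ngrok

# Open an HTTP tunnel to the port uvicorn listens on
public_url = ngrok.connect(8000)
print("Server is publicly reachable at:", public_url)
```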
### Using pip

- Create a virtual environment and install dependencies:

  ```bash
  python3.13 -m venv venv
  source venv/bin/activate
  pip install -r requirements.txt
  ```

- Start the server:

  ```bash
  uvicorn huggingface_embedding_server:app --host 0.0.0.0 --port 8000
  ```
### Using uv + pip

- Install uv:

  ```bash
  curl -LsSf https://astral.sh/uv/install.sh | sh
  ```

- Create a virtual environment and install dependencies:

  ```bash
  uv venv --python 3.13
  source .venv/bin/activate
  uv pip install -r requirements.txt
  ```

- Start the server:

  ```bash
  uvicorn huggingface_embedding_server:app --host 0.0.0.0 --port 8000
  ```
### Using only uv

- Install uv:

  ```bash
  curl -LsSf https://astral.sh/uv/install.sh | sh
  ```

- Create a virtual environment and install dependencies:

  ```bash
  uv sync
  ```

- Start the server:

  ```bash
  uvicorn huggingface_embedding_server:app --host 0.0.0.0 --port 8000
  ```
Build and run with Docker:

```bash
docker build -t huggingface-embedding-server .
docker run --env-file .env -p 8000:8000 huggingface-embedding-server
```

Or use Docker Compose:

```bash
docker-compose up --build
```
- Open `huggingface_embedding_server.ipynb` in Colab.
- Run the first cell to install dependencies:

  ```python
  !pip install fastapi uvicorn pyngrok nest_asyncio llama-index llama-index-embeddings-huggingface
  ```

- Set the ngrok token in Colab secrets:
  - Go to the Colab menu: `Runtime` > `Change runtime type` > `Hardware accelerator` (optional, if you want GPU inference).
  - Add `NGROK_AUTH_TOKEN` in the Secrets panel (key icon in the sidebar), then read it in the notebook:

    ```python
    from google.colab import userdata

    NGROK_AUTH_TOKEN = userdata.get('NGROK_AUTH_TOKEN')
    ```

- Run all cells to start the server and expose it via ngrok.
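For reference, the pattern the notebook uses to run FastAPI inside Colab looks roughly like this (an assumed sketch based on the dependencies it installs, not the notebook verbatim; the stand-in `app` replaces the real app with the embedding routes):

```python
import nest_asyncio
import uvicorn
from fastapi import FastAPI
from pyngrok import ngrok

app = FastAPI()  # stand-in; the notebook defines the real app with the embedding routes

@app.get("/")
def health():
    return {"status": "ok"}

nest_asyncio.apply()              # allow uvicorn to run inside Colab's event loop
public_url = ngrok.connect(8000)  # tunnel a public URL to the local port
print("Public URL:", public_url)

uvicorn.run(app, host="0.0.0.0", port=8000)
```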
Available endpoints:

- `GET /` – Health check
- `POST /embed_text` – Embed a single text
- `POST /embed_docs` – Embed a list of texts
- `POST /embed_image` – Embed an image
- `POST /embed_batch` – Upload a text file for batch embedding
- `POST /embed_multimodal` – Embed text and image together
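For example, with the server running locally you can exercise the text endpoints like this (a minimal sketch; the request field names `text` and `texts` are assumptions, so check the Pydantic models in `huggingface_embedding_server.py` for the exact schema):

```python
import requests

BASE_URL = "http://localhost:8000"

# Health check
print(requests.get(f"{BASE_URL}/").json())

# Embed a single text (the `text` field name is an assumption)
resp = requests.post(f"{BASE_URL}/embed_text", json={"text": "Hello, embeddings!"})
resp.raise_for_status()
print(resp.json())

# Embed several texts at once (the `texts` field name is an assumption)
resp = requests.post(f"{BASE_URL}/embed_docs", json={"texts": ["first doc", "second doc"]})
resp.raise_for_status()
print(resp.json())
```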
- For Hugging Face model access, you may need to set `HF_TOKEN` as a Colab secret or environment variable if using private models.
- For best performance, use a machine with sufficient RAM and CPU/GPU for model inference.
- This project is tested with Python 3.13.3 but should work with Python 3.11+.
- I have used Hugging Face sentence-transformers models for embedding. You can use any other embedding model from Hugging Face (see the sketch below); see https://huggingface.co/models?library=sentence-transformers for more details.
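Swapping the model is a one-line change wherever the embedding model is constructed (a minimal sketch using LlamaIndex's `HuggingFaceEmbedding`; the model names are just examples):

```python
from llama_index.embeddings.huggingface import HuggingFaceEmbedding

# A typical sentence-transformers model (example name)
embed_model = HuggingFaceEmbedding(model_name="sentence-transformers/all-MiniLM-L6-v2")

# Any other Hugging Face embedding model works the same way, e.g.:
# embed_model = HuggingFaceEmbedding(model_name="BAAI/bge-small-en-v1.5")

vector = embed_model.get_text_embedding("Hello, world!")
print(len(vector))  # embedding dimension, e.g. 384 for all-MiniLM-L6-v2
```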
MIT License. See LICENSE.
To contribute, please fork the repository and submit a pull request. If you have any questions, please contact me at sany2k8@gmail.com or create an issue with the tag `enhancement` or `bug`.
- Add a way to upload a file from the UI and embed it
- Add a way to embed PDF, CSV, JSON, DOCX, and other file types
- Add a way to embed a video file
- Add a way to embed an audio file