Hugging Face Local Embedding

A FastAPI server that provides local text and multi-modal embedding using LlamaIndex and Hugging Face Embedding models. Supports text, document, and image embedding via API endpoints. Easily deployable locally, with Docker, or on Google Colab.


Project Structure

huggingface-local-embedding/
├── README.md                           # Project documentation
├── requirements.txt                    # Python dependencies
├── Dockerfile                          # Docker configuration
├── docker-compose.yml                  # Docker Compose configuration
├── huggingface_embedding_server.py     # Main FastAPI application
├── huggingface_embedding_server.ipynb  # Jupyter notebook version
├── LICENSE                             # MIT License
└── .gitignore                          # Git ignore rules

Features

  • Local text and multi-modal (image, text) embedding
  • FastAPI server with REST endpoints
  • Uses LlamaIndex and Hugging Face models for embedding
  • Docker and Colab ready
  • Python 3.13.3 compatible

Demo

FastAPI Server Demo

Prerequisites

  • Python 3.11 or higher (tested with 3.13.3)
  • Docker (optional, for containerized deployment)
  • ngrok account (optional, for public URL exposure)

Setup Guide

Setting ngrok Secrets

In Google Colab

  • Add your ngrok token in the Colab Secrets panel (key icon in the left sidebar) under the name NGROK_AUTH_TOKEN, then read it in the notebook:
    from google.colab import userdata
    NGROK_AUTH_TOKEN = userdata.get('NGROK_AUTH_TOKEN')
  • The notebook will read the token and set it for ngrok automatically.

Colab Secrets

Locally (.env file)

  • Create a .env file in the project root by copying .env.example, then set NGROK_AUTH_TOKEN to your ngrok token:

    cp .env.example .env
    # then edit .env and set:
    NGROK_AUTH_TOKEN=your-ngrok-token-here
  • The Docker Compose setup will load this automatically. For local runs, you can load it in your shell:

    export $(cat .env | xargs)
  • Or set the token in your Python code before starting ngrok:

    import os
    from pyngrok import ngrok
    ngrok.set_auth_token(os.getenv('NGROK_AUTH_TOKEN'))
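  • Alternatively, load the file from Python with python-dotenv. This is a minimal sketch under the assumption that python-dotenv is installed (it is not shown in the project tree, so add it with pip install python-dotenv first):

    import os

    from dotenv import load_dotenv
    from pyngrok import ngrok

    load_dotenv()  # reads .env from the current directory into os.environ
    ngrok.set_auth_token(os.environ['NGROK_AUTH_TOKEN'])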

1. Standalone (Local) Setup

Using pip

  • Create a virtual environment, install dependencies, and start the server:
python3.13 -m venv venv
source venv/bin/activate
pip install -r requirements.txt
uvicorn huggingface_embedding_server:app --host 0.0.0.0 --port 8000

Using uv+pip

  • Install uv:

    curl -LsSf https://astral.sh/uv/install.sh | sh
  • Create a virtual environment, install dependencies, and start the server:

uv venv --python 3.13
source .venv/bin/activate
uv pip install -r requirements.txt
uvicorn huggingface_embedding_server:app --host 0.0.0.0 --port 8000

Using only uv

  • Install uv:

    curl -LsSf https://astral.sh/uv/install.sh | sh
  • Sync dependencies (uv creates the virtual environment automatically) and start the server:

uv sync
uv run uvicorn huggingface_embedding_server:app --host 0.0.0.0 --port 8000
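Once the server is running (with any of the three methods above), you can smoke-test it from another shell. A minimal sketch, assuming the server listens on localhost:8000; the exact response body depends on huggingface_embedding_server.py:

    import requests

    # Hit the health-check endpoint (GET /, see "API Endpoints" below)
    resp = requests.get("http://localhost:8000/")
    print(resp.status_code, resp.json())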

2. Dockerized Setup

Build and Run with Docker

docker build -t huggingface-embedding-server .
docker run --env-file .env -p 8000:8000 huggingface-embedding-server

Using Docker Compose

docker-compose up --build

3. Google Colab Setup

  • Open huggingface_embedding_server.ipynb in Colab.
  • Run the first cell to install dependencies:
    !pip install fastapi uvicorn pyngrok nest_asyncio llama-index llama-index-embeddings-huggingface
  • Set the ngrok token in Colab secrets:
    • (Optional) Choose a hardware accelerator via Runtime > Change runtime type.
    • Add the token in the Secrets panel (key icon in the left sidebar) under the name NGROK_AUTH_TOKEN, then read it in a cell:
      from google.colab import userdata
      NGROK_AUTH_TOKEN = userdata.get('NGROK_AUTH_TOKEN')
  • Run all cells to start the server and expose via ngrok.
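The notebook wires this up for you; for reference, the usual Colab pattern for running uvicorn behind an ngrok tunnel looks roughly like this (a sketch, not the notebook's exact code):

    import nest_asyncio
    import uvicorn
    from google.colab import userdata
    from pyngrok import ngrok

    nest_asyncio.apply()  # let uvicorn's event loop run inside the notebook's loop

    ngrok.set_auth_token(userdata.get('NGROK_AUTH_TOKEN'))
    public_url = ngrok.connect(8000)  # public tunnel to the local port
    print("Public URL:", public_url)

    # huggingface_embedding_server.py must be in the Colab working directory
    uvicorn.run("huggingface_embedding_server:app", host="0.0.0.0", port=8000)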

API Endpoints

  • GET / — Health check
  • POST /embed_text — Embed a single text
  • POST /embed_docs — Embed a list of texts
  • POST /embed_image — Embed an image
  • POST /embed_batch — Upload a text file for batch embedding
  • POST /embed_multimodal — Embed text and image together
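Example calls with Python requests. The payload field names below (text, texts) are illustrative assumptions; check the interactive docs FastAPI serves at http://localhost:8000/docs for the actual request and response schemas:

    import requests

    BASE = "http://localhost:8000"

    # Embed a single text (field name assumed; verify against /docs)
    r = requests.post(f"{BASE}/embed_text", json={"text": "hello world"})
    print(r.json())

    # Embed a list of texts
    r = requests.post(f"{BASE}/embed_docs", json={"texts": ["first doc", "second doc"]})
    print(r.json())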

Notes

  • If you use private or gated Hugging Face models, set HF_TOKEN as a Colab secret or environment variable.
  • For best performance, use a machine with sufficient RAM and CPU/GPU for model inference.
  • This project is tested with Python 3.13.3 but should work with Python 3.11+.
  • I have used Hugging Face sentence-transformers models for embedding. You can use any other embedding model from the Hugging Face Hub; see https://huggingface.co/models?library=sentence-transformers for options (a model-swap sketch follows this list).
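For example, swapping in a different Hub model through LlamaIndex's HuggingFaceEmbedding wrapper looks like this (a minimal sketch; BAAI/bge-small-en-v1.5 is just one popular lightweight choice):

    from llama_index.embeddings.huggingface import HuggingFaceEmbedding

    # Any sentence-transformers model id from the Hugging Face Hub works here
    embed_model = HuggingFaceEmbedding(model_name="BAAI/bge-small-en-v1.5")

    vector = embed_model.get_text_embedding("hello world")
    print(len(vector))  # embedding dimensionality (384 for bge-small-en-v1.5)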

License

MIT License. See LICENSE.


Contributing

To contribute, please fork the repository and submit a pull request. For questions, contact me at sany2k8@gmail.com or open an issue tagged enhancement or bug.

Credits


TODO

  • Add a way to upload a file from the UI and embed it.
  • Add a way to embed PDF, CSV, JSON, DOCX, and other document files.
  • Add a way to embed a video file.
  • Add a way to embed an audio file.

Contact

For questions or support, email sany2k8@gmail.com or open an issue on GitHub.
