Hermes Ingestor

A microservice for processing and embedding documents for knowledge bases.

Overview

Hermes Ingestor is part of a larger LLM-powered self-service knowledge base platform. The ingestor handles:

Document processing (PDF, Markdown, HTML, Text, DOCX)
Text extraction
Document chunking
Vector embedding generation
Storage in Qdrant vector database

Features

Support for multiple document formats (PDF, Markdown, HTML, Text, DOCX)
Document chunking with metadata preservation
Text embedding using sentence transformers
Integration with Qdrant vector database
RESTful API for document ingestion and management
Kubernetes-ready containerization
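To make "document chunking with metadata preservation" concrete, here is a minimal sketch of overlapping fixed-size chunking. It is not taken from the Hermes source: the `Chunk` class, the `chunk_text` function, and the metadata keys are hypothetical names chosen for illustration, and the character-based windowing is an assumption about how chunk size and overlap are measured.

```python
from dataclasses import dataclass
from typing import Dict, List


@dataclass
class Chunk:
    """Hypothetical container: a slice of text plus the metadata carried with it."""
    text: str
    metadata: Dict[str, object]


def chunk_text(text: str, source: str, chunk_size: int = 1000, overlap: int = 200) -> List[Chunk]:
    """Split text into character windows of chunk_size that overlap by `overlap`."""
    if chunk_size <= 0:
        raise ValueError("chunk_size must be positive")
    if not 0 <= overlap < chunk_size:
        raise ValueError("overlap must satisfy 0 <= overlap < chunk_size")

    step = chunk_size - overlap  # distance between consecutive window starts
    chunks: List[Chunk] = []
    start = 0
    while start < len(text):
        end = min(start + chunk_size, len(text))
        chunks.append(
            Chunk(
                text=text[start:end],
                # Each chunk keeps a pointer back to its source document and position.
                metadata={"source": source, "chunk_index": len(chunks), "start": start, "end": end},
            )
        )
        if end == len(text):
            break  # the last window reached the end of the text
        start += step
    return chunks
```

With a chunk size of 1000 and an overlap of 200, consecutive chunks start 800 characters apart, so text near a chunk boundary appears in both neighbours.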

Installation

Requirements

Python 3.8 or higher
Qdrant (local instance or cloud)
Dependencies listed in requirements.txt

Setup

# Clone the repository
git clone https://github.com/wrkode/hermes-ingestor.git
cd hermes-ingestor

# Install dependencies
pip install -r requirements.txt

# Install the package
pip install -e .

Configuration

The service is configured using environment variables:

# Core settings
DEBUG=False
UPLOAD_FOLDER=uploads
MAX_FILE_SIZE_MB=50

# Qdrant settings
QDRANT_HOST=localhost
QDRANT_PORT=6333
QDRANT_COLLECTION=documents
QDRANT_PREFER_GRPC=True
QDRANT_API_KEY=  # For Qdrant Cloud

# Embedding settings
EMBEDDING_MODEL=all-MiniLM-L6-v2
EMBEDDING_BATCH_SIZE=32
EMBEDDING_DIMENSIONS=384

# Chunking settings
CHUNK_SIZE=1000
CHUNK_OVERLAP=200
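The variables above could be read into a typed settings object along these lines. This is a sketch, not the service's actual loader: `Settings`, `load_settings`, and `_as_bool` are hypothetical names, and using the example values above as fallbacks is an assumption about the real defaults.

```python
import os
from dataclasses import dataclass
from typing import Mapping, Optional


def _as_bool(value: str) -> bool:
    """Interpret common truthy spellings such as 'True', '1', or 'yes'."""
    return value.strip().lower() in {"1", "true", "yes", "on"}


@dataclass(frozen=True)
class Settings:
    qdrant_host: str
    qdrant_port: int
    qdrant_collection: str
    qdrant_prefer_grpc: bool
    embedding_model: str
    embedding_dimensions: int
    chunk_size: int
    chunk_overlap: int
    max_file_size_mb: int


def load_settings(env: Optional[Mapping[str, str]] = None) -> Settings:
    """Build Settings from a mapping (os.environ when none is given)."""
    env = os.environ if env is None else env
    settings = Settings(
        # Fallbacks mirror the example values in this README (assumed defaults).
        qdrant_host=env.get("QDRANT_HOST", "localhost"),
        qdrant_port=int(env.get("QDRANT_PORT", "6333")),
        qdrant_collection=env.get("QDRANT_COLLECTION", "documents"),
        qdrant_prefer_grpc=_as_bool(env.get("QDRANT_PREFER_GRPC", "True")),
        embedding_model=env.get("EMBEDDING_MODEL", "all-MiniLM-L6-v2"),
        embedding_dimensions=int(env.get("EMBEDDING_DIMENSIONS", "384")),
        chunk_size=int(env.get("CHUNK_SIZE", "1000")),
        chunk_overlap=int(env.get("CHUNK_OVERLAP", "200")),
        max_file_size_mb=int(env.get("MAX_FILE_SIZE_MB", "50")),
    )
    # An overlap as large as the chunk itself would never advance through the text.
    if settings.chunk_overlap >= settings.chunk_size:
        raise ValueError("CHUNK_OVERLAP must be smaller than CHUNK_SIZE")
    return settings
```

Note that EMBEDDING_DIMENSIONS has to match the chosen model: all-MiniLM-L6-v2 produces 384-dimensional vectors, so changing EMBEDDING_MODEL generally means changing this value and the Qdrant collection's vector size as well.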

Usage

Running the Service

# Run the service
python -m src.main

# Run with custom host and port
python -m src.main --host 0.0.0.0 --port 8080

# Run in debug mode
python -m src.main --debug

API Endpoints

Document Ingestion

POST /api/ingest/file - Upload and process a single document
POST /api/ingest/files - Upload and process multiple documents
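A client could call the single-file endpoint with a multipart upload roughly as follows, using only the Python standard library. The form field name `file`, the helper names, and the base URL `http://localhost:8000` (taken from the Docker port mapping in this README) are assumptions; check the service's API for the field name it actually expects.

```python
import urllib.request
import uuid
from typing import Tuple


def build_multipart(field: str, filename: str, content: bytes,
                    content_type: str = "application/octet-stream") -> Tuple[bytes, str]:
    """Encode one file as a multipart/form-data body; return (body, Content-Type header)."""
    boundary = uuid.uuid4().hex
    head = (
        f"--{boundary}\r\n"
        f'Content-Disposition: form-data; name="{field}"; filename="{filename}"\r\n'
        f"Content-Type: {content_type}\r\n\r\n"
    ).encode("utf-8")
    tail = f"\r\n--{boundary}--\r\n".encode("utf-8")
    return head + content + tail, f"multipart/form-data; boundary={boundary}"


def build_ingest_request(base_url: str, filename: str, content: bytes,
                         field: str = "file") -> urllib.request.Request:
    """Prepare (but do not send) a POST to /api/ingest/file.

    The field name "file" is an assumption, not confirmed by this README.
    """
    body, header = build_multipart(field, filename, content)
    return urllib.request.Request(
        f"{base_url.rstrip('/')}/api/ingest/file",
        data=body,
        headers={"Content-Type": header},
        method="POST",
    )
```

To send it, pass the prepared request to `urllib.request.urlopen(request)` while the service is running. Building the request separately from sending it keeps the encoding logic easy to inspect without a live server.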

Document Management

DELETE /api/document/{filename} - Delete a document by filename
POST /api/document/delete - Delete documents by metadata filter
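The two deletion endpoints could be called as sketched below. The helper names are hypothetical, and the JSON body for the metadata filter is an assumption: this README does not specify the filter schema, so the dictionary passed in is only a placeholder shape.

```python
import json
import urllib.parse
import urllib.request
from typing import Dict


def delete_by_filename_request(base_url: str, filename: str) -> urllib.request.Request:
    """Prepare a DELETE for /api/document/{filename}, percent-encoding the filename."""
    quoted = urllib.parse.quote(filename, safe="")
    return urllib.request.Request(
        f"{base_url.rstrip('/')}/api/document/{quoted}",
        method="DELETE",
    )


def delete_by_filter_request(base_url: str, metadata_filter: Dict[str, object]) -> urllib.request.Request:
    """Prepare a POST for /api/document/delete.

    Sending the filter as a JSON object is an assumption about the API.
    """
    body = json.dumps(metadata_filter).encode("utf-8")
    return urllib.request.Request(
        f"{base_url.rstrip('/')}/api/document/delete",
        data=body,
        headers={"Content-Type": "application/json"},
        method="POST",
    )
```

Percent-encoding matters for filenames containing spaces or slashes, which would otherwise produce a malformed or misrouted path. As with any prepared request, `urllib.request.urlopen(request)` sends it.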

Service Information

GET /api/health - Health check
GET /api/status - Service status
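A deployment script or readiness probe might poll the health endpoint like this. The sketch assumes that a healthy service answers with HTTP 200, which this README does not state explicitly; `is_healthy` and its `opener` parameter are hypothetical names, the latter included so the function can be exercised without a running service.

```python
import urllib.error
import urllib.request
from typing import Callable


def is_healthy(base_url: str, timeout: float = 2.0,
               opener: Callable = urllib.request.urlopen) -> bool:
    """Return True if GET /api/health answers with status 200 (assumed success code)."""
    url = f"{base_url.rstrip('/')}/api/health"
    try:
        with opener(url, timeout=timeout) as response:
            return response.status == 200
    except (urllib.error.URLError, OSError):
        # Connection refused, timeout, DNS failure, or an HTTP error status.
        return False
```

For example, `is_healthy("http://localhost:8000")` can be called in a retry loop after `docker run` to wait until the container accepts requests.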

Docker

Build and run the service using Docker:

# Build the Docker image
docker build -t hermes-ingestor .

# Run the container
docker run -p 8000:8000 \
  -e QDRANT_HOST=host.docker.internal \
  -e QDRANT_PORT=6333 \
  hermes-ingestor

# Alternatively, build and start the service with Docker Compose
docker compose up --build

Kubernetes

Example Kubernetes deployment configurations are available in the k8s/ directory.

License

Apache 2.0
