# Aquiles-RAG

High-performance Retrieval-Augmented Generation (RAG) on Redis or Qdrant.

🚀 FastAPI • Redis / Qdrant • Async • Embedding-agnostic
- Features
- Tech Stack
- Requirements
- Installation
- Configuration & Connection Options
- Usage
- Architecture
- License
## Features

- 📈 High Performance: Vector search powered by Redis HNSW or Qdrant.
- 🛠️ Simple API: Endpoints for index creation, insertion, and querying.
- 🔌 Embedding-agnostic: Works with any embedding model (OpenAI, Llama 3, HuggingFace, etc.).
- 💻 Interactive Setup Wizard: `aquiles-rag configs` walks you through the full configuration for Redis or Qdrant.
- ⚡ Sync & Async clients: `AquilesRAG` (requests) and `AsyncAquilesRAG` (httpx), both with `embedding_model` metadata support.
- 🧩 Extensible: Designed to integrate into ML pipelines, microservices, or serverless deployments.
## Tech Stack

- Python 3.9+
- FastAPI
- Redis or Qdrant as vector store
- NumPy
- Pydantic
- Jinja2
- Click (CLI)
- Requests (sync client)
- HTTPX (async client)
- Platformdirs (config management)
## Requirements

- Redis (standalone or cluster) — or Qdrant (HTTP / gRPC)
- Python 3.9+
- pip
Optional: run Redis locally with Docker:

```shell
docker run -d --name redis-stack -p 6379:6379 redis/redis-stack-server:latest
```
## Installation

Install from PyPI:

```shell
pip install aquiles-rag
```

Or install from source:

```shell
git clone https://github.com/Aquiles-ai/Aquiles-RAG.git
cd Aquiles-RAG
python -m venv .venv
source .venv/bin/activate
pip install -r requirements.txt

# optional development install
pip install -e .
```
## Configuration & Connection Options

Configuration is persisted at `~/.local/share/aquiles/aquiles_config.json`.
The previous manual per-flag configuration flow has been replaced by an interactive wizard. Run:

```shell
aquiles-rag configs
```

The wizard prompts for everything required for either Redis or Qdrant (host, ports, TLS/gRPC options, API keys, admin user) and, at the end, writes `aquiles_config.json` to the standard location.

If you prefer automation, generate the same JSON schema the wizard writes and place it at `~/.local/share/aquiles/aquiles_config.json` before starting the server (or use the `deploy` pattern described below).
Aquiles-RAG supports multiple Redis modes:

- Local cluster:

  ```python
  RedisCluster(host=host, port=port, decode_responses=True)
  ```

- Standalone local:

  ```python
  redis.Redis(host=host, port=port, decode_responses=True)
  ```

- Remote with TLS/SSL:

  ```python
  redis.Redis(host=host, port=port, username=username or None,
              password=password or None, ssl=True, decode_responses=True,
              ssl_certfile=ssl_certfile, ssl_keyfile=ssl_keyfile,
              ssl_ca_certs=ssl_ca_certs)
  ```

- Remote without TLS/SSL:

  ```python
  redis.Redis(host=host, port=port, username=username or None,
              password=password or None, decode_responses=True)
  ```
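The modes above differ only in the keyword arguments passed to the client constructor. A minimal sketch of deriving them from a config dict — the `cfg` keys here are illustrative placeholders, not the actual Aquiles-RAG config schema:

```python
def redis_kwargs(cfg: dict) -> dict:
    """Build constructor kwargs for redis.Redis / RedisCluster
    from a config dict (keys are illustrative placeholders)."""
    kwargs = {"host": cfg["host"], "port": cfg["port"], "decode_responses": True}
    if cfg.get("cluster"):
        return kwargs  # pass to redis.cluster.RedisCluster(**kwargs)
    kwargs["username"] = cfg.get("username") or None
    kwargs["password"] = cfg.get("password") or None
    if cfg.get("ssl"):
        kwargs.update(
            ssl=True,
            ssl_certfile=cfg.get("ssl_certfile"),
            ssl_keyfile=cfg.get("ssl_keyfile"),
            ssl_ca_certs=cfg.get("ssl_ca_certs"),
        )
    return kwargs  # pass to redis.Redis(**kwargs)
```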
## Usage

- Run the interactive setup wizard (recommended):

  ```shell
  aquiles-rag configs
  ```

- Serve the API:

  ```shell
  aquiles-rag serve --host "0.0.0.0" --port 5500
  ```

- Deploy with a bootstrap script (pattern: a `deploy_*.py` file exposing a `run()` that calls `gen_configs_file()`):

  ```shell
  # Redis example
  aquiles-rag deploy --host "0.0.0.0" --port 5500 --workers 4 deploy_redis.py

  # Qdrant example
  aquiles-rag deploy --host "0.0.0.0" --port 5500 --workers 4 deploy_qdrant.py
  ```
The `deploy` command imports the given Python file, executes its `run()` to generate the config (writing `aquiles_config.json`), then starts the FastAPI server.
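A minimal `deploy_redis.py` along these lines might look as follows. The real project provides `gen_configs_file()` for this step; to keep the sketch self-contained it writes the JSON directly, and the config keys shown are illustrative placeholders rather than the wizard's actual schema:

```python
# deploy_redis.py — illustrative bootstrap for `aquiles-rag deploy`.
# In a real deployment, run() would call gen_configs_file() instead.
import json
from pathlib import Path

CONFIG_PATH = Path.home() / ".local/share/aquiles/aquiles_config.json"

def run(path: Path = CONFIG_PATH) -> None:
    # Placeholder keys — not the actual aquiles_config.json schema.
    cfg = {"vector_store": "redis", "host": "localhost", "port": 6379}
    path.parent.mkdir(parents=True, exist_ok=True)
    path.write_text(json.dumps(cfg, indent=2))
```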
- Create Index:

  ```shell
  curl -X POST http://localhost:5500/create/index \
    -H "X-API-Key: YOUR_API_KEY" \
    -H 'Content-Type: application/json' \
    -d '{
          "indexname": "documents",
          "embeddings_dim": 768,
          "dtype": "FLOAT32",
          "delete_the_index_if_it_exists": false
        }'
  ```

- Insert Chunk (ingest):

  ```shell
  curl -X POST http://localhost:5500/rag/create \
    -H "X-API-Key: YOUR_API_KEY" \
    -H 'Content-Type: application/json' \
    -d '{
          "index": "documents",
          "name_chunk": "doc1_part1",
          "dtype": "FLOAT32",
          "chunk_size": 1024,
          "raw_text": "Text of the chunk...",
          "embeddings": [0.12, 0.34, 0.56, ...]
        }'
  ```

- Query Top-K:

  ```shell
  curl -X POST http://localhost:5500/rag/query-rag \
    -H "X-API-Key: YOUR_API_KEY" \
    -H 'Content-Type: application/json' \
    -d '{
          "index": "documents",
          "embeddings": [0.78, 0.90, ...],
          "dtype": "FLOAT32",
          "top_k": 5,
          "cosine_distance_threshold": 0.6
        }'
  ```
### Python client (sync)

```python
from aquiles.client import AquilesRAG

client = AquilesRAG(host="http://127.0.0.1:5500", api_key="YOUR_API_KEY")

# Create an index (returns server text)
resp_text = client.create_index("documents", embeddings_dim=768, dtype="FLOAT32")

# Insert chunks using your embedding function
def get_embedding(text):
    return embedding_model.encode(text)

responses = client.send_rag(
    embedding_func=get_embedding,
    index="documents",
    name_chunk="doc1",
    raw_text=full_text,
    embedding_model="text-embedding-v1"  # optional metadata sent with each chunk
)

# Query the index (returns parsed JSON)
results = client.query("documents", query_embedding, top_k=5)
print(results)
```
### Python client (async)

```python
import asyncio
from aquiles.client import AsyncAquilesRAG

client = AsyncAquilesRAG(host="http://127.0.0.1:5500", api_key="YOUR_API_KEY")

async def main():
    await client.create_index("documents_async")
    responses = await client.send_rag(
        embedding_func=async_embedding_func,  # supports sync or async callables
        index="documents_async",
        name_chunk="doc_async",
        raw_text=full_text
    )
    results = await client.query("documents_async", query_embedding)
    print(results)

asyncio.run(main())
```
Notes:

- Both clients accept an optional `embedding_model` parameter that is forwarded as metadata — helpful when storing/querying embeddings produced by different models.
- `send_rag` chunks text using `chunk_text_by_words()` (default ~600 words / ≈1024 tokens) and uploads each chunk (concurrently in the async client).
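For reference, word-based chunking along these lines can be sketched as follows — an illustrative re-implementation; the SDK's actual `chunk_text_by_words()` may differ in detail:

```python
def chunk_text_by_words(text: str, max_words: int = 600) -> list[str]:
    """Split text into chunks of at most `max_words` words each."""
    words = text.split()
    return [" ".join(words[i:i + max_words])
            for i in range(0, len(words), max_words)]
```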
### Web UI

Open the (protected) web UI at `http://localhost:5500/ui`. Use it to:

- Run the Setup Wizard link (if available) or inspect live configs
- Test `/create/index`, `/rag/create`, and `/rag/query-rag`
- Access the protected Swagger UI & ReDoc after logging in
## Architecture

- Clients (HTTP/HTTPS, Python SDK, or UI Playground) make asynchronous HTTP requests.
- FastAPI Server — orchestration and business logic; validates requests and translates them into vector store operations.
- Vector Store — either Redis (HASH + HNSW/COSINE search) or Qdrant (collections + vector search).
- Metrics (`/status/ram`): Redis offers `INFO memory` and `memory_stats()`; the same Redis-specific metrics are not available for Qdrant, so the endpoint returns a short message explaining this.
- Dtype handling: the server validates `dtype` for Redis (converting embeddings to the requested NumPy dtype). Qdrant accepts float arrays directly — `dtype` is informational/compatibility metadata.
- gRPC: Qdrant can be used over HTTP or gRPC (`prefer_grpc=true` in the config). Ensure your environment allows gRPC traffic inbound/outbound as needed.
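For Redis, the dtype conversion amounts to casting the embedding with NumPy before it is serialized for storage; a minimal sketch (the server's actual helper may differ, and may support more dtypes than shown here):

```python
import numpy as np

# Map of API dtype names to NumPy dtypes (illustrative subset).
DTYPES = {"FLOAT32": np.float32, "FLOAT64": np.float64}

def embedding_to_bytes(embedding, dtype: str = "FLOAT32") -> bytes:
    """Cast an embedding to the requested dtype and serialize it,
    as done before storing a vector in a Redis HASH field."""
    return np.asarray(embedding, dtype=DTYPES[dtype]).tobytes()
```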
See the `test/` directory for automated tests:

- client tests for the Python SDK
- API tests for endpoint behavior
- `test_deploy.py` for deployment / bootstrap validation