pip install -q langchain-openai langchain playwright beautifulsoup4 langchain-google-genai pydantic structlog google-generativeai fastembed langchain-core langchain_community tiktoken nest-asyncio
playwright install
Flare AI Kit template for Retrieval-Augmented Generation (RAG) Knowledge.
- Modular Architecture: Designed with independent components that can be easily extended.
- Qdrant-Powered Retrieval: Leverages Qdrant for fast, semantic document retrieval, but can easily be adapted to other vector databases (a minimal query sketch follows this list).
- Highly Configurable & Extensible: Uses a straightforward configuration system, enabling effortless integration of new features and services.
- Unified LLM Integration: Leverages Gemini as a unified provider while maintaining compatibility with OpenRouter for a broader range of models.
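To make the Qdrant-powered retrieval concrete, here is a minimal sketch of a semantic query against a running Qdrant instance. The collection name `flare_docs` and the embedding model are illustrative assumptions, not part of the template's configuration; the template wires these through its own settings.

```python
# Minimal sketch: semantic search against a local Qdrant instance.
# Assumes Qdrant is running on localhost:6333 and that a collection named
# "flare_docs" (hypothetical name) has already been populated with
# Gemini embeddings of matching dimensionality.
import google.generativeai as genai
from qdrant_client import QdrantClient

genai.configure(api_key="YOUR_GEMINI_API_KEY")

def embed(text: str) -> list[float]:
    # text-embedding-004 is one available Gemini embedding model.
    result = genai.embed_content(model="models/text-embedding-004", content=text)
    return result["embedding"]

client = QdrantClient(host="localhost", port=6333)
hits = client.search(
    collection_name="flare_docs",  # hypothetical collection name
    query_vector=embed("How does the FTSO work on Flare?"),
    limit=3,
)
for hit in hits:
    print(hit.score, hit.payload.get("text", "")[:80])
```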
Before getting started, ensure you have:
- A Python 3.12 environment.
- uv installed for dependency management.
- Docker installed.
- A Gemini API key.
- Access to one of the Flare databases. (The Flare Developer Hub is included in CSV format for local testing; a minimal indexing sketch follows this list.)
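For local testing, the bundled CSV can be embedded and indexed into Qdrant directly. The sketch below is illustrative only: the file path and the `content` column name are assumptions about the CSV layout, and it uses fastembed's default local model rather than the template's configured embedding provider.

```python
# Minimal sketch: load a local CSV and index it into Qdrant for testing.
# The path "docs.csv" and the "content" column are hypothetical; adjust
# them to match the actual Flare Developer Hub CSV layout.
import csv

from fastembed import TextEmbedding
from qdrant_client import QdrantClient
from qdrant_client.models import Distance, PointStruct, VectorParams

with open("docs.csv", newline="", encoding="utf-8") as f:
    rows = list(csv.DictReader(f))

embedder = TextEmbedding()  # defaults to a small local embedding model
texts = [row.get("content", "") for row in rows]
vectors = list(embedder.embed(texts))

client = QdrantClient(host="localhost", port=6333)
client.recreate_collection(
    collection_name="flare_docs_local",  # hypothetical collection name
    vectors_config=VectorParams(size=len(vectors[0]), distance=Distance.COSINE),
)
client.upsert(
    collection_name="flare_docs_local",
    points=[
        PointStruct(id=i, vector=vec.tolist(), payload={"text": text})
        for i, (vec, text) in enumerate(zip(vectors, texts))
    ],
)
print(f"Indexed {len(rows)} rows")
```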
You can deploy Flare AI RAG using Docker or set up the backend and frontend manually.
- Prepare the Environment File: Rename `.env.example` to `.env` and update the variables accordingly (e.g., your Gemini API key).
- Build the Docker Image:
  docker build -t flare-ai-rag .
- Run the Docker Container:
  docker run -p 80:80 -it --env-file .env flare-ai-rag
- Access the Frontend: Open your browser and navigate to http://localhost:80 to interact with the Chat UI.
Flare AI RAG is composed of a Python-based backend and a JavaScript frontend. Follow these steps for manual setup:
- Install Dependencies: Use uv to install backend dependencies:
  uv sync --all-extras
- Set Up a Qdrant Service: Make sure that Qdrant is up and running before running your script. You can quickly start a Qdrant instance using Docker:
  docker run -p 6333:6333 qdrant/qdrant
- Start the Backend: The backend runs by default on `0.0.0.0:8080`:
  uv run start-backend
  (A quick way to smoke-test the running backend is sketched after these steps.)
- Install Frontend Dependencies: In the `chat-ui/` directory, install the required packages using npm:
  cd chat-ui/
  npm install
- Configure the Frontend: Update the backend URL in `chat-ui/src/App.js` for local testing:
  const BACKEND_ROUTE = "http://localhost:8080/api/routes/chat/";
  Note: Remember to change `BACKEND_ROUTE` back to 'api/routes/chat/' after testing.
- Start the Frontend:
  npm start
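Once the backend is running, you can sanity-check it from Python before wiring up the frontend. The `{"message": ...}` request body below is an assumption about the chat route's schema; confirm the exact contract against the route definitions under `src/flare_ai_rag/api/routes/`.

```python
# Quick smoke test against the locally running backend.
# The {"message": ...} payload shape is an assumption; check the chat route
# defined under src/flare_ai_rag/api/routes/ for the real request schema.
import requests

BACKEND_ROUTE = "http://localhost:8080/api/routes/chat/"

response = requests.post(
    BACKEND_ROUTE,
    json={"message": "What is the Flare Time Series Oracle?"},
    timeout=60,
)
response.raise_for_status()
print(response.json())
```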
src/flare_ai_rag/
├── ai/                       # AI Provider implementations
│   ├── base.py               # Abstract base classes
│   ├── gemini.py             # Google Gemini integration
│   ├── model.py              # Model definitions
│   └── openrouter.py         # OpenRouter integration
├── api/                      # API layer
│   ├── middleware/           # Request/response middleware
│   └── routes/               # API endpoint definitions
├── attestation/              # TEE security layer
│   ├── simulated_token.txt
│   ├── vtpm_attestation.py   # vTPM client
│   └── vtpm_validation.py    # Token validation
├── prompts/                  # AI system prompts & templates
│   ├── library.py            # Prompt module library
│   ├── schemas.py            # Schema definitions
│   ├── service.py            # Prompt service module
│   └── templates.py          # Prompt templates
├── responder/                # Response generation
│   ├── base.py               # Base responder interface
│   ├── config.py             # Response configuration
│   ├── prompts.py            # System prompts
│   └── responder.py          # Main responder logic
├── retriever/                # Document retrieval
│   ├── base.py               # Base retriever interface
│   ├── config.py             # Retriever configuration
│   ├── qdrant_collection.py  # Qdrant collection management
│   └── qdrant_retriever.py   # Qdrant implementation
├── router/                   # API routing
│   ├── base.py               # Base router interface
│   ├── config.py             # Router configuration
│   ├── prompts.py            # Router prompts
│   └── router.py             # Main routing logic
├── utils/                    # Utility functions
│   ├── file_utils.py         # File operations
│   └── parser_utils.py       # Input parsing
├── input_parameters.json     # Configuration parameters
├── main.py                   # Application entry point
├── query.txt                 # Sample queries
└── settings.py               # Environment settings
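The layout above implies a router → retriever → responder flow for each query. The sketch below is a conceptual illustration of that flow only; every class and method name in it is a hypothetical stand-in and does not match the real interfaces in `router/base.py`, `retriever/base.py`, or `responder/base.py`.

```python
# Conceptual illustration of the query flow implied by the layout above.
# All names (QueryRouter, QdrantRetriever, Responder, their methods) are
# hypothetical stand-ins, not the actual classes in this repository.
from dataclasses import dataclass


@dataclass
class Document:
    text: str
    score: float


class QueryRouter:
    def classify(self, query: str) -> str:
        # The real router prompts an LLM to decide how to handle the query.
        return "rag" if "flare" in query.lower() else "chat"


class QdrantRetriever:
    def search(self, query: str, top_k: int = 3) -> list[Document]:
        # The real retriever embeds the query and searches a Qdrant collection.
        return [Document(text="...retrieved chunk...", score=0.9)][:top_k]


class Responder:
    def answer(self, query: str, context: list[Document]) -> str:
        # The real responder builds a prompt from the context and calls Gemini.
        joined = "\n".join(doc.text for doc in context)
        return f"Answer to '{query}' grounded in:\n{joined}"


def handle(query: str) -> str:
    router, retriever, responder = QueryRouter(), QdrantRetriever(), Responder()
    context = retriever.search(query) if router.classify(query) == "rag" else []
    return responder.answer(query, context)


print(handle("How do I query the FTSO on Flare?"))
```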
Deploy on a Confidential Space using AMD SEV.
- Google Cloud Platform Account: Access to the `verifiable-ai-hackathon` project is required.
- Gemini API Key: Ensure your Gemini API key is linked to the project.
- gcloud CLI: Install and authenticate the gcloud CLI.
- Set Environment Variables: Update your `.env` file with:
  TEE_IMAGE_REFERENCE=ghcr.io/flare-foundation/flare-ai-rag:main  # Replace with your repo build image
  INSTANCE_NAME=<PROJECT_NAME-TEAM_NAME>
- Load Environment Variables:
  source .env
  Reminder: Run the above command in every new shell session or after modifying `.env`. On Windows, we recommend using Git Bash to access commands like `source`.
- Verify the Setup:
  echo $TEE_IMAGE_REFERENCE  # Expected output: your repo build image
Run the following command:
gcloud compute instances create $INSTANCE_NAME \
--project=verifiable-ai-hackathon \
--zone=us-east5-b \
--machine-type=n2d-standard-2 \
--network-interface=network-tier=PREMIUM,nic-type=GVNIC,stack-type=IPV4_ONLY,subnet=default \
--metadata=tee-image-reference=$TEE_IMAGE_REFERENCE,\
tee-container-log-redirect=true,\
tee-env-GEMINI_API_KEY=$GEMINI_API_KEY \
--maintenance-policy=MIGRATE \
--provisioning-model=STANDARD \
--service-account=confidential-sa@verifiable-ai-hackathon.iam.gserviceaccount.com \
--scopes=https://www.googleapis.com/auth/cloud-platform \
--min-cpu-platform="AMD Milan" \
--tags=flare-ai,http-server,https-server \
--create-disk=auto-delete=yes,\
boot=yes,\
device-name=$INSTANCE_NAME,\
image=projects/confidential-space-images/global/images/confidential-space-debug-250100,\
mode=rw,\
size=11,\
type=pd-standard \
--shielded-secure-boot \
--shielded-vtpm \
--shielded-integrity-monitoring \
--reservation-affinity=any \
--confidential-compute-type=SEV
- After deployment, you should see an output similar to:
  NAME       ZONE           MACHINE_TYPE    PREEMPTIBLE  INTERNAL_IP  EXTERNAL_IP    STATUS
  rag-team1  us-central1-b  n2d-standard-2               10.128.0.18  34.41.127.200  RUNNING
- It may take a few minutes for Confidential Space to complete startup checks. You can monitor progress via the GCP Console logs: click Compute Engine → VM Instances (in the sidebar) → select your instance → Serial port 1 (console).
When you see a message like:
INFO: Uvicorn running on http://0.0.0.0:8080 (Press CTRL+C to quit)
the container is ready. Navigate to the external IP of the instance (visible in the VM Instances page) to access the Chat UI.
If you encounter issues, follow these steps:
- Check Logs:
  gcloud compute instances get-serial-port-output $INSTANCE_NAME --project=verifiable-ai-hackathon
- Verify API Key(s): Ensure that all API keys are set correctly (e.g. `GEMINI_API_KEY`).
- Check Firewall Settings: Confirm that your instance is publicly accessible on port 80.
Design and implement a knowledge ingestion pipeline, with a demonstration interface showing practical applications for developers and users.
N.B. Other vector databases can be used, provided they run within the same Docker container as the RAG system, since the deployment will occur in a TEE.
- Enhanced Data Ingestion & Indexing: Explore more sophisticated data structures for improved indexing and retrieval, and expand beyond a CSV format to include additional data sources (e.g., Flare's GitHub, blogs, documentation). BigQuery integration would be desirable.
- Intelligent Query & Data Processing: Use recommended AI models to refine the data processing pipeline, including pre-processing steps that optimize and clean incoming data, ensuring higher-quality context retrieval (e.g. use an LLM to reformulate or expand user queries before passing them to the retriever, improving the precision and recall of the semantic search; a minimal sketch of this follows the list).
- Advanced Context Management: Develop an intelligent context management system that:
- Implements Dynamic Relevance Scoring to rank documents by their contextual importance.
- Optimizes the Context Window to balance the amount of information sent to LLMs.
- Includes Source Verification Mechanisms to assess and validate the reliability of the data sources.
- Improved Retrieval & Response Pipelines: Integrate hybrid search techniques (combining semantic and keyword-based methods) for better retrieval, and implement completion checks to verify that the responder's output is complete and accurate (potentially allow an iterative feedback loop for refining the final answer).
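As a starting point for the query-processing idea above, here is a minimal sketch of LLM-based query expansion using `google-generativeai`. The model name and prompt wording are illustrative choices rather than the template's configuration; the rewritten query would then be embedded and sent to the retriever in place of the raw user input.

```python
# Minimal sketch: use Gemini to rewrite a user query before retrieval.
# The model name and prompt are illustrative, not the template's defaults.
import google.generativeai as genai

genai.configure(api_key="YOUR_GEMINI_API_KEY")
model = genai.GenerativeModel("gemini-1.5-flash")

def expand_query(user_query: str) -> str:
    prompt = (
        "Rewrite the following question about the Flare network so that it is "
        "self-contained and includes likely technical keywords, for use in "
        f"semantic search. Return only the rewritten question.\n\n{user_query}"
    )
    return model.generate_content(prompt).text.strip()

# The expanded query is embedded and passed to the retriever instead of the
# raw input, which tends to improve recall for terse or ambiguous queries.
print(expand_query("how do i get prices on flare?"))
```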