
title: ACRES: Center For Rapid Evidence Synthesis
emoji: 👁
colorFrom: gray
colorTo: pink
sdk: gradio
sdk_version: 5.6.0
app_file: app.py
pinned: false
license: apache-2.0

Check out the configuration reference at https://huggingface.co/docs/hub/spaces-config-reference

ACRES RAG Project

Architecture

The ACRES RAG system follows a modular architecture designed for efficient document processing and information extraction:

%%{init: {'theme': 'default', 'themeVariables': { 'fontSize': '16px' }, 'flowchart': { 'htmlLabels': true, 'curve': 'basis', 'width': '100%', 'height': '100%' }}}%%
graph LR
    %% User Interaction Layer
    User((User)) --> WebUI[Web UI\nGradio Interface]
    User --> API[FastAPI\nREST API]
    
    subgraph "Input Stage"
        WebUI
        API
        subgraph "Input Sources"
            ZoteroLib[Zotero Library]
            PDFUpload[PDF Upload]
        end
        WebUI --> ZoteroLib
        WebUI --> PDFUpload
        API --> ZoteroLib
        API --> PDFUpload
    end

    subgraph "Processing Core"
        subgraph "Document Ingestion"
            ZoteroManager[Zotero Manager\nCollection & Items]
            PDFProcessor[PDF Processor\nFile Management]
            ZoteroLib --> ZoteroManager
            PDFUpload --> PDFProcessor
        end

        subgraph "Document Processing"
            TextExtraction[Text Extraction\n& Preprocessing]
            ChromaDB[(ChromaDB\nVector Store)]
            ZoteroManager --> TextExtraction
            PDFProcessor --> TextExtraction
            TextExtraction --> ChromaDB
        end

        subgraph "RAG Pipeline"
            QueryProcessing[Query Processing]
            Retrieval[Document Retrieval]
            LLMInference[LLM Inference\nOpenAI]
            
            ChromaDB --> Retrieval
            QueryProcessing --> Retrieval
            Retrieval --> LLMInference
        end

        subgraph "Variable Extraction"
            VariableParser[Variable Parser]
            DataFrameGen[DataFrame Generation]
            LLMInference --> VariableParser
            VariableParser --> DataFrameGen
        end
    end

    subgraph "Output Stage"
        CSVExport[CSV Export]
        DataFrameView[DataFrame View]
        DataFrameGen --> CSVExport
        DataFrameGen --> DataFrameView
    end

    CSVExport --> ACRESTeam[ACRES Team]
    DataFrameView --> ACRESTeam[ACRES Team]

    %% Styling
    classDef primary fill:#f9f,stroke:#333,stroke-width:4px
    classDef secondary fill:#bbf,stroke:#333,stroke-width:3px
    classDef storage fill:#dfd,stroke:#333,stroke-width:3px
    
    class WebUI,API primary
    class ZoteroManager,PDFProcessor,TextExtraction,QueryProcessing,Retrieval,LLMInference,VariableParser,DataFrameGen secondary
    class ChromaDB storage

Architecture Components

  1. Input Stage

    • Web UI: Gradio-based interface for user interactions
    • REST API: FastAPI endpoints for programmatic access (see the example after this list)
    • Input Sources: Supports both Zotero library integration and direct PDF uploads
  2. Processing Core

    • Document Ingestion: Handles document collection from various sources
    • Document Processing: Extracts and preprocesses text from documents
    • RAG Pipeline: Implements retrieval-augmented generation for accurate information extraction
    • Variable Extraction: Parses and structures extracted information
  3. Output Stage

    • Provides structured data in CSV format
    • Offers interactive DataFrame views
    • Delivers processed data to ACRES team for analysis
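
As a quick illustration of programmatic access through the REST API, once the FastAPI service is running locally (see the setup steps below), you can list the available endpoints. This relies on FastAPI's default OpenAPI endpoints rather than any project-specific routes, so treat it as a generic sketch:

# List the endpoints exposed by the FastAPI service (FastAPI's default OpenAPI schema)
curl http://localhost:8000/openapi.json

# Or browse the interactive docs at http://localhost:8000/docs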

Project Setup

To test and run the project locally, clone the project from GitHub and change directory to acres.

git clone https://github.com/SunbirdAI/acres.git
cd acres

Create a Python virtual environment and activate it.

python -m venv env
source env/bin/activate

Install the project dependencies.

pip install -r requirements.txt

Run the project locally

To test the project locally, follow the steps below.

Copy .env.example to .env and provide the correct environment variable values.

cp .env.example .env
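
The actual variable names are defined in .env.example; the snippet below is only an illustrative guess based on the OpenAI and Zotero integrations described above, not the real keys:

# Hypothetical example values - check .env.example for the real variable names
OPENAI_API_KEY=your-openai-api-key
ZOTERO_API_KEY=your-zotero-api-key
ZOTERO_LIBRARY_ID=your-zotero-library-id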

Run the application

python app.py

OR

gradio app.py

Browse the application at http://localhost:7860/

Run the api

Make sure the Gradio app is running on port 7860, then run the command below in another terminal tab in the same directory.

uvicorn api:app --reload

Browse the API docs at http://localhost:8000/docs
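
By default, uvicorn binds to 127.0.0.1 on port 8000. If you need the API reachable from other machines, or want a different port, you can pass the host and port explicitly:

uvicorn api:app --reload --host 0.0.0.0 --port 8000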

Run with docker

To run the application locally with Docker, first make sure you have Docker installed. See link

Build the project Docker image

docker build -f Dockerfile.gradio -t gradio-app .

Create a Docker network

docker network create gradio-fastapi-network

Run the Docker container

docker run -it -p 7860:7860 --rm --name gradio --network=gradio-fastapi-network gradio-app

Browse the application at http://localhost:7860/

To run the API with Docker, run the commands below. The Gradio container must be running before you start the API.

docker build -f Dockerfile.api -t fastapi-app .
docker run -it -p 8000:8000 --rm --name fastapi --network=gradio-fastapi-network fastapi-app

Browse the API docs at http://localhost:8000/docs
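
To confirm that both containers are running and attached to the shared network (so they can reach each other by name), you can inspect the network:

# List running containers and confirm both are attached to the network
docker ps
docker network inspect gradio-fastapi-network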

Deploy to AWS ECS (Elastic Container Service) with Fargate

Install and configure the AWS CLI and your AWS credentials. See link

OR: See the PDF document here

Now follow the steps below to deploy to AWS ECS

Set the default region and your AWS account ID

export AWS_DEFAULT_REGION=region # e.g. us-east-1, eu-west-1
export AWS_ACCOUNT_ID=aws_account_id # e.g. 2243838xxxxxx
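
If you are unsure of the account ID, you can also derive it from your configured credentials, which doubles as a check that the AWS CLI is set up correctly:

# Derive the account ID from the active credentials
export AWS_ACCOUNT_ID=$(aws sts get-caller-identity --query Account --output text)
echo $AWS_ACCOUNT_ID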

Log in to AWS ECR (Elastic Container Registry) via the command line

aws ecr get-login-password --region $AWS_DEFAULT_REGION | docker login --username AWS --password-stdin "$AWS_ACCOUNT_ID.dkr.ecr.$AWS_DEFAULT_REGION.amazonaws.com"

Create a Python image and push it to ECR. This image will be used as the base image for the application images deployed on AWS ECS.

  • Create the Python repository
aws ecr create-repository \
  --repository-name gradio-python \
  --image-tag-mutability MUTABLE
export ECR_PYTHON_URL="$AWS_ACCOUNT_ID.dkr.ecr.$AWS_DEFAULT_REGION.amazonaws.com/gradio-python"
echo $ECR_PYTHON_URL
  • Pull the Python image and tag it with the ECR URL
docker pull python:3.11.10-slim
docker tag python:3.11.10-slim $ECR_PYTHON_URL:3.11.10-slim

docker push $ECR_PYTHON_URL:3.11.10-slim
  • Now create the Gradio application repository
aws ecr create-repository \
  --repository-name gradio-app-prod \
  --image-tag-mutability MUTABLE

export ECR_BACKEND_GRADIO_URL="$AWS_ACCOUNT_ID.dkr.ecr.$AWS_DEFAULT_REGION.amazonaws.com/gradio-app-prod"
echo $ECR_BACKEND_GRADIO_URL
  • Build the production Docker image and push it to ECR
docker build --build-arg AWS_ACCOUNT_ID=$AWS_ACCOUNT_ID -f Dockerfile.gradio.prod -t gradio-app-prod .
docker tag gradio-app-prod:latest "${ECR_BACKEND_GRADIO_URL}:latest"
docker push "${ECR_BACKEND_GRADIO_URL}:latest"
  • Now create the FastAPI repository
aws ecr create-repository \
  --repository-name fastapi-api-prod \
  --image-tag-mutability MUTABLE

export ECR_BACKEND_FASTAPI_URL="$AWS_ACCOUNT_ID.dkr.ecr.$AWS_DEFAULT_REGION.amazonaws.com/fastapi-api-prod"
echo $ECR_BACKEND_FASTAPI_URL
  • Build the production Docker image and push it to ECR
docker build --build-arg AWS_ACCOUNT_ID=$AWS_ACCOUNT_ID -f Dockerfile.api.prod -t fastapi-api-prod .
docker tag fastapi-api-prod:latest "${ECR_BACKEND_FASTAPI_URL}:latest"
docker push "${ECR_BACKEND_FASTAPI_URL}:latest"
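
At this point the three repositories (gradio-python, gradio-app-prod, fastapi-api-prod) should exist and contain the pushed tags. A quick way to verify:

# Confirm the repositories exist
aws ecr describe-repositories --query 'repositories[].repositoryName'

# Confirm the images and tags were pushed
aws ecr describe-images --repository-name gradio-app-prod --query 'imageDetails[].imageTags'
aws ecr describe-images --repository-name fastapi-api-prod --query 'imageDetails[].imageTags'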

Set up and provision the AWS ECS infrastructure using AWS CloudFormation (IaC)

Install

To install the CFN-CLI, run the command below.

pip install cloudformation-cli cloudformation-cli-java-plugin cloudformation-cli-go-plugin cloudformation-cli-python-plugin cloudformation-cli-typescript-plugin

CFN-Toml

gem install cfn-toml

Copy infra/ecs_config.template to infra/ecs_config.toml and provide the correct AWS account ID for the ContainerImageGradio.

cp infra/ecs_config.template infra/ecs_config.toml
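
The actual key names and layout come from infra/ecs_config.template; the snippet below is only an illustrative guess at what the ContainerImageGradio entry might look like once your account ID is filled in:

# Hypothetical example - copy the real structure from infra/ecs_config.template
ContainerImageGradio = '123456789012.dkr.ecr.us-east-1.amazonaws.com/gradio-app-prod:latest'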

Deploy

To deploy the ECS infrastructure, run the commands below. They provision a CloudFormation change set for review.

Log in to your AWS console and search for CloudFormation, then review the change set. If everything looks good, execute the change set to finish the infrastructure deployment.

Then check the stack outputs for the link to the deployed application.

chmod u+x bin/cfn/*
./bin/cfn/ecs-deploy
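
If you prefer to review from the command line instead of the console, you can inspect the stack and its change sets with the AWS CLI. The stack name below is a placeholder; use the name created by the deploy script:

# Placeholder stack name - replace with the stack created by bin/cfn/ecs-deploy
aws cloudformation describe-stacks --stack-name <your-ecs-stack-name>
aws cloudformation list-change-sets --stack-name <your-ecs-stack-name>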

Update Task Definition Deployments

After making changes, rebuild the Docker images and push them to ECR.
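
For example, to rebuild and push the Gradio image, reusing the ECR URL exported earlier:

docker build --build-arg AWS_ACCOUNT_ID=$AWS_ACCOUNT_ID -f Dockerfile.gradio.prod -t gradio-app-prod .
docker tag gradio-app-prod:latest "${ECR_BACKEND_GRADIO_URL}:latest"
docker push "${ECR_BACKEND_GRADIO_URL}:latest"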

To update the task definition deployments, force a new deployment by running the commands below.

For the Gradio task definition

./bin/cfn/ecs-deploy-update-gradio

For the API task definition

./bin/cfn/ecs-deploy-update-api
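
The scripts force a new deployment of each service; the equivalent direct AWS CLI call would be something like the following, where the cluster and service names are placeholders rather than the actual values from the CloudFormation stack:

# Placeholder names - replace with the cluster and service created by the CloudFormation stack
aws ecs update-service --cluster <your-cluster> --service <your-gradio-service> --force-new-deployment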
