RAG Bootcamp

This is a collection of reference implementations for Vector Institute's RAG (Retrieval-Augmented Generation) Bootcamp, which took place from November 2024 to January 2025. It demonstrates common methodologies used in RAG workflows (data ingestion, chunking, embeddings, vector databases, sparse/dense retrieval, reranking) using the popular Python libraries LangChain and LlamaIndex.

Reference Implementations

This repository includes several reference implementations showing different approaches and methodologies related to Retrieval-Augmented Generation.

Web Search: Popular LLMs like OpenAI's GPT-4o and Meta's Llama-3 are very good at processing natural language, but their knowledge is limited by the data they were trained on. As of November 2024, neither model can, on its own, correctly answer the question "Who won the 2024 World Series of Baseball?"
Document Search: Use a collection of unstructured documents to answer domain-specific questions, like: "How many AI scholarships did Vector Institute award in 2022?"
SQL Search: Answer natural language questions with information from structured relational data. This demo uses a financial dataset from a Portuguese banking institution, available on Kaggle.
Cloud Search: Retrieve information from data stored in a cloud service, in this example AWS S3 storage.
PubMed QA: A full pipeline on the PubMed dataset demonstrating ingestion, embeddings, vector index/storage, retrieval, and reranking, with a focus on evaluation metrics.
RAG Evaluation: RAG evaluation techniques based on the Ragas framework. Focuses on evaluation "test sets" and how to use these to determine how well a RAG pipeline is actually working.
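
The retrieval steps these implementations share (chunking, embedding, similarity search, prompt assembly) can be illustrated with a minimal sketch. It is not the code used in the notebooks, which rely on LangChain and LlamaIndex: it uses only the Python standard library, a bag-of-words count vector stands in for a learned embedding, and every function name here is hypothetical.

```python
import math
import re
from collections import Counter


def chunk_text(text: str, chunk_size: int = 8, overlap: int = 2) -> list[str]:
    """Split text into overlapping chunks of up to `chunk_size` words."""
    if overlap >= chunk_size:
        raise ValueError("overlap must be smaller than chunk_size")
    words = text.split()
    step = chunk_size - overlap
    return [" ".join(words[i:i + chunk_size]) for i in range(0, len(words), step)]


def embed(text: str) -> Counter:
    """Toy stand-in for an embedding model: lowercase term counts."""
    return Counter(re.findall(r"[a-z0-9]+", text.lower()))


def cosine(a: Counter, b: Counter) -> float:
    """Cosine similarity between two sparse term-count vectors."""
    dot = sum(a[t] * b[t] for t in a.keys() & b.keys())
    norm_a = math.sqrt(sum(v * v for v in a.values()))
    norm_b = math.sqrt(sum(v * v for v in b.values()))
    return dot / (norm_a * norm_b) if norm_a and norm_b else 0.0


def retrieve(query: str, chunks: list[str], top_k: int = 2) -> list[str]:
    """Return the `top_k` chunks most similar to the query."""
    query_vec = embed(query)
    ranked = sorted(chunks, key=lambda c: cosine(query_vec, embed(c)), reverse=True)
    return ranked[:top_k]


def build_prompt(query: str, context: list[str]) -> str:
    """Assemble the augmented prompt that would be sent to an LLM."""
    joined = "\n".join(f"- {c}" for c in context)
    return f"Context:\n{joined}\n\nQuestion: {query}"


if __name__ == "__main__":
    docs = [
        "The bootcamp took place from Nov 2024 to Jan 2025.",
        "Reranking reorders retrieved chunks before generation.",
    ]
    question = "When did the bootcamp take place?"
    print(build_prompt(question, retrieve(question, docs, top_k=1)))
```

A real pipeline would swap `embed` for an embedding model and the sorted list for a vector database, but the flow of data stays the same.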

Requirements

Python 3.10+
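
To confirm that the interpreter meets this requirement before installing anything, a small check along these lines could be run. This is a sketch; the function name `python_is_supported` is hypothetical and not part of the repository.

```python
import sys

# Minimum version stated in the Requirements section.
MIN_PYTHON = (3, 10)


def python_is_supported(version_info=sys.version_info, minimum=MIN_PYTHON) -> bool:
    """Return True if the (major, minor) version is at least `minimum`."""
    return tuple(version_info[:2]) >= minimum


if __name__ == "__main__":
    current = ".".join(str(part) for part in sys.version_info[:3])
    status = "OK" if python_is_supported() else "too old"
    print(f"Python {current}: {status}")
```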

Git Repository

Start by cloning this git repository to a local folder:

git clone https://github.com/VectorInstitute/rag-bootcamp

Setup Instructions

Follow these steps to set up your environment for the RAG Bootcamp notebooks:

Install uv:
```
pip install uv
```
Create and activate a virtual environment using uv:
```
uv venv .venv
source .venv/bin/activate
```
Install dependencies using uv:
```
uv sync --dev
```
Configure environment variables:

Copy the example environment file and update it with your settings:
```
cp .env.example .env
# Edit .env and add all required environment variables
```
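
After editing `.env`, it can help to check that no variable listed in `.env.example` was left empty. The sketch below assumes both files contain only simple `KEY=VALUE` lines and `#` comments; the helper names and the variable names in the usage example are hypothetical, not the actual keys in this repository.

```python
def parse_env(text: str) -> dict[str, str]:
    """Parse simple KEY=VALUE lines, skipping blank lines and # comments."""
    values: dict[str, str] = {}
    for line in text.splitlines():
        line = line.strip()
        if not line or line.startswith("#") or "=" not in line:
            continue
        key, _, value = line.partition("=")
        # Drop surrounding whitespace and optional quotes around the value.
        values[key.strip()] = value.strip().strip('"').strip("'")
    return values


def unset_keys(example_text: str, env_text: str) -> list[str]:
    """List keys present in the example file that are missing or empty in .env."""
    actual = parse_env(env_text)
    return [key for key in parse_env(example_text) if not actual.get(key)]


if __name__ == "__main__":
    # Hypothetical variable names, used only for illustration.
    example = "EXAMPLE_API_KEY=\nEXAMPLE_BASE_URL=\n"
    env = 'EXAMPLE_API_KEY="abc123"\nEXAMPLE_BASE_URL=\n'
    print(unset_keys(example, env))
```

To run it against the real files, read them with `pathlib.Path(".env.example").read_text()` and `pathlib.Path(".env").read_text()` and pass the results to `unset_keys`.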

Install the Jupyter kernel:

uv run ipython kernel install --user --name=rag-bootcamp

Start Jupyter Lab with environment variables loaded:
```
uv run --env-file .env jupyter lab
```

You are now ready to use the RAG Bootcamp notebooks!

Opening Notebooks in Google Colab

Each notebook in this repository includes an "Open in Colab" badge at the top. To run a notebook in Google Colab:

Navigate to the desired notebook in the GitHub repository.
Click the "Open in Colab" badge at the top of the notebook.
The notebook will open in Google Colab, where you can run the code interactively or save a copy to edit.
Select T4 GPU as the runtime type.

Note: Some features (such as access to local files or environment variables) may require additional configuration or may not be fully supported in Colab. For best results, review any instructions provided in the notebook itself.
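
Because Colab does not read a local `.env` file, one possible workaround is to prompt for any missing variable at the top of the notebook. This is a sketch under that assumption, not the repository's documented approach; `ensure_env` and the variable name in the usage example are hypothetical, and a notebook's own instructions take precedence.

```python
import os
from getpass import getpass


def ensure_env(name: str, prompt=getpass) -> str:
    """Return the value of `name`, asking for it (input hidden) if it is unset."""
    if not os.environ.get(name):
        os.environ[name] = prompt(f"Enter value for {name}: ")
    return os.environ[name]
```

For example, calling `ensure_env("EXAMPLE_API_KEY")` in the first cell would request the value once and make it available to later cells through `os.environ`.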
