This repository contains all the material for the Retrieval Augmented Generation (RAG) training. Here you can find everything you need to deploy a simple RAG application based on the Solr search engine and an OpenAI LLM.
To run the project you need:
- Docker
- Python
- Access to the OpenAI API
The repository is structured as follows:
- chunking: contains the chunk.py Python script, which takes as input a JSON file of Solr documents and produces a JSON file containing one Solr document for each generated chunk.
- data: contains the input data and the related Python scripts.
  - create.py: Python script that generates a JSON file of Solr documents.
  - documents_10k.tsv: input text for the Solr documents.
  - solr_documents.json: JSON file of Solr documents.
  - solr_documents_with_chunks.json: JSON file of chunked Solr documents.
- docker-solr: contains the Docker file and the Solr configuration.
- neural: contains the LLM-related classes.
- solr-tools: contains the Solr utilities.
  - index_documents.py: Python script to index documents in the Solr instance.
  - Solr.py: Solr class to connect to a Solr instance, index documents and execute queries.
  - SolrRetriever.py: SolrRetriever class to generate query vectors and retrieve documents from Solr (see the sketch after this list).
- config.yml: configuration file for the uvicorn application that implements RAG.
- globals.py: global variables.
- main.py: uvicorn application implementing the RAG pipeline.
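To give an idea of the retrieval step, here is a minimal sketch of what a retriever like SolrRetriever.py could do: embed the query with the configured Hugging Face model and run a dense-vector (knn) search in Solr. The field names ("vector", "text"), the mean-pooling strategy and the URLs are assumptions taken from the sample configuration below, not the repository's actual implementation.

# Sketch: embed a query with the configured model and run a knn search in Solr.
# Field names and pooling strategy are illustrative assumptions.
import requests
import torch
from transformers import AutoModel, AutoTokenizer

SOLR_URL = "http://localhost:8984/solr/rag_index"   # solr_url from config.yml
MODEL = "allenai/scibert_scivocab_uncased"           # pretrained_model_path from config.yml

tokenizer = AutoTokenizer.from_pretrained(MODEL)
model = AutoModel.from_pretrained(MODEL)

def embed(query: str) -> list[float]:
    # Mean-pool the last hidden state to obtain a single query vector.
    inputs = tokenizer(query, return_tensors="pt", truncation=True)
    with torch.no_grad():
        hidden = model(**inputs).last_hidden_state
    return hidden.mean(dim=1).squeeze().tolist()

def knn_search(query: str, rows: int = 10) -> list[dict]:
    vector = "[" + ",".join(f"{v:.6f}" for v in embed(query)) + "]"
    params = {
        "q": f"{{!knn f=vector topK={rows}}}{vector}",  # Solr dense vector query parser
        "fl": "id,text,score",
    }
    return requests.get(f"{SOLR_URL}/select", params=params).json()["response"]["docs"]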
Before proceeding, adapt the configuration file to your use case. The default config.yml looks like this:
# Solr url where the data is indexed
solr_url: http://localhost:8984/solr/rag_index
# Server port where the langchain server runs
server_port: 8000
# If you set allenai/scibert_scivocab_uncased, the model will be downloaded at runtime.
# If you have already downloaded the model, you can set its local path here.
pretrained_model_path: allenai/scibert_scivocab_uncased
# Min size (number of characters) of a chunk to be indexed
# All the chunks with fewer characters will be discarded
min_chunk_size: 500
# RAG endpoint
# All the fields are mandatory
endpoint:
  -
    # OpenAI model name
    model_name: "gpt-3.5-turbo-1106"
    # The name of the endpoint. The server will run at http://<URL>:<server_port>/<endpoint_name>/playground
    endpoint_name: "gpt_3_5_rag"
    # Number of rows to retrieve for the knn phase
    knn_rows: 10
    # Number of rows to retrieve for the bm25 phase
    bm25_rows: 10
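The configuration can be read with PyYAML. The snippet below is only a sketch of how a module like globals.py might load it; the variable names are illustrative.

# Sketch: load config.yml and expose its values (variable names are illustrative).
import yaml

with open("config.yml") as f:
    config = yaml.safe_load(f)

solr_url = config["solr_url"]
server_port = config["server_port"]
endpoints = config["endpoint"]          # list of endpoint definitions
first = endpoints[0]
print(first["model_name"], first["endpoint_name"], first["knn_rows"], first["bm25_rows"])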
To run Solr with Docker, execute the following commands:
cd docker-solr;
docker-compose up;
Solr will be available at http://localhost:8984/
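Once the Python requirements below are installed, you can check that the core referenced by the sample solr_url is reachable, for example:

# Quick health check against the Solr core used by the sample configuration.
import requests

resp = requests.get("http://localhost:8984/solr/rag_index/admin/ping")
resp.raise_for_status()
print(resp.json()["status"])   # prints "OK" when the core is healthy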
To run Python scripts, install the requirements:
pip install -r requirements.txt;
You can skip this step if you want to use the material already provided in the repository. To generate the Solr documents:
cd data;
python create.py
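For reference, this is the kind of conversion a script like create.py performs: turning TSV rows into a JSON array of Solr documents. The column layout (an identifier and a text column) is an assumption; the actual layout of documents_10k.tsv may differ.

# Sketch of a TSV-to-Solr-JSON conversion. The (id, text) column layout is an assumption.
import csv
import json

docs = []
with open("documents_10k.tsv", newline="", encoding="utf-8") as f:
    for row in csv.reader(f, delimiter="\t"):
        doc_id, text = row[0], row[1]
        docs.append({"id": doc_id, "text": text})

with open("solr_documents.json", "w", encoding="utf-8") as f:
    json.dump(docs, f, indent=2)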
To generate chunked documents:
cd chunking;
python chunk.py
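chunk.py splits each document's text into chunks and emits one Solr document per chunk, discarding chunks shorter than min_chunk_size. A minimal sketch of that logic follows; the fixed chunk length, file paths and field names are illustrative assumptions, not the script's exact behaviour.

# Sketch of the chunking logic: one Solr document per chunk, chunks shorter
# than min_chunk_size are discarded. Chunk size and field names are illustrative.
import json

MIN_CHUNK_SIZE = 500     # min_chunk_size from config.yml
CHUNK_SIZE = 1000        # assumed chunk length in characters

with open("../data/solr_documents.json", encoding="utf-8") as f:
    docs = json.load(f)

chunked = []
for doc in docs:
    text = doc["text"]
    for i, start in enumerate(range(0, len(text), CHUNK_SIZE)):
        chunk = text[start:start + CHUNK_SIZE]
        if len(chunk) < MIN_CHUNK_SIZE:
            continue                      # discard chunks that are too short
        chunked.append({"id": f'{doc["id"]}_{i}', "text": chunk})

with open("../data/solr_documents_with_chunks.json", "w", encoding="utf-8") as f:
    json.dump(chunked, f, indent=2)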
To index the chunked documents into the Solr instance:
cd solr-tools;
python index_documents.py "<path to>/RAG-training/data/solr_documents_with_chunks.json"
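At its core, indexing amounts to posting the JSON file to Solr's update handler. A minimal sketch is shown below; it does not cover any vector generation the repository scripts may perform before indexing.

# Sketch: post the chunked documents to Solr's update handler and commit.
import json
import sys

import requests

with open(sys.argv[1], encoding="utf-8") as f:
    docs = json.load(f)

resp = requests.post(
    "http://localhost:8984/solr/rag_index/update?commit=true",
    json=docs,                       # Solr accepts a JSON array of documents
)
resp.raise_for_status()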
To start the RAG server, export your OpenAI API key and run the uvicorn application:
export OPENAI_API_KEY="<your-api-key-here>"
python main.py
In summary, to run the full application, first start Solr:
cd docker-solr;
docker-compose up;
Start the RAG server:
python main.py
Queries can then be made from the playground at: http://0.0.0.0:8000/gpt_3_5_rag/playground/
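The endpoint can also be called programmatically. This is a minimal sketch that assumes the server follows the LangServe convention of exposing an /invoke route next to /playground; the question text is just an example.

# Sketch: call the RAG endpoint programmatically, assuming a LangServe-style /invoke route.
import requests

resp = requests.post(
    "http://localhost:8000/gpt_3_5_rag/invoke",
    json={"input": "What is retrieval augmented generation?"},
)
resp.raise_for_status()
print(resp.json()["output"])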