RAG Best Practice on Vietnamese

Evaluation Framework

Retrieval Benchmarks

ReRank Benchmarks

LLM Answer Benchmarks

Groundedness Benchmarks

Groundedness measures how well a model’s responses are supported by the provided context or reliable sources, ensuring accuracy and reducing hallucinations.
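As a rough illustration of the idea, the sketch below scores an answer by the fraction of its tokens that also appear in the retrieved context. This is only a crude lexical proxy and an assumption made for illustration; it is not the method used by the benchmark referenced in this section, and the function names are hypothetical.

```python
import re
import unicodedata


def tokenize(text: str) -> list[str]:
    # Normalize to NFC so Vietnamese diacritics are single code points,
    # then lowercase and split on Unicode word characters.
    normalized = unicodedata.normalize("NFC", text).lower()
    return re.findall(r"\w+", normalized)


def lexical_groundedness(answer: str, context: str) -> float:
    """Fraction of answer tokens that also occur in the context (0.0 to 1.0)."""
    answer_tokens = tokenize(answer)
    if not answer_tokens:
        return 0.0
    context_vocab = set(tokenize(context))
    supported = sum(1 for token in answer_tokens if token in context_vocab)
    return supported / len(answer_tokens)
```

A low score suggests the answer contains words that the context never mentions, which can be a cheap first signal of hallucination before running a model-based check.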

View details about this benchmark here

Slides

📑 Slide

Demo

▶️ Video Demo

Chatbot Architecture

The chatbot can retrieve your product data and answer related questions:

It can also handle casual conversations using Semantic Router:

Open-source client via Docker

We've open-sourced a chatbot client via Docker.

# Pull the Docker image
docker pull protonx/protonx-open-source:protonx-chat-client-v01

# Run the Docker image
docker run -p 3002:3000 -e RAG_BACKEND_URL=${YOUR_BACKEND_URL} protonx/protonx-open-source:protonx-chat-client-v01

If your local backend URL is http://localhost:5002/api/search, the command will be:

docker run -p 3002:3000 -e RAG_BACKEND_URL="http://localhost:5002/api/search" protonx/protonx-open-source:protonx-chat-client-v01

The backend should accept a POST request with the following request body:

[
  {
    "role": "user",
    "content": "Tôi đang tham khảo redmi note 13 plus"
  }
]

And return a response in the following format:

{
  "role": "assistant",
  "content": "Xin chào! Cảm ơn bạn đã quan tâm đến sản phẩm của chúng tôi. Điện thoại Redmi Note 13 Pro+ là một lựa chọn tuyệt vời..."
}
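The request and response shapes above can be exercised from Python with a small client. This is a minimal sketch using only the standard library; the helper names are hypothetical, and sending a `Content-Type: application/json` header is an assumption, since the format above only specifies the body.

```python
import json
import urllib.request


def build_messages(user_text: str) -> list[dict]:
    # The request body is a JSON array of chat messages.
    return [{"role": "user", "content": user_text}]


def extract_reply(response_body: str) -> str:
    # The response is a single JSON object with "role" and "content".
    message = json.loads(response_body)
    if message.get("role") != "assistant":
        raise ValueError(f"unexpected role: {message.get('role')!r}")
    return message["content"]


def ask_backend(url: str, user_text: str, timeout: float = 30.0) -> str:
    # Keep Vietnamese characters readable in the payload instead of \u escapes.
    payload = json.dumps(build_messages(user_text), ensure_ascii=False).encode("utf-8")
    request = urllib.request.Request(
        url,
        data=payload,
        headers={"Content-Type": "application/json"},  # assumed header
        method="POST",
    )
    with urllib.request.urlopen(request, timeout=timeout) as response:
        return extract_reply(response.read().decode("utf-8"))
```

With a backend running locally, `ask_backend("http://localhost:5002/api/search", "Tôi đang tham khảo redmi note 13 plus")` would return the assistant's reply text.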

Setup

1. Installation

Requires Python >= 3.12

pip install -r requirements.txt

2. Environment Variables

Create a .env file and add the following:

# MongoDB vector database (leave blank if not used)
MONGODB_URI=
DB_NAME=
DB_COLLECTION=

# Qdrant vector database (leave blank if not used)
QDRANT_API=
QDRANT_URL=

# Gemini LLM (leave blank if not used)
GEMINI_API_KEY=

# OpenAI LLM (leave blank if not used)
OPENAI_API_KEY=

# Together AI LLM (leave blank if not used)
TOGETHER_API_KEY=
TOGETHER_BASE_URL=

# Ollama local LLM engine (leave blank if not used)
OLLAMA_BASE_URL=

# vLLM local LLM engine (leave blank if not used)
VLLM_BASE_URL=
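Since most of these variables are optional, it can help to check which backends are actually configured before starting the server. The sketch below parses the .env text with the standard library and reports the groups whose variables are all filled in. The grouping mirrors the template above; treating every variable in a group as required is an assumption, and the function names are hypothetical.

```python
def parse_env(text: str) -> dict[str, str]:
    """Parse KEY=VALUE lines, skipping blank lines and # comments."""
    values = {}
    for line in text.splitlines():
        line = line.strip()
        if not line or line.startswith("#") or "=" not in line:
            continue
        key, _, value = line.partition("=")
        values[key.strip()] = value.strip().strip('"').strip("'")
    return values


# Variable groups as listed in the .env template.
BACKENDS = {
    "mongodb": ["MONGODB_URI", "DB_NAME", "DB_COLLECTION"],
    "qdrant": ["QDRANT_API", "QDRANT_URL"],
    "gemini": ["GEMINI_API_KEY"],
    "openai": ["OPENAI_API_KEY"],
    "together": ["TOGETHER_API_KEY", "TOGETHER_BASE_URL"],
    "ollama": ["OLLAMA_BASE_URL"],
    "vllm": ["VLLM_BASE_URL"],
}


def configured_backends(env: dict[str, str]) -> list[str]:
    """Return the backends whose variables are all non-empty."""
    return [name for name, keys in BACKENDS.items() if all(env.get(key) for key in keys)]
```

For example, `configured_backends(parse_env(open(".env", encoding="utf-8").read()))` lists the backends that are ready to use.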

3. Data Preparation

Prepare your data as shown below:

Make sure to create a Vector Search Index in MongoDB Atlas. 🎥 Watch how to do it
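Once the index exists, a query against it is an aggregation pipeline that starts with a `$vectorSearch` stage. The sketch below only builds that pipeline as plain Python data. The index name, the vector field name, and the projected field are placeholders chosen for illustration, not values taken from this repository.

```python
def build_vector_search_pipeline(
    query_vector: list[float],
    index_name: str = "vector_index",      # hypothetical index name
    path: str = "embedding",               # hypothetical field holding the vectors
    return_fields: tuple[str, ...] = ("title",),  # hypothetical document fields
    limit: int = 5,
    num_candidates: int = 100,
) -> list[dict]:
    # Atlas requires numCandidates to be at least as large as limit.
    if num_candidates < limit:
        raise ValueError("num_candidates must be >= limit")
    projection = {"_id": 0, "score": {"$meta": "vectorSearchScore"}}
    for field in return_fields:
        projection[field] = 1
    return [
        {
            "$vectorSearch": {
                "index": index_name,
                "path": path,
                "queryVector": query_vector,
                "numCandidates": num_candidates,
                "limit": limit,
            }
        },
        # Return the chosen fields together with the similarity score.
        {"$project": projection},
    ]
```

With pymongo, the result would be passed to `collection.aggregate(pipeline)`, where `query_vector` comes from the same embedding model used when the data was inserted.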

A guide for Qdrant will be added soon.

4. Customize Your Prompt

In serve.py, you can customize the LLM prompt like this:

f"Hãy trở thành chuyên gia tư vấn bán hàng cho một cửa hàng điện thoại. Câu hỏi của khách hàng: {query}\nTrả lời câu hỏi dựa vào các thông tin sản phẩm dưới đây: {source_information}."

Example full prompt:

Hãy trở thành chuyên gia tư vấn bán hàng cho một cửa hàng điện thoại. Câu hỏi của khách hàng: Samsung Galaxy Z Fold4 512GB
Trả lời câu hỏi dựa vào các thông tin sản phẩm dưới đây: 
1) Tên: điện thoại samsung galaxy z fold5 12gb/512gb - chính hãng, Giá: 30,990,000 ₫, Ưu đãi:
   - KM 1: Tặng gói Samsung care+ 6 tháng
   - KM 2: Trả góp tới 06 tháng không lãi suất, trả trước 0 đồng với Samsung Finance+.

2) Tên: điện thoại ai - samsung galaxy s24 - 8gb/512gb - chính hãng, Giá: 25,490,000 ₫, Ưu đãi:
   - KM 1: Trả góp tới 06 tháng không lãi suất, trả trước 0 đồng với Samsung Finance+.
   - KM 2: Giảm thêm 1.000.000đ cho khách hàng thân thiết (Chi tiết LH 1900 ****)

3) Tên: điện thoại samsung galaxy s23 ultra 12gb/512gb - chính hãng, Giá: 26,490,000 ₫, Ưu đãi:
   - KM 1: Trả góp tới 06 tháng không lãi suất, trả trước 0 đồng với Samsung Finance+.
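One way to produce a prompt like the example above is to format the retrieved products into the numbered list and then fill in the template. This is a sketch under assumptions: the product keys `title`, `price`, and `promotions` are hypothetical, and the code in serve.py may assemble `source_information` differently.

```python
def format_sources(products: list[dict]) -> str:
    """Render retrieved products as the numbered list used in the prompt."""
    blocks = []
    for index, product in enumerate(products, start=1):
        lines = [f"{index}) Tên: {product['title']}, Giá: {product['price']}, Ưu đãi:"]
        # Each promotion becomes an indented "KM n" bullet under its product.
        for promo_index, promo in enumerate(product.get("promotions", []), start=1):
            lines.append(f"   - KM {promo_index}: {promo}")
        blocks.append("\n".join(lines))
    return "\n\n".join(blocks)


def build_prompt(query: str, products: list[dict]) -> str:
    source_information = format_sources(products)
    # Same template as the f-string shown for serve.py.
    return (
        "Hãy trở thành chuyên gia tư vấn bán hàng cho một cửa hàng điện thoại. "
        f"Câu hỏi của khách hàng: {query}\n"
        f"Trả lời câu hỏi dựa vào các thông tin sản phẩm dưới đây: {source_information}."
    )
```

Keeping the formatting in its own function makes it easy to change the layout of the product list without touching the instruction text.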

5. Run the Server

Use an OpenAI model in online mode

python serve.py --mode online --model_name openai --model_version gpt-4o

Use a Gemini model in online mode

python serve.py --mode online --model_name gemini --model_version gemini-2.0-flash

Run Ollama with the local model mistralai/Mistral-7B-Instruct-v0.2

python serve.py --mode offline --model_engine ollama --model_version mistralai/Mistral-7B-Instruct-v0.2

Run the Hugging Face backend with the local model mistralai/Mistral-7B-Instruct-v0.2

python serve.py --mode offline --model_engine huggingface --model_version mistralai/Mistral-7B-Instruct-v0.2

Run the ONNX backend with a local model

python serve.py --mode offline --model_name TinyLLama --model_engine onnx --model_version onnx-community/TinyLLama-v0-ONNX
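The commands above suggest how the flags combine: online runs name a provider with --model_name, while offline runs name a local engine with --model_engine. The sketch below models that with argparse. It is inferred from the example commands only and is not taken from serve.py, so the validation rules and the function names are assumptions.

```python
import argparse


def build_parser() -> argparse.ArgumentParser:
    parser = argparse.ArgumentParser(description="Sketch of the serve.py flags")
    parser.add_argument("--mode", choices=["online", "offline"], required=True)
    parser.add_argument("--model_name")     # e.g. openai, gemini
    parser.add_argument("--model_engine")   # e.g. ollama, huggingface, onnx
    parser.add_argument("--model_version", required=True)
    return parser


def check_combination(args: argparse.Namespace) -> None:
    # Assumed rules, inferred from the example commands.
    if args.mode == "online" and not args.model_name:
        raise ValueError("online mode expects --model_name")
    if args.mode == "offline" and not args.model_engine:
        raise ValueError("offline mode expects --model_engine")
```

Checking the combination up front gives a clear error message instead of a failure deep inside model loading.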

6. Test the API

Try the chatbot UI here: 🔗 GitHub: protonx-ai-app-UI

7. Run Evaluation

Run all evaluation tests:

python -m unittest discover -s ./test/integrationTest -p "test*.py" -v

Run a specific test:

python ./test/integrationTest/llm-answer/test_bleu.py

Current evaluation

Integration Test
- LLM Answer
  - BLEU test
  - ROUGE test
Unit Test
- Test vector search
  - Retrieval
  - Hit@K
- Rerank
  - nDCG
- Test Reflection
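For reference, the two ranking metrics in the list above can be computed as follows. This is a minimal sketch of the standard definitions of Hit@K and nDCG, not the code under ./test; the function names are hypothetical, and the ideal ranking is taken from the same relevance list that is being scored.

```python
import math


def hit_at_k(retrieved_ids: list[str], relevant_ids: set[str], k: int) -> float:
    """1.0 if any of the top-k retrieved ids is relevant, else 0.0."""
    return 1.0 if any(doc_id in relevant_ids for doc_id in retrieved_ids[:k]) else 0.0


def dcg_at_k(relevances: list[float], k: int) -> float:
    # Discounted cumulative gain with the common log2(rank + 1) discount.
    return sum(rel / math.log2(rank + 1) for rank, rel in enumerate(relevances[:k], start=1))


def ndcg_at_k(relevances: list[float], k: int) -> float:
    """DCG of the given ranking divided by the DCG of its ideal reordering."""
    ideal = dcg_at_k(sorted(relevances, reverse=True), k)
    return dcg_at_k(relevances, k) / ideal if ideal > 0 else 0.0
```

Here `relevances` holds the graded relevance of each returned document in ranked order, so a reranker that moves relevant items toward the top raises nDCG even when Hit@K stays the same.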

bangoc123/retrieval-backend-with-rag