Commit ef13b2f
updated Documentation - Adithya S K
1 parent e7cf843

5 files changed: +297, -73 lines

README.md: 118 additions, 18 deletions
# 👁️👁️ VARAG

Vision Augmented Retrieval and Generation

| ![VARAG](./docs/assets/llama.png)| VARAG (Vision-Augmented Retrieval and Generation) is a vision-first RAG engine that emphasizes vision-based retrieval techniques. It enhances traditional Retrieval-Augmented Generation (RAG) systems by integrating both visual and textual data through Vision-Language models. |
|:--:|:--|

[![GitHub Stars](https://img.shields.io/github/stars/adithya-s-k/VARAG?style=social)](https://github.com/adithya-s-k/VARAG/stargazers)
[![GitHub Forks](https://img.shields.io/github/forks/adithya-s-k/VARAG?style=social)](https://github.com/adithya-s-k/VARAG/network/members)
[![GitHub Issues](https://img.shields.io/github/issues/adithya-s-k/VARAG)](https://github.com/adithya-s-k/VARAG/issues)
[![GitHub Pull Requests](https://img.shields.io/github/issues-pr/adithya-s-k/VARAG)](https://github.com/adithya-s-k/VARAG/pulls)
[![License](https://img.shields.io/github/license/adithya-s-k/VARAG)](https://github.com/adithya-s-k/VARAG/blob/main/LICENSE)

### Supported Retrieval Techniques

VARAG supports a wide range of retrieval techniques optimized for different use cases, including text, image, and multimodal document retrieval. Below are the primary techniques supported:

<details> <summary>Simple RAG (with OCR)</summary>

Simple RAG (Retrieval-Augmented Generation) is an efficient and straightforward approach to extracting text from documents and feeding it into a retrieval pipeline. VARAG incorporates Optical Character Recognition (OCR) through Docling, making it possible to process and index scanned PDFs or images. Once the text is extracted and indexed, queries can be matched to relevant passages in the document, providing a strong foundation for generating responses grounded in the extracted information. This technique is ideal for text-heavy documents such as scanned books, contracts, and research papers, and can be paired with Large Language Models (LLMs) to produce contextually aware outputs.
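
For intuition, the retrieval step can be approximated in a few lines. The sketch below is illustrative only (it uses `sentence-transformers` and invented page text; it is not VARAG's actual API):

```python
# Illustrative sketch of the Simple RAG retrieval step, assuming the OCR
# stage (e.g. Docling) has already produced plain text per page.
import numpy as np
from sentence_transformers import SentenceTransformer

model = SentenceTransformer("all-MiniLM-L6-v2")

pages = ["OCR text of page 1 ...", "OCR text of page 2 ..."]  # hypothetical
index = model.encode(pages, normalize_embeddings=True)        # (n_pages, dim)

query = model.encode(["What does clause 4 cover?"], normalize_embeddings=True)
scores = index @ query.T                    # cosine similarity (vectors are normalized)
top_pages = np.argsort(-scores[:, 0])[:3]   # indices of the 3 best-matching pages
```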

</details> <details> <summary>Vision RAG</summary>

Vision RAG extends traditional RAG techniques by incorporating the retrieval of visual information, bridging the gap between text and images. A powerful cross-modal embedding model such as JinaCLIP (a CLIP variant developed by Jina AI) encodes both text and images into a shared vector space, allowing similarity searches across modalities: images can be queried alongside text. Vision RAG is particularly useful for document analysis tasks where visual components (e.g., figures, diagrams, images) are as important as the textual content. It is also effective for tasks like image captioning or generating product descriptions, where understanding and correlating text with visual elements is critical.
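
As a rough sketch of the shared text-image space (using the public `jinaai/jina-clip-v1` checkpoint for illustration; VARAG's own wrapper may differ):

```python
# Hedged sketch: embed text and images into one vector space with JinaCLIP.
import numpy as np
from transformers import AutoModel

model = AutoModel.from_pretrained("jinaai/jina-clip-v1", trust_remote_code=True)

text_emb = model.encode_text(["a bar chart of quarterly revenue"])
image_emb = model.encode_image(["page_3.png"])  # hypothetical local path or URL

# Normalize so dot products become cosine similarities across modalities.
def l2norm(x):
    return x / np.linalg.norm(x, axis=-1, keepdims=True)

similarity = l2norm(text_emb) @ l2norm(image_emb).T
```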
</details> <details> <summary>ColPali RAG</summary>

ColPali RAG is a cutting-edge approach that simplifies the traditional retrieval pipeline by embedding document pages directly as images rather than converting them to text. It leverages PaliGemma, a Vision Language Model (VLM) from the Google Zürich team, which encodes entire document pages into vector embeddings, treating page layout and visual elements as part of the retrieval process. Using a late interaction mechanism inspired by ColBERT (contextualized late interaction over BERT), ColPali RAG enables token-level matching between user queries and document patches, ensuring high retrieval accuracy while maintaining reasonable indexing and querying speeds. It is particularly beneficial for visually rich documents such as infographics, tables, and complex layouts, where conventional text-based retrieval methods struggle.
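
The heart of the late interaction step is a MaxSim score: each query token is matched to its best document patch, and the per-token maxima are summed. A minimal sketch (random tensors stand in for real ColPali embeddings):

```python
# Hedged sketch of ColBERT-style late interaction (MaxSim) scoring.
import torch
import torch.nn.functional as F

def late_interaction_score(query_emb: torch.Tensor, page_emb: torch.Tensor) -> torch.Tensor:
    """query_emb: (n_query_tokens, dim); page_emb: (n_patches, dim), both L2-normalized."""
    sim = query_emb @ page_emb.T           # (n_query_tokens, n_patches)
    return sim.max(dim=1).values.sum()     # best patch per query token, summed

q = F.normalize(torch.randn(12, 128), dim=-1)     # toy query token embeddings
p = F.normalize(torch.randn(1030, 128), dim=-1)   # toy page patch embeddings
print(late_interaction_score(q, p))
```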
</details> <details> <summary>Hybrid ColPali RAG</summary>

Hybrid ColPali RAG further enhances retrieval performance by combining the strengths of image embeddings and ColPali's late interaction mechanism. The system first performs a coarse retrieval step using image embeddings (e.g., from a model like JinaCLIP) to retrieve the top-k relevant document pages. In a second pass, it re-ranks these k pages using the ColPali late interaction mechanism to identify the final, most relevant set of pages based on both visual and textual information. This hybrid approach is particularly useful when documents mix complex visuals with detailed text, allowing the system to leverage both content types for highly accurate document retrieval; the two-stage flow is sketched below.
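
A hedged sketch of the two-stage flow (function names, shapes, and data are illustrative, not VARAG's API):

```python
# Hedged sketch: coarse retrieval with single-vector page embeddings,
# then ColPali-style late interaction re-ranking of the top-k candidates.
import torch
import torch.nn.functional as F

def hybrid_retrieve(q_vec, page_vecs, q_tokens, page_tokens, k=20, final_n=5):
    coarse = (page_vecs @ q_vec).topk(k).indices                   # stage 1: top-k pages
    rescored = torch.stack([
        (q_tokens @ page_tokens[i].T).max(dim=1).values.sum()      # stage 2: MaxSim re-rank
        for i in coarse
    ])
    return coarse[rescored.topk(final_n).indices]                  # final page ids

# Toy data: 100 pages with pooled vectors plus per-page patch embeddings.
pages = F.normalize(torch.randn(100, 128), dim=-1)
patches = [F.normalize(torch.randn(1030, 128), dim=-1) for _ in range(100)]
q_vec = F.normalize(torch.randn(128), dim=0)
q_tok = F.normalize(torch.randn(12, 128), dim=-1)
print(hybrid_retrieve(q_vec, pages, q_tok, patches))
```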
</details>

---

## 🚀 Getting Started with VARAG

Install the required packages using pip:

```bash
pip install -e .

# or

poetry install
```

To install OCR dependencies:

```bash
pip install -e ".[ocr]"
```

---

### Try Out VARAG

Explore VARAG with our interactive playground! It lets you seamlessly compare various RAG (Retrieval-Augmented Generation) solutions, from data ingestion to retrieval.

[![Open In Colab](https://colab.research.google.com/assets/colab-badge.svg)](https://colab.research.google.com/github/adithya-s-k/VARAG/blob/main/docs/demo.ipynb)

You can run it locally or on Google Colab:
```bash
python demo.py --share
```

This makes it easy to test and experiment with different approaches in real time.

---

### How VARAG is structured

Each RAG technique is structured as a class that abstracts all components and offers the following methods:

```python
# RAGTechnique is a placeholder: import the concrete class for the technique
# you want (Simple, Vision, ColPali, or Hybrid ColPali RAG).
from varag.rag import RAGTechnique

rag_technique = RAGTechnique()

# Index your data source (plus any technique-specific options).
rag_technique.index(
    "/path_to_data_source",
    other_relevant_data,  # placeholder for technique-specific arguments
)

# Retrieve the top-k most relevant results for a query.
results = rag_technique.search("query", top_k=5)

# These results can be passed into the LLM / VLM of your choice.
```

#### Why Abstract So Much?

I initially set out to rapidly test and evaluate different Vision-based RAG (Retrieval-Augmented Generation) systems to determine which one best fit my use case. I wasn't aiming to create a framework or library, but it naturally evolved into one.

The abstraction is designed to simplify experimentation with different RAG paradigms without complicating compatibility between components. To keep things straightforward, LanceDB was chosen as the vector store for its ease of use and high customizability.
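
For context, LanceDB runs embedded and file-based, which keeps the store easy to set up and swap around. A minimal usage sketch (table name, schema, and vectors are made up for illustration):

```python
# Minimal LanceDB sketch — schema and values are illustrative only.
import lancedb

db = lancedb.connect("./lancedb")  # embedded store: a local directory, no server

table = db.create_table(
    "pages",
    data=[
        {"vector": [0.1, 0.9, 0.3], "text": "page 1 text", "page": 1},
        {"vector": [0.7, 0.2, 0.5], "text": "page 2 text", "page": 2},
    ],
)

# Nearest-neighbour search over the stored vectors.
hits = table.search([0.1, 0.8, 0.3]).limit(2).to_list()
print(hits[0]["text"])
```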

This paradigm is inspired by the [Byaldi](https://github.com/AnswerDotAI/byaldi) repo by Answer.ai.

---

### Techniques and Notebooks

| **Technique** | **Notebook** | **Demo** |
|---|---|---|
| **Simple RAG** | [![Open In Colab](https://colab.research.google.com/assets/colab-badge.svg)](https://colab.research.google.com/github/adithya-s-k/VARAG/blob/main/docs/simpleRAG.ipynb) | [textDemo.py](examples/textDemo.py) |
| **Vision RAG** | [![Open In Colab](https://colab.research.google.com/assets/colab-badge.svg)](https://colab.research.google.com/github/adithya-s-k/VARAG/blob/main/docs/visionRAG.ipynb) | [visionDemo.py](examples/visionDemo.py) |
| **ColPali RAG** | [![Open In Colab](https://colab.research.google.com/assets/colab-badge.svg)](https://colab.research.google.com/github/adithya-s-k/VARAG/blob/main/docs/colpaliRAG.ipynb) | [colpaliDemo.py](examples/colpaliDemo.py) |
| **Hybrid ColPali RAG** | [![Open In Colab](https://colab.research.google.com/assets/colab-badge.svg)](https://colab.research.google.com/github/adithya-s-k/VARAG/blob/main/docs/hybridColpaliRAG.ipynb) | [hybridColpaliDemo.py](examples/hybridColpaliDemo.py) |

---

### Explanation

- **Technique**: The RAG technique implemented.
- **Notebook**: A Colab link for interactive exploration of the technique.
- **Demo**: The corresponding demo script in the repository, which can be run locally.

## 🛠️ Contributing

Contributions to VARAG are highly encouraged! Whether it's code improvements, bug fixes, or feature enhancements, feel free to contribute to the project repository. Please adhere to the contribution guidelines outlined in the repository for smooth collaboration.

---

## 📜 License

VARAG is licensed under the [MIT License](https://opensource.org/licenses/MIT), granting you the freedom to use, modify, and distribute the code in accordance with the terms of the license.

## 🙏 Acknowledgments

We extend our sincere appreciation to the following projects and their developers:

- **Docling** - For PDF text extraction (OCR).
- **LanceDB** - For vector database functionality.

This project also draws inspiration from the following repositories:

- [Byaldi](https://github.com/AnswerDotAI/byaldi)
- [RAGatouille](https://github.com/AnswerDotAI/RAGatouille)

For the implementation of **ColPali**, we referred to the following blogs and codebases:

- [Vision Retrieval by Kyryl](https://github.com/kyryl-opens-ml/vision-retrieval)
- [Vision Retrieval by AyushExel](https://github.com/AyushExel/vision-retrieval)
- [The Rise of Vision-Driven Document Retrieval for RAG](https://blog.vespa.ai/the-rise-of-vision-driven-document-retrieval-for-rag/)

Additionally, we are grateful for the support of the open-source community and the invaluable feedback from users throughout the development journey.