Zotero RAG Search System

A Retrieval-Augmented Generation (RAG) system that searches the PDF documents in your Zotero library by semantic similarity, using natural language queries.

Features

- 🔍 Semantic Search: Uses sentence transformers for intelligent document retrieval
- 📚 Zotero Integration: Automatically extracts metadata from the Zotero database
- 🚀 Fast Indexing: FAISS-powered similarity search for instant results
- 🌐 Browser Integration: Opens relevant PDFs directly in your browser
- 💬 Interactive Interface: Command-line interface with search suggestions
- 🏷️ Rich Metadata: Searches through titles, authors, abstracts, and content
- 💾 Persistent Index: Save and load search indices for faster startup

Installation

From PyPI (Recommended)

```
pip install zotero-rag
```

With GPU Support (Optional)

For faster embedding generation on CUDA-compatible GPUs:

```
# Quote the extra so shells such as zsh do not treat [gpu] as a glob pattern
pip install "zotero-rag[gpu]"
```

Development Installation

```
git clone https://github.com/yourusername/zotero-rag.git
cd zotero-rag
# Quote the extra so shells such as zsh do not treat [dev] as a glob pattern
pip install -e ".[dev]"
```

Quick Start

Build the search index (first time only):
```
zotero-rag --build-index
```

Search your library:

```
zotero-rag --query "machine learning transformers"
```

Interactive mode:
```
zotero-rag
```

Usage Examples

Command Line Interface

```
# Build index and start interactive search
zotero-rag --build-index

# Direct search from command line
zotero-rag --query "deep learning ICLR 2023"

# Specify custom Zotero directory
zotero-rag --zotero-dir "/path/to/zotero" --build-index

# Save index for faster subsequent runs
zotero-rag --build-index --save-index my_library.json

# Load existing index
zotero-rag --load-index my_library.json --query "neural networks"
```
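
The flags shown above could be wired together roughly as follows. This is a minimal `argparse` sketch, not the project's actual parser: the flag names come from the examples in this README, while `build_parser`, `wants_interactive`, the help strings, and the defaults are assumptions made for illustration.

```python
import argparse


def build_parser() -> argparse.ArgumentParser:
    """Hypothetical parser mirroring the flags shown in this README."""
    parser = argparse.ArgumentParser(prog="zotero-rag")
    parser.add_argument("--build-index", action="store_true",
                        help="scan the library and build the search index")
    parser.add_argument("--query", metavar="TEXT",
                        help="run a single search and exit")
    parser.add_argument("--zotero-dir", metavar="PATH",
                        help="override the auto-detected Zotero directory")
    parser.add_argument("--save-index", metavar="FILE",
                        help="write the index to FILE after building")
    parser.add_argument("--load-index", metavar="FILE",
                        help="read a previously saved index from FILE")
    return parser


def wants_interactive(args: argparse.Namespace) -> bool:
    # Assumption: without --query, the tool drops into interactive mode.
    return args.query is None


args = build_parser().parse_args(
    ["--load-index", "my_library.json", "--query", "neural networks"]
)
print(args.load_index, args.query, wants_interactive(args))
```

Note that `argparse` converts dashes to underscores, so `--load-index` is read back as `args.load_index`.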

Interactive Mode

When you run zotero-rag without arguments, you get an interactive interface:

```
=== Zotero RAG Search System ===
Enter your search query (or 'quit' to exit)
Example: 'Proba Unlearn ICLR'

Search query: attention mechanisms transformers

Found 5 results:
------------------------------------------------------------
1. [0.847] Attention Is All You Need
   Author: Vaswani et al.
   Year: 2017
   Journal: NIPS
   File: attention_is_all_you_need.pdf

2. [0.783] BERT: Pre-training of Deep Bidirectional Transformers
   Author: Devlin et al.
   Year: 2019
   Journal: NAACL
   File: bert_pretraining.pdf

Enter number to open (or press Enter to search again): 1
```

Python API

```
from zotero_rag import ZoteroRAG

# Initialize the system
rag = ZoteroRAG()

# Build index
rag.build_index()

# Search
results = rag.search("quantum computing", top_k=5)

# Open best match: the first element of each result is a
# metadata dict that carries the PDF's 'path'
if results:
    rag.open_pdf(results[0][0]['path'])
```
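
The `results[0][0]['path']` access above suggests that each result is a pair whose first element is a metadata dictionary. Assuming the second element is the similarity score and that the dictionary also has a `title` key (both are assumptions, only `path` appears in this README), a hypothetical helper `format_results` could render the list like this. The sketch uses stand-in data so it runs without a Zotero library.

```python
from typing import Dict, List, Tuple

# Assumed result shape: (metadata dict, similarity score). Only the 'path'
# key is confirmed by the example above; 'title' is an assumption.
Result = Tuple[Dict[str, str], float]


def format_results(results: List[Result]) -> List[str]:
    """Render results as numbered lines, best match first."""
    ranked = sorted(results, key=lambda pair: pair[1], reverse=True)
    lines = []
    for rank, (meta, score) in enumerate(ranked, start=1):
        title = meta.get("title", "(untitled)")
        lines.append(f"{rank}. [{score:.3f}] {title} -> {meta['path']}")
    return lines


# Stand-in data in place of rag.search(...) output.
sample = [
    ({"title": "Paper B", "path": "/tmp/b.pdf"}, 0.783),
    ({"title": "Paper A", "path": "/tmp/a.pdf"}, 0.847),
]
for line in format_results(sample):
    print(line)
```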

Configuration

Zotero Directory Detection

The system automatically detects your Zotero directory on:

- Windows: ~/Zotero or ~/AppData/Roaming/Zotero/
- macOS: ~/Zotero or ~/Library/Application Support/Zotero/
- Linux: ~/Zotero or ~/.zotero/

You can override this with --zotero-dir /custom/path.
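
The detection described above can be sketched as a lookup over candidate paths. The paths are the ones listed in this section; the function names, the use of `zotero.sqlite` as the marker for a valid directory, and the handling of the override are assumptions rather than the tool's confirmed behavior.

```python
import tempfile
from pathlib import Path
from typing import List, Optional


def candidate_dirs(platform: str, home: Path) -> List[Path]:
    """Candidate locations per platform, taken from the list above."""
    extra = {
        "win32": home / "AppData" / "Roaming" / "Zotero",
        "darwin": home / "Library" / "Application Support" / "Zotero",
        "linux": home / ".zotero",
    }
    candidates = [home / "Zotero"]
    if platform in extra:
        candidates.append(extra[platform])
    return candidates


def find_zotero_dir(platform: str, home: Path,
                    override: Optional[Path] = None) -> Optional[Path]:
    """Return the first directory containing zotero.sqlite, or None."""
    # An explicit override (like --zotero-dir) replaces the candidate list.
    search = [override] if override else candidate_dirs(platform, home)
    for path in search:
        if (path / "zotero.sqlite").is_file():
            return path
    return None


# Demonstration against a throwaway home directory.
with tempfile.TemporaryDirectory() as tmp:
    home = Path(tmp)
    (home / "Zotero").mkdir()
    (home / "Zotero" / "zotero.sqlite").touch()
    print(find_zotero_dir("linux", home) == home / "Zotero")
```

In real use, `platform` would come from `sys.platform` and `home` from `Path.home()`.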

Search Index Management

```
# Build and save index
zotero-rag --build-index --save-index ~/my_zotero_index.json

# Load existing index (much faster)
zotero-rag --load-index ~/my_zotero_index.json

# Update index with new papers (re-runs the build and overwrites the saved file)
zotero-rag --build-index --save-index ~/my_zotero_index.json
```
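
To illustrate what saving and loading an index as JSON can look like, here is a minimal round-trip sketch. The file layout (a `version` number plus a list of `entries`), the entry keys, and the function names are all hypothetical; the README only tells us that the index is stored in a `.json` file.

```python
import json
import os
import tempfile
from pathlib import Path
from typing import Dict, List


def save_index(entries: List[Dict], path: Path) -> None:
    """Write to a temporary file first, then rename it into place, so an
    interrupted write never leaves a truncated index behind."""
    tmp_path = path.with_name(path.name + ".tmp")
    tmp_path.write_text(json.dumps({"version": 1, "entries": entries}))
    os.replace(tmp_path, path)


def load_index(path: Path) -> List[Dict]:
    """Read entries back; raises ValueError on an unknown version."""
    payload = json.loads(path.read_text())
    if payload.get("version") != 1:
        raise ValueError("unsupported index version")
    return payload["entries"]


# Round trip in a throwaway directory.
with tempfile.TemporaryDirectory() as tmp:
    index_file = Path(tmp) / "my_library.json"
    entries = [{"path": "a.pdf", "title": "Paper A", "embedding": [0.1, 0.2]}]
    save_index(entries, index_file)
    print(load_index(index_file) == entries)
```

Storing a version number makes it possible to reject indices written by an incompatible release instead of failing later in the search code.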

How It Works

1. Discovery: Scans your Zotero storage directory for PDF files
2. Metadata Extraction: Reads bibliographic data from Zotero's SQLite database
3. Content Extraction: Extracts text from PDF files using PyPDF2
4. Embedding Generation: Creates semantic embeddings using sentence transformers
5. Index Building: Builds a FAISS index for fast similarity search
6. Query Processing: Converts your search query to embeddings
7. Retrieval: Finds most similar documents using cosine similarity
8. Presentation: Ranks and displays results with metadata
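
The embedding, query, and retrieval steps above can be illustrated with a dependency-free toy. To keep it runnable without extra packages, this sketch swaps in normalised word counts for the sentence-transformer embeddings and a brute-force scan for the FAISS index, so it shows the shape of the pipeline rather than the project's actual implementation. All function names are hypothetical.

```python
import math
from collections import Counter
from typing import Dict, List, Tuple


def embed(text: str) -> Dict[str, float]:
    """Toy embedding: L2-normalised word counts (NOT a sentence transformer)."""
    counts = Counter(text.lower().split())
    norm = math.sqrt(sum(c * c for c in counts.values()))
    return {word: c / norm for word, c in counts.items()} if norm else {}


def cosine(a: Dict[str, float], b: Dict[str, float]) -> float:
    """For unit-length vectors, the dot product is the cosine similarity."""
    return sum(weight * b.get(word, 0.0) for word, weight in a.items())


def search(query: str, docs: Dict[str, str],
           top_k: int = 5) -> List[Tuple[str, float]]:
    """Brute-force scan; a FAISS index would replace this loop."""
    q = embed(query)
    scored = [(name, cosine(q, embed(text))) for name, text in docs.items()]
    scored.sort(key=lambda pair: pair[1], reverse=True)
    return scored[:top_k]


docs = {
    "a.pdf": "attention mechanisms in transformers",
    "b.pdf": "protein folding with molecular dynamics",
}
print(search("transformers attention", docs, top_k=1)[0][0])
```

Normalising vectors once at indexing time is what lets an inner-product index return cosine-similarity rankings.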

Performance Tips

- First run: Building the index takes time (5-10 minutes for 1000+ papers)
- Subsequent runs: Load saved indices for instant startup
- GPU acceleration: Install with [gpu] extra for faster embedding generation
- Memory usage: ~1-2GB RAM for moderate libraries (500-1000 papers)

Supported File Types

- PDF files: Primary support with text extraction
- Metadata: Title, authors, journal, year, abstract, DOI, tags
- Zotero database: SQLite-based metadata extraction
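
As a rough picture of SQLite-based metadata extraction, the sketch below joins a field table, a value table, and a link table to collect the metadata for one item. The table and column names are modeled on Zotero's schema as it is commonly described, but treat them as assumptions and check them against your own `zotero.sqlite`. The example runs against a small in-memory database, not a real library, and `item_metadata` is a hypothetical helper.

```python
import sqlite3
from typing import Dict

# Simplified stand-in schema (assumed, not copied from a real Zotero database).
SCHEMA = """
CREATE TABLE fields (fieldID INTEGER PRIMARY KEY, fieldName TEXT);
CREATE TABLE itemDataValues (valueID INTEGER PRIMARY KEY, value TEXT);
CREATE TABLE itemData (itemID INTEGER, fieldID INTEGER, valueID INTEGER);
"""


def item_metadata(conn: sqlite3.Connection, item_id: int) -> Dict[str, str]:
    """Collect field name -> value pairs for one item."""
    rows = conn.execute(
        """
        SELECT f.fieldName, v.value
        FROM itemData AS d
        JOIN fields AS f ON f.fieldID = d.fieldID
        JOIN itemDataValues AS v ON v.valueID = d.valueID
        WHERE d.itemID = ?
        """,
        (item_id,),
    ).fetchall()
    return dict(rows)


conn = sqlite3.connect(":memory:")
conn.executescript(SCHEMA)
conn.executemany("INSERT INTO fields VALUES (?, ?)",
                 [(1, "title"), (2, "date")])
conn.executemany("INSERT INTO itemDataValues VALUES (?, ?)",
                 [(10, "Attention Is All You Need"), (11, "2017")])
conn.executemany("INSERT INTO itemData VALUES (?, ?, ?)",
                 [(42, 1, 10), (42, 2, 11)])
print(item_metadata(conn, 42))
```

When reading a real database, opening it read-only with `sqlite3.connect("file:/path/to/zotero.sqlite?mode=ro", uri=True)` avoids accidental writes.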

Troubleshooting

Common Issues

"Zotero directory not found"

```
zotero-rag --zotero-dir "/path/to/your/zotero/directory"
```

"No PDF files found"
- Check that your Zotero library has PDF attachments
- Verify the storage directory exists: {zotero-dir}/storage/

"Database not found"
- Ensure Zotero is closed when running the indexer
- Check for zotero.sqlite in your Zotero directory

Memory issues with large libraries
- Process PDFs in batches
- Use --save-index to avoid rebuilding
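
Processing PDFs in batches can be as simple as chunking the list of files so that only one chunk's text is held in memory at a time. The helper below is a generic sketch, not part of the `zotero-rag` API; `batched` and the file names are made up for illustration.

```python
from typing import Iterable, Iterator, List, TypeVar

T = TypeVar("T")


def batched(items: Iterable[T], size: int) -> Iterator[List[T]]:
    """Yield consecutive lists of at most `size` items."""
    if size < 1:
        raise ValueError("size must be at least 1")
    batch: List[T] = []
    for item in items:
        batch.append(item)
        if len(batch) == size:
            yield batch
            batch = []
    if batch:
        # Emit the final, possibly shorter, batch.
        yield batch


pdfs = [f"paper_{i}.pdf" for i in range(5)]
for chunk in batched(pdfs, size=2):
    # A real indexer would extract text and embed this chunk here,
    # then release it before moving on to the next one.
    print(len(chunk), chunk[0])
```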

Debug Mode

```
# Enable verbose output
zotero-rag --build-index --verbose

# Check detected paths
zotero-rag --info
```

Requirements

- Python 3.8+
- 2GB+ RAM recommended
- 1GB+ free disk space for indices
- Internet connection (first run only, for downloading models)

Contributing

Contributions are welcome! Please see CONTRIBUTING.md for guidelines.

1. Fork the repository
2. Create a feature branch
3. Make your changes
4. Add tests
5. Submit a pull request

License

This project is licensed under the MIT License - see the LICENSE file for details.

Citation

If you use this software in your research, please cite:

```
@software{zotero_rag,
  title={Zotero RAG Search System},
  author={Your Name},
  year={2024},
  url={https://github.com/yourusername/zotero-rag}
}
```

Acknowledgments

- Sentence Transformers for semantic embeddings
- FAISS for efficient similarity search
- Zotero for reference management
- The open-source community for making this possible

minhquoc0712/zotero-rag

Folders and files

Latest commit

History

Repository files navigation

Zotero RAG Search System

Features

Installation

From PyPI (Recommended)

With GPU Support (Optional)

Development Installation

Quick Start

Usage Examples

Command Line Interface

Interactive Mode

Python API

Configuration

Zotero Directory Detection

Search Index Management

How It Works

Performance Tips

Supported File Types

Troubleshooting

Common Issues

Debug Mode

Requirements

Contributing

License

Citation

Acknowledgments

Support

About

Resources

License

Uh oh!

Stars

Watchers

Forks

Releases 2

Packages 0

Languages

Packages