RAG-Based Document Management System

Overview

This project processes and indexes text documents using FAISS for efficient similarity search. It embeds both entire documents and individual sentences, so a query can target specific information within any document.

Features

Detects file encoding and extracts text from PDFs and other text files.
Cleans and tokenizes text into sentences.
Generates and stores embeddings for documents and sentences.
Uses FAISS for efficient similarity searches.
Stores document metadata and embeddings in SQLite.

Setup Instructions

Prerequisites

Python 3.10+
NOTE: This project uses the faiss-gpu package, which may not work if your GPU does not support CUDA. In that case, consider switching to the faiss-cpu package instead.
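
To see which FAISS build is active in your environment, a small check like the one below can help. This is a minimal sketch, not part of the project: `describe_faiss_backend` is a hypothetical helper, and it assumes the installed FAISS build exposes `get_num_gpus` (the lookup is guarded in case it does not).

```python
def describe_faiss_backend():
    """Report whether FAISS is importable and whether it can see a GPU."""
    try:
        import faiss
    except (ImportError, OSError):
        # A faiss-gpu install can fail to load when CUDA libraries are missing.
        return "faiss could not be imported"
    # Guarded lookup: assumed to exist, but not every build is guaranteed to expose it.
    get_num_gpus = getattr(faiss, "get_num_gpus", None)
    gpu_count = get_num_gpus() if callable(get_num_gpus) else 0
    if gpu_count > 0:
        return f"faiss sees {gpu_count} GPU(s)"
    return "faiss is running CPU-only"


if __name__ == "__main__":
    print(describe_faiss_backend())
```

If the import fails or no GPU is reported, switching to faiss-cpu is the likely fix.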

Installation

Clone the repository:

```
git clone https://github.com/yourusername/yourproject.git
cd yourproject
```

Install dependencies:
```
make install-all
```

Usage

1. Parsing a Document

Use parse_doc(file_path) to extract and clean text from a document.
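
The project's actual parse_doc implementation is not shown here. As a rough illustration of the steps it describes (decode the file, clean the text, split it into sentences), here is a minimal standard-library sketch for plain text files. All names in it (`read_text_with_fallback`, `clean_text`, `split_sentences`, `parse_text_file_sketch`) are hypothetical, PDF extraction is left out, and the regex splitter is a simplification of real sentence tokenization.

```python
import re
from pathlib import Path

# Hypothetical candidate list; the project's real encoding detection may differ.
CANDIDATE_ENCODINGS = ("utf-8-sig", "latin-1")


def read_text_with_fallback(path):
    """Return (text, encoding) using the first candidate that decodes cleanly."""
    raw = Path(path).read_bytes()
    for encoding in CANDIDATE_ENCODINGS:
        try:
            return raw.decode(encoding), encoding
        except UnicodeDecodeError:
            continue
    # Only reached if the candidate list no longer ends with latin-1.
    return raw.decode("utf-8", errors="replace"), "utf-8"


def clean_text(text):
    """Collapse runs of whitespace (including newlines) into single spaces."""
    return re.sub(r"\s+", " ", text).strip()


def split_sentences(text):
    """Naive splitter: break after '.', '!' or '?' followed by whitespace."""
    return [s for s in re.split(r"(?<=[.!?])\s+", text) if s]


def parse_text_file_sketch(path):
    """Return (cleaned_text, sentences) for a plain text file."""
    text, _encoding = read_text_with_fallback(path)
    cleaned = clean_text(text)
    return cleaned, split_sentences(cleaned)
```

The naive splitter will mis-handle abbreviations such as "e.g."; a dedicated tokenizer is a better fit for real documents.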

2. Saving Embeddings

Call save_embedding(doc_id, doc_name, embedding, text, sentence_embeddings) to store document embeddings in the database and FAISS index.
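
To make the storage step concrete, the sketch below shows one way document and sentence embeddings could be written to SQLite as float32 blobs. It is an assumption-laden illustration, not the project's code: the table layout, `init_db`, `to_blob`, `from_blob`, and `save_embedding_sketch` are all hypothetical, and the FAISS index update that save_embedding also performs is omitted.

```python
import sqlite3
from array import array


def init_db(conn):
    """Create a hypothetical schema: one row per document, one per sentence."""
    conn.execute(
        "CREATE TABLE IF NOT EXISTS documents ("
        "doc_id TEXT PRIMARY KEY, doc_name TEXT NOT NULL, "
        "text TEXT NOT NULL, embedding BLOB NOT NULL)"
    )
    conn.execute(
        "CREATE TABLE IF NOT EXISTS sentences ("
        "doc_id TEXT NOT NULL, position INTEGER NOT NULL, "
        "embedding BLOB NOT NULL, PRIMARY KEY (doc_id, position))"
    )


def to_blob(vector):
    """Serialize a sequence of floats as packed single-precision bytes."""
    return array("f", vector).tobytes()


def from_blob(blob):
    """Inverse of to_blob: bytes back to a list of floats."""
    values = array("f")
    values.frombytes(blob)
    return values.tolist()


def save_embedding_sketch(conn, doc_id, doc_name, embedding, text, sentence_embeddings):
    """Store one document and its sentence embeddings in a single transaction."""
    with conn:  # commits on success, rolls back on error
        conn.execute(
            "INSERT OR REPLACE INTO documents VALUES (?, ?, ?, ?)",
            (doc_id, doc_name, text, to_blob(embedding)),
        )
        # Replace any sentence rows left over from an earlier save of this document.
        conn.execute("DELETE FROM sentences WHERE doc_id = ?", (doc_id,))
        conn.executemany(
            "INSERT INTO sentences VALUES (?, ?, ?)",
            [(doc_id, i, to_blob(v)) for i, v in enumerate(sentence_embeddings)],
        )
```

Storing vectors as raw float32 bytes keeps rows compact and matches the precision FAISS works with, at the cost of needing `from_blob` to read them back.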

3. Querying for Similar Documents

Use FAISS to search for similar documents:

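```
# model and doc_index are assumed to be an already-loaded embedding model and FAISS index
query_embedding = model.encode("your query text")
# FAISS expects a 2-D float32 array of shape (n_queries, dim)
D, I = doc_index.search(query_embedding.reshape(1, -1).astype("float32"), k=5)
```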
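
In the search call, D holds the distances and I holds the row positions of the nearest stored vectors, each with one row per query. To show what that result means, here is a pure-Python brute-force sketch that ranks vectors by squared L2 distance, the metric a flat L2 FAISS index reports. The README does not say which index type the project uses, so the metric is an assumption, and `squared_l2` and `brute_force_search` are hypothetical names.

```python
def squared_l2(a, b):
    """Squared Euclidean distance between two equal-length vectors."""
    return sum((x - y) ** 2 for x, y in zip(a, b))


def brute_force_search(vectors, query, k=5):
    """Return (distances, indices) of the k stored vectors closest to query."""
    # Sort by distance first, then by index so ties resolve deterministically.
    scored = sorted((squared_l2(v, query), i) for i, v in enumerate(vectors))
    top = scored[:k]
    distances = [d for d, _ in top]
    indices = [i for _, i in top]
    return distances, indices


if __name__ == "__main__":
    stored = [[0, 0], [1, 0], [0, 3]]
    distances, indices = brute_force_search(stored, [1, 1], k=2)
    print(distances, indices)
```

The returned indices are positions in the stored list, so mapping them back to document IDs requires keeping that ordering alongside the index.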

Running the Project Locally

Running the Backend

Navigate to the backend directory:
```
cd backend
```

Start the backend server:

```
uvicorn main:app --host 0.0.0.0 --port 8000 --reload
```

Alternatively, run it through the Makefile:
```
make run-backend
```
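
Whichever way the backend is started, a quick way to confirm it is accepting connections is to probe the port. The sketch below is a generic check, not part of the project; `is_port_open` is a hypothetical helper, and port 8000 is taken from the uvicorn command above (the Makefile target may use a different port).

```python
import socket


def is_port_open(host="127.0.0.1", port=8000, timeout=1.0):
    """Return True if a TCP connection to host:port succeeds within timeout."""
    try:
        with socket.create_connection((host, port), timeout=timeout):
            return True
    except OSError:
        # Covers connection refused, timeouts, and unreachable hosts.
        return False


if __name__ == "__main__":
    print("backend reachable:", is_port_open())
```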

Running the Frontend

Navigate to the frontend directory:
```
cd frontend
```
Build the frontend:
```
npm run build
```
Preview the frontend:
```
npm run preview
```

Running both at the same time (requires building the frontend first)

In the root directory, run:
```
make start
```
