An offline semantic search system for research papers that works on a local area network without requiring internet connectivity.
This system enables users to:
- Upload and index research papers: ingest PDF files and automatically extract metadata and semantic embeddings.
- Search semantically: Find relevant papers based on natural language queries, not just keywords.
- Run completely offline: All processing happens locally with no internet requirement.
The system consists of two main components: a Python backend and a React frontend.

Backend:

- PDF text extraction
- Metadata extraction (titles, authors, abstracts, etc.)
- Embedding generation using a local LLM
- Vector database for semantic search
- Metadata database for filtering

Frontend:

- Paper search interface with filters
- Paper upload/ingestion UI
- Results display with relevance scores
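At its core, ingestion embeds each paper into a vector and stores it in the vector index; search embeds the query the same way and retrieves nearest neighbors. A minimal sketch of that idea using the SentenceTransformers and FAISS libraries from the tech stack below (the model name and texts are illustrative, not the backend's actual code):

```python
# Minimal sketch of embedding-based search (illustrative, not the backend code).
import faiss
import numpy as np
from sentence_transformers import SentenceTransformer

# Hypothetical model choice; the real model is set via LLM_MODEL_NAME.
model = SentenceTransformer("all-MiniLM-L6-v2")

abstracts = [
    "A transformer architecture for neural machine translation.",
    "Convolutional networks for large-scale image recognition.",
]

# Encode papers once at ingestion time; normalized vectors let an
# inner-product index behave like cosine similarity.
embeddings = model.encode(abstracts, normalize_embeddings=True)
index = faiss.IndexFlatIP(embeddings.shape[1])
index.add(np.asarray(embeddings, dtype="float32"))

# At query time, encode the natural-language query and search the index.
query = model.encode(["attention models for translation"], normalize_embeddings=True)
scores, ids = index.search(np.asarray(query, dtype="float32"), 2)
print(ids[0], scores[0])  # best-matching paper indices and their similarity scores
```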
Technology stack:

Backend:

- FastAPI: Web framework for APIs
- PyPDF2: PDF text extraction
- SentenceTransformers: Lightweight embedding model
- FAISS: Vector search engine
- SQLite: Metadata storage
- Uvicorn: ASGI server

Frontend:

- React: UI framework
- TailwindCSS: Styling
- Vite: Build tool

Infrastructure:

- Nginx: Static file serving and proxying
- Docker: Containerization
- Docker Compose: Service orchestration
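Text extraction is a thin layer over PyPDF2. A minimal sketch using PyPDF2's current (3.x) API; the real pipeline adds error handling and fallbacks for PDFs whose metadata fields are empty:

```python
# Minimal PyPDF2 sketch: pull text and built-in metadata from a PDF.
from PyPDF2 import PdfReader

reader = PdfReader("paper.pdf")

# Concatenate extracted text page by page; extract_text() can return
# None for pages without a text layer (e.g. scanned images).
text = "\n".join((page.extract_text() or "") for page in reader.pages)

# PDFs often carry title/author in their metadata dictionary,
# though many papers leave these fields unset.
meta = reader.metadata
if meta:
    print(meta.title, meta.author)
```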
Prerequisites:

- Docker and Docker Compose
- 8GB+ RAM (recommended for running the LLM)
- 10GB+ storage space
1. Clone the repository:

   ```bash
   git clone https://github.com/ab1nash/paper-planes.git
   cd paper-planes
   ```

2. Start the system using Docker Compose:

   ```bash
   sudo docker-compose up -d
   ```

3. Access the application:

   - Web interface: http://localhost:3000
   - API documentation: http://localhost:8000/api/docs
On first run, the system will:
- Download the lightweight LLM model (requires temporary internet connection or pre-downloaded model)
- Initialize the vector and metadata databases
- Create necessary storage directories
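To prepare for fully offline operation, the model can be fetched on any connected machine and copied into `backend/models/` on the server. A sketch, assuming the configured model is a SentenceTransformers checkpoint (the model id is a placeholder for whatever `LLM_MODEL_NAME` is set to):

```python
# Run on a machine with internet access, then copy the saved directory
# to backend/models/ on the offline server.
from sentence_transformers import SentenceTransformer

# Placeholder model id; use the value of LLM_MODEL_NAME.
model = SentenceTransformer("all-MiniLM-L6-v2")
model.save("all-MiniLM-L6-v2")  # writes weights + config to a local directory
```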
Uploading papers:

- Navigate to the "Upload Papers" tab
- Select a PDF file and click "Upload"
- The system will extract:
  - Text content
  - Metadata (title, authors, year, etc.)
  - Semantic embeddings
- Optionally provide custom metadata if extraction fails
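Uploads can also be scripted against the backend API. The route below is an assumption, not a documented endpoint; check http://localhost:8000/api/docs for the actual path and fields:

```python
# Hypothetical scripted upload; the /api/papers/upload path is a guess --
# consult the interactive API docs for the real route.
import requests

with open("paper.pdf", "rb") as f:
    resp = requests.post(
        "http://localhost:8000/api/papers/upload",
        files={"file": ("paper.pdf", f, "application/pdf")},
    )
resp.raise_for_status()
print(resp.json())
```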
Searching papers:

- Navigate to the "Search Papers" tab
- Enter a natural language query
- Optionally add filters:
  - Publication year range
  - Authors
  - Keywords
  - Conference/Journal
- View results sorted by relevance
- Expand papers to see abstracts and other details
- Download papers as needed
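Search is likewise available over HTTP. The route and payload shape below are assumptions to be checked against the API docs:

```python
# Hypothetical search request; route and filter field names are guesses.
import requests

resp = requests.post(
    "http://localhost:8000/api/search",
    json={
        "query": "contrastive learning for sentence embeddings",
        "filters": {"year_min": 2019, "year_max": 2024},
    },
)
resp.raise_for_status()
for hit in resp.json().get("results", []):
    print(hit)
```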
Project structure:

```
paper-planes/
├── backend/      # Python FastAPI application
│   ├── app/      # Application code
│   ├── models/   # LLM models directory
│   └── storage/  # Paper storage and databases
├── frontend/     # React application
│   ├── public/   # Static assets
│   └── src/      # React components and services
├── docker/       # Docker configuration
└── README.md     # This documentation
```
The system can be configured through environment variables.

Backend:

- `DEBUG`: Enable debug mode (default: false)
- `UPLOAD_DIR`: Directory for storing papers
- `LLM_MODEL_NAME`: Name of the embedding model
- `SIMILARITY_THRESHOLD`: Minimum similarity score for results

Frontend:

- `VITE_API_BASE_URL`: API endpoint URL
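On the backend these are plain environment variables; a minimal sketch of how such settings can be read (the default values shown are illustrative, not the project's actual defaults except for `DEBUG`):

```python
# Sketch of reading the backend settings from the environment.
import os

DEBUG = os.getenv("DEBUG", "false").lower() == "true"
UPLOAD_DIR = os.getenv("UPLOAD_DIR", "storage/papers")              # assumed default
LLM_MODEL_NAME = os.getenv("LLM_MODEL_NAME", "all-MiniLM-L6-v2")    # assumed default
SIMILARITY_THRESHOLD = float(os.getenv("SIMILARITY_THRESHOLD", "0.3"))  # assumed default
```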
To use the system completely offline:
- Ensure the LLM model is downloaded during initial setup
- Configure your local network to allow connections to the server
- Connect devices to the same local network
- Access the system via the server's local IP address
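A quick way to confirm a client device can reach the server over the LAN (the IP is a placeholder for your server's local address):

```python
# Quick reachability check from a client on the same LAN.
import requests

SERVER_IP = "192.168.1.50"  # placeholder: your server's local IP
resp = requests.get(f"http://{SERVER_IP}:3000", timeout=5)
print(resp.status_code)  # 200 means the web interface is reachable
```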
For development:

1. Set up the backend:

   ```bash
   cd backend
   python -m venv venv
   source venv/bin/activate  # or venv\Scripts\activate on Windows
   pip install -r requirements.txt
   uvicorn app.main:app --reload
   ```

2. Set up the frontend:

   ```bash
   cd frontend
   npm install
   npm run dev
   ```
To add a new search filter:

- Add the filter field to the `SearchFilter` model in `backend/app/core/models.py` (see the sketch below)
- Update the `_apply_filters` method in `SearchService`
- Add the UI component in `frontend/src/components/SearchForm.jsx`
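As an example of the first step, adding a venue filter to the `SearchFilter` model might look like the following. The surrounding fields are assumptions about what the model already contains, and `venue` is the hypothetical new field:

```python
# backend/app/core/models.py (sketch; existing fields are assumed)
from typing import List, Optional
from pydantic import BaseModel

class SearchFilter(BaseModel):
    year_min: Optional[int] = None        # assumed existing field
    year_max: Optional[int] = None        # assumed existing field
    authors: Optional[List[str]] = None   # assumed existing field
    venue: Optional[str] = None           # new filter field
```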
To change the embedding model:

- Update the `LLM_MODEL_NAME` in configuration
- Ensure the model is compatible with SentenceTransformers
- Update the `EMBEDDING_DIMENSION` to match the new model (see the check below)
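When swapping models, the new embedding dimension can be read directly from the model rather than guessed. A small check, assuming a SentenceTransformers-compatible checkpoint:

```python
# Print the embedding dimension that EMBEDDING_DIMENSION must match.
from sentence_transformers import SentenceTransformer

model = SentenceTransformer("all-mpnet-base-v2")  # hypothetical replacement model
print(model.get_sentence_embedding_dimension())   # e.g. 768 for this model
```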
This project is licensed under the GPLv3 License - see the LICENSE file for details.
Contributions are welcome! Please feel free to submit a Pull Request.