Book Keeper - PDF Quality Analyzer 📚🔍

Book Keeper v2.0 is an AI-powered quality assurance tool that analyzes PDF documents for contradictions, content flow problems, redundancy, code quality, theoretical accuracy, and terminology consistency.

✨ Features

Core Analysis Modules

  1. 📚 Contradiction Detection: Identifies logical contradictions between chapters
  2. 📊 Flow Analysis: Checks content progression and prerequisite violations
  3. 🔁 Redundancy Detection: Finds duplicate or unnecessarily repeated content
  4. 🐛 Code Validation: Validates code snippets for syntax and best practices
  5. 📖 Theory Verification: Verifies against software engineering standards (SOLID, Design Patterns, etc.)
  6. 📝 Terminology Consistency: Ensures consistent use of technical terms throughout the document

Technical Features

  • 📄 Automatic PDF Chapter Extraction: Supports various chapter delimiter patterns
  • 🧠 Vector Embeddings: Semantic text analysis using OpenAI Embeddings API
  • 🗄️ Vector Database: Efficient similarity search powered by Qdrant
  • 🤖 Dual LLM Support: Choose between Claude 3.5 Sonnet (default) or GPT-4o
  • 📊 Comprehensive Reports: JSON and Markdown formats with quality scores
  • 🌍 Multilingual Support: Works with PDFs in multiple languages
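
As an illustration of chapter extraction by delimiter pattern, here is a minimal sketch that splits already-extracted text on heading lines. The pattern and the `split_chapters` helper are assumptions made for this example; the patterns the tool actually supports are not listed in this README.

```python
import re

# Hypothetical delimiter pattern: lines such as "Chapter 1: Basics" or "PART 2".
# The tool's real patterns may differ.
CHAPTER_HEADING = re.compile(
    r"^(?:chapter|part)\s+\d+\b.*$",
    re.IGNORECASE | re.MULTILINE,
)


def split_chapters(text: str) -> list[tuple[str, str]]:
    """Return (heading, body) pairs, one per detected chapter heading."""
    matches = list(CHAPTER_HEADING.finditer(text))
    chapters = []
    for i, match in enumerate(matches):
        # A chapter body runs until the next heading, or to the end of the text.
        end = matches[i + 1].start() if i + 1 < len(matches) else len(text)
        body = text[match.end():end].strip()
        chapters.append((match.group(0).strip(), body))
    return chapters


sample = "Preface\nChapter 1: Basics\nHello.\nCHAPTER 2 - Advanced\nMore text.\n"
chapters = split_chapters(sample)
```

Text before the first heading (the "Preface" line in the sample) is dropped in this sketch; a real extractor might keep it as front matter.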

📋 System Requirements

  • Python 3.10+
  • Conda (Anaconda or Miniconda)
  • Docker (for running Qdrant)
  • API Keys:
    • ANTHROPIC_API_KEY (for Claude 3.5 Sonnet - default)
    • OPENAI_API_KEY (for GPT-4o - optional)

🚀 Quick Start

1. Clone the repository

git clone https://github.com/no-ai-labs/book-keeper.git
cd book-keeper

2. Set up the environment

# Install Conda (if not already installed)
./install_conda.sh  # macOS only

# Create and activate the environment
./setup.sh
conda activate book-keeper

3. Configure environment variables

cp .env_example .env
# Edit .env and add your API keys:
# - ANTHROPIC_API_KEY (for Claude - default)
# - OPENAI_API_KEY (for GPT-4o - optional)

4. Start Qdrant

docker-compose up -d

5. Place PDF files

Put your PDF files in the pdf/ directory.

📖 Usage

Comprehensive Analysis (Default)

Runs all quality checks:

python rag_pdf_checker.py

Specific Checks

Run only selected analyzers:

# Single check
python rag_pdf_checker.py --check contradiction

# Multiple checks
python rag_pdf_checker.py --check contradiction,flow,code

# Available checks: contradiction, flow, redundancy, code, theory, terminology
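
A comma-separated `--check` value could be validated along these lines. This is only a sketch: the check names come from the list above, while `parse_checks` and its error handling are assumptions rather than the tool's actual code.

```python
from typing import Optional

# Check names as listed in this README.
AVAILABLE_CHECKS = (
    "contradiction", "flow", "redundancy", "code", "theory", "terminology",
)


def parse_checks(value: Optional[str]) -> list[str]:
    """Turn 'contradiction,flow' into a validated list of check names.

    An empty or missing value selects every check, mirroring the default
    comprehensive analysis.
    """
    if not value:
        return list(AVAILABLE_CHECKS)
    requested = [name.strip().lower() for name in value.split(",") if name.strip()]
    unknown = [name for name in requested if name not in AVAILABLE_CHECKS]
    if unknown:
        raise ValueError(f"Unknown check(s): {', '.join(unknown)}")
    return requested


selected = parse_checks("contradiction,flow,code")
```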

Test Mode

Limited analysis for testing (first 3 chapters only):

# Test with default model (Claude)
python rag_pdf_checker.py --test

# Test with GPT-4o
python rag_pdf_checker.py --test --openai

Model Selection

# Use Claude 3.5 Sonnet (default)
python rag_pdf_checker.py

# Use GPT-4o
python rag_pdf_checker.py --openai
# or
python rag_pdf_checker.py --gpt

Custom PDF Directory

python rag_pdf_checker.py --pdf-dir /path/to/pdfs
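
For readers wondering how these flags fit together, the sketch below wires them up with `argparse`. The flag names are taken from the commands above; the `build_parser` helper, the `use_openai` destination name, and the `pdf` default are assumptions for illustration and may not match the script's internals.

```python
import argparse


def build_parser() -> argparse.ArgumentParser:
    """Build a parser for the flags shown in this README (illustrative only)."""
    parser = argparse.ArgumentParser(prog="rag_pdf_checker.py")
    parser.add_argument("--check", default=None,
                        help="comma-separated analyzers to run (default: all)")
    parser.add_argument("--test", action="store_true",
                        help="limit the analysis to the first 3 chapters")
    # --openai and --gpt are treated as aliases for the same switch.
    parser.add_argument("--openai", "--gpt", dest="use_openai",
                        action="store_true",
                        help="use GPT-4o instead of the default Claude model")
    parser.add_argument("--pdf-dir", default="pdf",
                        help="directory containing the PDF files")
    return parser


args = build_parser().parse_args(["--test", "--gpt", "--pdf-dir", "/path/to/pdfs"])
```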

📊 Output

The tool generates two report files:

1. quality_report_v2.json

Detailed JSON report with all findings and scores.

2. quality_report_v2.md

Human-readable Markdown report with:

  • 🎯 Overall quality score (0-100%)
  • 📈 Individual module scores
  • 📋 Detailed findings by category
  • 🔍 Key insights summary

Quality Scoring

  • 90-100%: Excellent ✅
  • 80-89%: Good ⚠️
  • 70-79%: Fair ⚠️
  • 60-69%: Needs Improvement ❌
  • Below 60%: Poor ❌
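
The bands above map to labels as in the following sketch. The `quality_label` function is hypothetical and only restates the table; how the tool computes the score itself is not described here.

```python
def quality_label(score: float) -> str:
    """Map a 0-100 quality score to the rating bands listed above."""
    if not 0 <= score <= 100:
        raise ValueError("score must be between 0 and 100")
    if score >= 90:
        return "Excellent"
    if score >= 80:
        return "Good"
    if score >= 70:
        return "Fair"
    if score >= 60:
        return "Needs Improvement"
    return "Poor"


label = quality_label(84.5)
```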

🗂️ Project Structure

book-keeper/
├── rag_pdf_checker.py      # Main v2.0 application
├── analyzers/              # Analysis modules
│   ├── base.py             # Base analyzer class
│   ├── contradiction.py    # Contradiction detector
│   ├── flow.py             # Content flow analyzer
│   ├── redundancy.py       # Redundancy detector
│   ├── code.py             # Code validator
│   └── theory.py           # Theory verifier
├── pdf/                    # PDF files directory
├── environment.yml         # Conda environment
├── requirements.txt        # Python dependencies
├── docker-compose.yml      # Qdrant setup
└── quality_report_v2.*     # Generated reports

🔧 Advanced Settings

Environment Variables (.env)

# API Keys
ANTHROPIC_API_KEY=your_key_here
OPENAI_API_KEY=your_key_here

# Qdrant Configuration
QDRANT_HOST=localhost
QDRANT_PORT=6345
QDRANT_API_KEY=optional_key
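
One way to read the Qdrant variables is sketched below, using only the standard library. The `QdrantSettings` class and `load_qdrant_settings` function are hypothetical, and the fallback values simply mirror the sample `.env` above; the application may load its configuration differently.

```python
import os
from dataclasses import dataclass
from typing import Mapping, Optional


@dataclass(frozen=True)
class QdrantSettings:
    host: str
    port: int
    api_key: Optional[str]


def load_qdrant_settings(env: Mapping[str, str] = os.environ) -> QdrantSettings:
    """Read Qdrant connection settings from environment variables.

    Fallbacks (localhost, 6345) are assumptions copied from the sample .env.
    """
    return QdrantSettings(
        host=env.get("QDRANT_HOST", "localhost"),
        port=int(env.get("QDRANT_PORT", "6345")),
        # Treat an unset or empty key as "no authentication".
        api_key=env.get("QDRANT_API_KEY") or None,
    )


settings = load_qdrant_settings({"QDRANT_HOST": "qdrant.internal", "QDRANT_PORT": "6345"})
```

Passing the environment as a mapping keeps the function easy to test without touching the real process environment.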

Docker Compose Configuration

The docker-compose.yml configures Qdrant with:

  • REST API on port 6345
  • gRPC on port 6346
  • Persistent storage in ./data/qdrant
  • Health checks
  • Auto-restart

🤝 Contributing

Contributions are welcome! Please feel free to submit a Pull Request.

📜 License

This project is licensed under the MIT License - see the LICENSE file for details.

🙏 Acknowledgments

  • Built with Claude 4 Opus and Claude 3.5 Sonnet
  • Powered by OpenAI Embeddings
  • Vector search by Qdrant
