Book Keeper v2.0 - A comprehensive AI-powered quality assurance tool that analyzes PDF documents for contradictions, content flow, redundancy, code quality, and theoretical accuracy.
- Contradiction Detection: Identifies logical contradictions between chapters
- Flow Analysis: Checks content progression and prerequisite violations
- Redundancy Detection: Finds duplicate or unnecessarily repeated content
- Code Validation: Validates code snippets for syntax and best practices
- Theory Verification: Verifies against software engineering standards (SOLID, Design Patterns, etc.)
- Terminology Consistency: Ensures consistent use of technical terms throughout the document
- Automatic PDF Chapter Extraction: Supports various chapter delimiter patterns
- Vector Embeddings: Semantic text analysis using OpenAI Embeddings API
- Vector Database: Efficient similarity search powered by Qdrant
- Dual LLM Support: Choose between Claude 3.5 Sonnet (default) or GPT-4o
- Comprehensive Reports: JSON and Markdown formats with quality scores
- Multilingual Support: Works with PDFs in multiple languages
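To illustrate what "chapter delimiter patterns" could mean in practice, here is a minimal sketch that splits extracted text at heading lines. The pattern and the `split_chapters` helper are assumptions made for illustration; the patterns the tool actually supports are not documented here.

```python
import re

# Hypothetical delimiter pattern: a line starting with "Chapter N" or "Part N".
# The patterns Book Keeper really recognizes may differ.
CHAPTER_PATTERN = re.compile(r"^(?:chapter|part)\s+\d+\b.*$", re.IGNORECASE | re.MULTILINE)


def split_chapters(text: str) -> list:
    """Split text into chunks, each starting at a chapter heading.

    Text before the first heading (front matter) is dropped. If no heading
    is found, the whole text is returned as a single chunk.
    """
    starts = [match.start() for match in CHAPTER_PATTERN.finditer(text)]
    if not starts:
        return [text.strip()] if text.strip() else []
    bounds = starts + [len(text)]
    return [text[a:b].strip() for a, b in zip(bounds, bounds[1:])]


if __name__ == "__main__":
    sample = "Preface\nChapter 1 Intro\nbody one\nChapter 2 Next\nbody two\n"
    for chunk in split_chapters(sample):
        print(repr(chunk))
```

A regex over plain text is the simplest approach; real PDFs often need font-size or outline information as well, which this sketch does not attempt.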
- Python 3.10+
- Conda (Anaconda or Miniconda)
- Docker (for running Qdrant)
- API Keys:
  - ANTHROPIC_API_KEY (for Claude 3.5 Sonnet - default)
  - OPENAI_API_KEY (for GPT-4o - optional)
git clone https://github.com/no-ai-labs/book-keeper.git
cd book-keeper
# Install Conda (if not already installed)
./install_conda.sh # macOS only
# Create and activate the environment
./setup.sh
conda activate book-keeper
cp .env_example .env
# Edit .env and add your API keys:
# - ANTHROPIC_API_KEY (for Claude - default)
# - OPENAI_API_KEY (for GPT-4o - optional)
docker-compose up -d
Put your PDF files in the pdf/ directory.
Run all quality checks:
python rag_pdf_checker.py
Run only selected analyzers:
# Single check
python rag_pdf_checker.py --check contradiction
# Multiple checks
python rag_pdf_checker.py --check contradiction,flow,code
# Available checks: contradiction, flow, redundancy, code, theory, terminology
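As a rough sketch of how a comma-separated `--check` value could be validated against the list of available checks, the following uses `argparse` with a custom type function. `parse_checks` and `build_parser` are hypothetical names; the real `rag_pdf_checker.py` may handle this differently.

```python
import argparse

# Check names taken from the list above.
AVAILABLE_CHECKS = ("contradiction", "flow", "redundancy", "code", "theory", "terminology")


def parse_checks(value: str) -> list:
    """Turn 'contradiction,flow' into ['contradiction', 'flow'], rejecting unknown names."""
    names = [part.strip() for part in value.split(",") if part.strip()]
    unknown = [name for name in names if name not in AVAILABLE_CHECKS]
    if unknown:
        raise argparse.ArgumentTypeError("unknown check(s): " + ", ".join(unknown))
    return names


def build_parser() -> argparse.ArgumentParser:
    """Build a parser with only the --check option (illustrative subset of the CLI)."""
    parser = argparse.ArgumentParser(prog="rag_pdf_checker.py")
    # Assumption: with no --check flag, every analyzer runs.
    parser.add_argument("--check", type=parse_checks, default=list(AVAILABLE_CHECKS))
    return parser


if __name__ == "__main__":
    args = build_parser().parse_args(["--check", "contradiction,flow,code"])
    print(args.check)
```

Validating in the `type` callback lets `argparse` report a bad check name as a normal usage error instead of failing later in the run.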
Run a limited analysis for testing (first 3 chapters only):
# Test with default model (Claude)
python rag_pdf_checker.py --test
# Test with GPT-4o
python rag_pdf_checker.py --test --openai
# Use Claude 3.5 Sonnet (default)
python rag_pdf_checker.py
# Use GPT-4o
python rag_pdf_checker.py --openai
# or
python rag_pdf_checker.py --gpt
python rag_pdf_checker.py --pdf-dir /path/to/pdfs
The tool generates two report files:
A detailed JSON report with all findings and scores.
A Markdown report with:
- Overall quality score (0-100%)
- Individual module scores
- Detailed findings by category
- Key insights summary
- 90-100%: Excellent
- 80-89%: Good
- 70-79%: Fair
- 60-69%: Needs Improvement
- Below 60%: Poor
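The rating bands above can be expressed as a small lookup. This is a sketch only: the `rate` function is a hypothetical helper, and treating each band's lower bound as the threshold (so that 89.5 counts as "Good") is an assumption about how fractional scores are handled.

```python
def rate(score: float) -> str:
    """Map a quality score in [0, 100] to its rating label."""
    if not 0 <= score <= 100:
        raise ValueError(f"score must be between 0 and 100, got {score}")
    # Bands are checked from highest to lowest; the first match wins.
    if score >= 90:
        return "Excellent"
    if score >= 80:
        return "Good"
    if score >= 70:
        return "Fair"
    if score >= 60:
        return "Needs Improvement"
    return "Poor"


if __name__ == "__main__":
    for value in (95, 84, 72, 61, 40):
        print(value, rate(value))
```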
book-keeper/
├── rag_pdf_checker.py      # Main v2.0 application
├── analyzers/              # Analysis modules
│   ├── base.py             # Base analyzer class
│   ├── contradiction.py    # Contradiction detector
│   ├── flow.py             # Content flow analyzer
│   ├── redundancy.py       # Redundancy detector
│   ├── code.py             # Code validator
│   └── theory.py           # Theory verifier
├── pdf/                    # PDF files directory
├── environment.yml         # Conda environment
├── requirements.txt        # Python dependencies
├── docker-compose.yml      # Qdrant setup
└── quality_report_v2.*     # Generated reports
# API Keys
ANTHROPIC_API_KEY=your_key_here
OPENAI_API_KEY=your_key_here
# Qdrant Configuration
QDRANT_HOST=localhost
QDRANT_PORT=6345
QDRANT_API_KEY=optional_key
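For readers who want to see how these variables might be consumed, here is a minimal sketch that reads them from the process environment. The `Settings` class and `load_settings` function are hypothetical, and the sketch assumes the `.env` file has already been loaded into the environment by whatever mechanism the project uses.

```python
import os
from dataclasses import dataclass
from typing import Mapping, Optional


@dataclass(frozen=True)
class Settings:
    """Configuration values named in the .env example above."""
    anthropic_api_key: Optional[str]
    openai_api_key: Optional[str]
    qdrant_host: str
    qdrant_port: int
    qdrant_api_key: Optional[str]


def load_settings(env: Optional[Mapping[str, str]] = None) -> Settings:
    """Read settings from `env` (defaults to os.environ); empty values become None."""
    env = os.environ if env is None else env
    return Settings(
        anthropic_api_key=env.get("ANTHROPIC_API_KEY") or None,
        openai_api_key=env.get("OPENAI_API_KEY") or None,
        # Defaults mirror the example values shown above.
        qdrant_host=env.get("QDRANT_HOST") or "localhost",
        qdrant_port=int(env.get("QDRANT_PORT") or "6345"),
        qdrant_api_key=env.get("QDRANT_API_KEY") or None,
    )


if __name__ == "__main__":
    settings = load_settings()
    print(settings.qdrant_host, settings.qdrant_port)
```

Passing a plain dictionary as `env` makes the loader easy to exercise without touching the real environment.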
The docker-compose.yml configures Qdrant with:
- REST API on port 6345
- gRPC on port 6346
- Persistent storage in ./data/qdrant
- Health checks
- Auto-restart
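Once the container is running, a quick reachability check against the REST port can confirm the setup. The sketch below uses only the standard library and assumes Qdrant's `/healthz` endpoint is available in the deployed version; `health_url` and `qdrant_is_up` are hypothetical helper names, not part of this project.

```python
import urllib.error
import urllib.request


def health_url(host: str = "localhost", port: int = 6345) -> str:
    """Build the health-check URL for the REST port configured above."""
    return f"http://{host}:{port}/healthz"


def qdrant_is_up(host: str = "localhost", port: int = 6345, timeout: float = 2.0) -> bool:
    """Return True if the health endpoint answers with HTTP 200, False otherwise."""
    try:
        with urllib.request.urlopen(health_url(host, port), timeout=timeout) as response:
            return response.status == 200
    except (urllib.error.URLError, OSError):
        # Connection refused, timeout, DNS failure, or a non-2xx response.
        return False


if __name__ == "__main__":
    print("Qdrant reachable:", qdrant_is_up())
```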
Contributions are welcome! Please feel free to submit a Pull Request.
This project is licensed under the MIT License - see the LICENSE file for details.
- Built with Claude 4 Opus and Claude 3.5 Sonnet
- Powered by OpenAI Embeddings
- Vector search by Qdrant