Book Keeper - PDF Quality Analyzer 📚🔍

Book Keeper v2.0 is an AI-powered quality assurance tool that analyzes PDF documents for contradictions, content flow problems, redundancy, code quality, and theoretical accuracy.

✨ Features

Core Analysis Modules

📚 Contradiction Detection: Identifies logical contradictions between chapters
📊 Flow Analysis: Checks content progression and prerequisite violations
🔁 Redundancy Detection: Finds duplicate or unnecessarily repeated content
🐛 Code Validation: Validates code snippets for syntax and best practices
📖 Theory Verification: Verifies theoretical claims against established software engineering principles (SOLID, design patterns, etc.)
📝 Terminology Consistency: Ensures consistent use of technical terms throughout the document
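
As an illustration of the syntax part of code validation, here is a minimal sketch that uses Python's standard `ast` module to check whether a snippet parses. It assumes the snippet is Python; the function name `check_python_syntax` is hypothetical and is not taken from the Book Keeper source, which may validate code differently (for example, by asking the LLM).

```python
import ast


def check_python_syntax(snippet: str) -> tuple[bool, str]:
    """Return (True, "") if the snippet parses, else (False, error description)."""
    try:
        ast.parse(snippet)
    except SyntaxError as exc:
        return False, f"line {exc.lineno}: {exc.msg}"
    return True, ""


# A valid snippet and one with a missing colon after the signature
good = "def add(a, b):\n    return a + b\n"
bad = "def add(a, b)\n    return a + b\n"

print(check_python_syntax(good))
print(check_python_syntax(bad))
```

A parse check like this only catches syntax errors; best-practice checks would need additional rules or a model-based review.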

Technical Features

📄 Automatic PDF Chapter Extraction: Supports various chapter delimiter patterns
🧠 Vector Embeddings: Semantic text analysis using OpenAI Embeddings API
🗄️ Vector Database: Efficient similarity search powered by Qdrant
🤖 Dual LLM Support: Choose between Claude 3.5 Sonnet (default) and GPT-4o
📊 Comprehensive Reports: JSON and Markdown formats with quality scores
🌍 Multilingual Support: Works with PDFs in multiple languages
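
To make the idea of embedding-based similarity search concrete, the sketch below ranks passages by cosine similarity to a query vector. It is a conceptual example only: the toy 3-dimensional vectors stand in for real embeddings, the function names are hypothetical, and in Book Keeper the search is delegated to Qdrant rather than computed in plain Python.

```python
import math


def cosine_similarity(a: list[float], b: list[float]) -> float:
    """Cosine of the angle between two vectors; 0.0 if either is all zeros."""
    dot = sum(x * y for x, y in zip(a, b))
    norm_a = math.sqrt(sum(x * x for x in a))
    norm_b = math.sqrt(sum(y * y for y in b))
    if norm_a == 0 or norm_b == 0:
        return 0.0
    return dot / (norm_a * norm_b)


def rank_by_similarity(query: list[float], passages: dict[str, list[float]]):
    """Return (name, score) pairs sorted from most to least similar."""
    scored = [(name, cosine_similarity(query, vec)) for name, vec in passages.items()]
    return sorted(scored, key=lambda item: item[1], reverse=True)


# Toy vectors standing in for real embeddings (assumption for illustration)
passages = {
    "chapter_1": [0.9, 0.1, 0.0],
    "chapter_2": [0.0, 1.0, 0.0],
    "chapter_3": [0.8, 0.2, 0.1],
}
query = [1.0, 0.0, 0.0]

ranking = rank_by_similarity(query, passages)
for name, score in ranking:
    print(f"{name}: {score:.3f}")
```

Passages whose vectors point in nearly the same direction as the query score close to 1.0, which is how semantically related chapters can be paired up for contradiction or redundancy checks.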

📋 System Requirements

Python 3.10+
Conda (Anaconda or Miniconda)
Docker (for running Qdrant)
API Keys:
- ANTHROPIC_API_KEY (for Claude 3.5 Sonnet - default)
- OPENAI_API_KEY (for GPT-4o - optional)
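
The requirements above can be checked before a run with a small preflight script. This is a sketch, not part of the repository: the `preflight` function is hypothetical, and it assumes that only the key for the selected model is mandatory, as the list above suggests.

```python
import os
import sys

REQUIRED_PYTHON = (3, 10)


def preflight(env, use_openai=False, version=sys.version_info):
    """Return a list of human-readable problems; an empty list means OK."""
    problems = []
    if tuple(version[:2]) < REQUIRED_PYTHON:
        problems.append(
            f"Python {REQUIRED_PYTHON[0]}.{REQUIRED_PYTHON[1]}+ required, "
            f"found {version[0]}.{version[1]}"
        )
    # Assumption: only the key for the chosen model is required
    key = "OPENAI_API_KEY" if use_openai else "ANTHROPIC_API_KEY"
    if not env.get(key):
        problems.append(f"{key} is not set")
    return problems


for problem in preflight(os.environ):
    print("WARNING:", problem)
```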

🚀 Quick Start

1. Clone the repository

git clone https://github.com/no-ai-labs/book-keeper.git
cd book-keeper

2. Set up the environment

# Install Conda (if not already installed)
./install_conda.sh  # macOS only

# Create and activate the environment
./setup.sh
conda activate book-keeper

3. Configure environment variables

cp .env_example .env
# Edit .env and add your API keys:
# - ANTHROPIC_API_KEY (for Claude - default)
# - OPENAI_API_KEY (for GPT-4o - optional)

4. Start Qdrant

docker-compose up -d
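
To confirm that the container is accepting connections before running an analysis, a quick TCP check can help. The sketch below uses only the standard library; the helper name `is_port_open` is hypothetical, and the port 6345 is the REST port given in this README's configuration section. It only shows that something is listening on the port, not that Qdrant itself is healthy.

```python
import socket


def is_port_open(host: str, port: int, timeout: float = 2.0) -> bool:
    """Return True if a TCP connection to host:port succeeds within the timeout."""
    try:
        with socket.create_connection((host, port), timeout=timeout):
            return True
    except OSError:
        return False


# Port 6345 is the REST port documented in this README
print("Qdrant reachable:", is_port_open("localhost", 6345))
```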

5. Place PDF files

Put your PDF files in the pdf/ directory.

📖 Usage

Comprehensive Analysis (Default)

Runs all quality checks:

python rag_pdf_checker.py

Specific Checks

Run only selected analyzers:

# Single check
python rag_pdf_checker.py --check contradiction

# Multiple checks
python rag_pdf_checker.py --check contradiction,flow,code

# Available checks: contradiction, flow, redundancy, code, theory, terminology

Test Mode

Limited analysis for testing (first 3 chapters only):

# Test with default model (Claude)
python rag_pdf_checker.py --test

# Test with GPT-4o
python rag_pdf_checker.py --test --openai

Model Selection

# Use Claude 3.5 Sonnet (default)
python rag_pdf_checker.py

# Use GPT-4o
python rag_pdf_checker.py --openai
# or
python rag_pdf_checker.py --gpt

Custom PDF Directory

python rag_pdf_checker.py --pdf-dir /path/to/pdfs
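
For readers curious how the flags above fit together, here is a rough sketch of an `argparse` parser that accepts the same options. It is an assumption about how such a CLI could be wired, not the actual code in `rag_pdf_checker.py`; the names `parse_checks`, `build_parser`, and `use_openai`, and the default directory `pdf`, are illustrative.

```python
import argparse

AVAILABLE_CHECKS = ["contradiction", "flow", "redundancy", "code", "theory", "terminology"]


def parse_checks(value: str) -> list[str]:
    """Split a comma-separated list of checks and reject unknown names."""
    checks = [item.strip() for item in value.split(",") if item.strip()]
    unknown = [c for c in checks if c not in AVAILABLE_CHECKS]
    if unknown:
        raise argparse.ArgumentTypeError(f"unknown check(s): {', '.join(unknown)}")
    return checks


def build_parser() -> argparse.ArgumentParser:
    parser = argparse.ArgumentParser(prog="rag_pdf_checker.py")
    # With no --check flag, every analyzer runs
    parser.add_argument("--check", type=parse_checks, default=AVAILABLE_CHECKS,
                        help="comma-separated list of checks to run")
    parser.add_argument("--test", action="store_true",
                        help="limited analysis (first 3 chapters only)")
    # --openai and --gpt are treated as aliases for the same switch
    parser.add_argument("--openai", "--gpt", dest="use_openai", action="store_true",
                        help="use GPT-4o instead of Claude 3.5 Sonnet")
    parser.add_argument("--pdf-dir", default="pdf",
                        help="directory containing the PDF files")
    return parser


args = build_parser().parse_args(["--check", "contradiction,flow,code", "--gpt"])
print(args.check, args.use_openai, args.pdf_dir)
```

Validating the check names at parse time means a typo such as `--check contradction` fails immediately instead of silently skipping an analyzer.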

📊 Output

The tool generates two report files:

1. `quality_report_v2.json`

Detailed JSON report with all findings and scores.

2. `quality_report_v2.md`

Human-readable Markdown report with:

🎯 Overall quality score (0-100%)
📈 Individual module scores
📋 Detailed findings by category
🔍 Key insights summary
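
The JSON report can be inspected from Python. Its schema is not documented here, so the sketch below makes no assumption about field names and only lists the top-level keys with their value types; the helper `summarize_report` is hypothetical.

```python
import json
from pathlib import Path


def summarize_report(path) -> dict[str, str]:
    """Map each top-level key of a JSON report to the type name of its value."""
    data = json.loads(Path(path).read_text(encoding="utf-8"))
    if isinstance(data, dict):
        return {key: type(value).__name__ for key, value in data.items()}
    # Fall back gracefully if the root is a list or a scalar
    return {"<root>": type(data).__name__}


report = Path("quality_report_v2.json")
if report.exists():
    for key, kind in summarize_report(report).items():
        print(f"{key}: {kind}")
else:
    print("No report found; run the analyzer first.")
```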

Quality Scoring

90-100%: Excellent ✅
80-89%: Good ⚠️
70-79%: Fair ⚠️
60-69%: Needs Improvement ❌
Below 60%: Poor ❌
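
The scoring bands above translate directly into a lookup function. This is a sketch with a hypothetical name, `rating_for`; it assumes each band's lower bound is inclusive, so a fractional score such as 79.5 falls into the band below 80.

```python
def rating_for(score: float) -> str:
    """Map a quality score (0-100) to the rating label from the table above."""
    if not 0 <= score <= 100:
        raise ValueError(f"score must be between 0 and 100, got {score}")
    if score >= 90:
        return "Excellent"
    if score >= 80:
        return "Good"
    if score >= 70:
        return "Fair"
    if score >= 60:
        return "Needs Improvement"
    return "Poor"


for sample in (95, 84, 72.5, 60, 41):
    print(sample, "->", rating_for(sample))
```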

🗂️ Project Structure

book-keeper/
├── rag_pdf_checker.py      # Main v2.0 application
├── analyzers/              # Analysis modules
│   ├── base.py             # Base analyzer class
│   ├── contradiction.py    # Contradiction detector
│   ├── flow.py             # Content flow analyzer
│   ├── redundancy.py       # Redundancy detector
│   ├── code.py             # Code validator
│   └── theory.py           # Theory verifier
├── pdf/                    # PDF files directory
├── environment.yml         # Conda environment
├── requirements.txt        # Python dependencies
├── docker-compose.yml      # Qdrant setup
└── quality_report_v2.*     # Generated reports

🔧 Advanced Settings

Environment Variables (.env)

# API Keys
ANTHROPIC_API_KEY=your_key_here
OPENAI_API_KEY=your_key_here

# Qdrant Configuration
QDRANT_HOST=localhost
QDRANT_PORT=6345
QDRANT_API_KEY=optional_key
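
As a sketch of how these variables might be consumed, the example below parses `.env`-style text with the standard library and derives Qdrant connection settings. The helpers `parse_env` and `qdrant_settings` are hypothetical, and the fallback values (`localhost`, `6345`) simply mirror the example above; the project itself may load the file differently.

```python
def parse_env(text: str) -> dict[str, str]:
    """Parse KEY=VALUE lines, skipping blank lines and # comments."""
    values = {}
    for raw in text.splitlines():
        line = raw.strip()
        if not line or line.startswith("#") or "=" not in line:
            continue
        key, _, value = line.partition("=")
        values[key.strip()] = value.strip().strip('"').strip("'")
    return values


def qdrant_settings(env: dict[str, str]) -> dict:
    """Build connection settings, falling back to the values shown in this README."""
    return {
        "host": env.get("QDRANT_HOST", "localhost"),
        "port": int(env.get("QDRANT_PORT", "6345")),
        "api_key": env.get("QDRANT_API_KEY") or None,  # empty string means no key
    }


example = """
# Qdrant Configuration
QDRANT_HOST=localhost
QDRANT_PORT=6345
QDRANT_API_KEY=optional_key
"""

settings = qdrant_settings(parse_env(example))
print(settings)
```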

Docker Compose Configuration

The docker-compose.yml configures Qdrant with:

REST API on port 6345
gRPC on port 6346
Persistent storage in ./data/qdrant
Health checks
Auto-restart

🤝 Contributing

Contributions are welcome! Please feel free to submit a Pull Request.

📜 License

This project is licensed under the MIT License - see the LICENSE file for details.

🙏 Acknowledgments

Built with Claude 4 Opus and Claude 3.5 Sonnet
Powered by OpenAI Embeddings
Vector search by Qdrant
