
πŸ“š Automated Book Publication Workflow


An advanced AI-driven content processing pipeline that automates the entire book publication workflow, from web scraping to publication-ready content. Built with Python, Gradio, and Google's Gemini AI.

πŸŽ₯ Video Demonstration

Watch the Demo

Click the badge above to watch a complete walkthrough of the Book Publication Workflow in action!

🎯 Features

πŸ”„ Complete Pipeline Automation

  • Web Scraping: Extract content from any web source with screenshot capture
  • AI Processing: Multi-iteration content enhancement using Google Gemini
  • Human Editing: Intuitive interface for manual review and editing
  • Final Processing: AI-powered final polish for publication quality
  • Export System: Multiple format support (TXT, MD, JSON)

πŸ€– AI-Powered Intelligence

  • Multi-Agent System: Specialized AI agents (Writer, Reviewer, Editor)
  • Reinforcement Learning: Smart content ranking and quality assessment
  • Quality Scoring: Automated content quality metrics
  • Context Awareness: Maintains context throughout the processing pipeline
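The reinforcement-learning ranking described above can be sketched as a running-average reward update per content version. This is a deliberately simplified stand-in (class and method names are illustrative, not the project's actual API):

```python
# Minimal sketch of reward-based content ranking; a hypothetical
# simplification of the project's RL-based quality assessment.

class ContentRanker:
    """Tracks a running average quality score per content version."""

    def __init__(self):
        self.scores = {}  # version_id -> (average score, feedback count)

    def update(self, version_id, reward):
        """Fold one piece of quality feedback into the running average."""
        avg, n = self.scores.get(version_id, (0.0, 0))
        n += 1
        avg += (reward - avg) / n  # incremental mean update
        self.scores[version_id] = (avg, n)

    def best(self):
        """Return the version id with the highest average score."""
        return max(self.scores, key=lambda v: self.scores[v][0])

ranker = ContentRanker()
ranker.update("v1", 0.6)
ranker.update("v2", 0.9)
ranker.update("v1", 0.8)
print(ranker.best())  # v2
```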

πŸ” Advanced Search & Analytics

  • Vector Database: ChromaDB integration for semantic search
  • Version Control: Complete content version tracking
  • Performance Monitoring: Real-time workflow statistics
  • Quality Feedback: Machine learning-based content improvement
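The version-tracking idea can be illustrated in a few lines: every processing stage appends a new immutable record, so each edit stays retrievable. A minimal sketch with illustrative names (the real store is backed by ChromaDB plus JSON metadata):

```python
# Illustrative sketch of stage-tagged version tracking; not the project's
# actual storage layer.
import time

class VersionStore:
    def __init__(self):
        self.versions = []

    def add(self, content, stage):
        record = {
            "id": len(self.versions) + 1,
            "stage": stage,       # e.g. "scraped", "ai_draft", "human_edit"
            "content": content,
            "timestamp": time.time(),
        }
        self.versions.append(record)
        return record["id"]

    def by_stage(self, stage):
        return [v for v in self.versions if v["stage"] == stage]

store = VersionStore()
store.add("Raw chapter text", "scraped")
store.add("Polished chapter text", "ai_draft")
print(len(store.by_stage("scraped")))  # 1
```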

🎨 User Experience

  • Intuitive Interface: Clean, tab-based Gradio interface
  • Real-time Updates: Live status tracking across all stages
  • Batch Processing: Process multiple URLs simultaneously
  • Visual Feedback: Screenshots and progress indicators

πŸš€ Quick Start

Prerequisites

  • Python 3.8 or higher
  • Google Gemini API key
  • Chrome/Chromium browser (for web scraping)

Installation

  1. Clone the repository
     git clone https://github.com/sohan2311/automated-book-publication-workflow.git
     cd automated-book-publication-workflow
  2. Install dependencies
     pip install -r requirements.txt
  3. Set up environment variables
     cp .env.template .env
     # Edit .env and add your Gemini API key
     GEMINI_API_KEY=your_gemini_api_key_here
  4. Initialize the database
     python init_db.py
  5. Run the application
     python main_file.py
  6. Access the interface: open your browser and navigate to http://localhost:7860

πŸ“– Usage Guide

1. Content Scraping

  • Enter any web URL in the scraping tab
  • Click "Extract Content" to scrape text and take screenshots
  • Review extracted content and send to AI processing
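The project scrapes with Playwright (including screenshots); as a dependency-free illustration of just the text-extraction step, here is a standard-library sketch that pulls visible text out of already-fetched HTML:

```python
# Stdlib-only sketch of extracting visible text from HTML. The real pipeline
# uses Playwright to fetch pages and capture screenshots first.
from html.parser import HTMLParser

class TextExtractor(HTMLParser):
    """Collects visible text, skipping script and style blocks."""

    def __init__(self):
        super().__init__()
        self.chunks = []
        self._skip = 0

    def handle_starttag(self, tag, attrs):
        if tag in ("script", "style"):
            self._skip += 1

    def handle_endtag(self, tag):
        if tag in ("script", "style") and self._skip:
            self._skip -= 1

    def handle_data(self, data):
        if not self._skip and data.strip():
            self.chunks.append(data.strip())

html = "<html><body><h1>Chapter 1</h1><script>x()</script><p>It began.</p></body></html>"
parser = TextExtractor()
parser.feed(html)
print(" ".join(parser.chunks))  # Chapter 1 It began.
```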

2. AI Processing

  • Configure processing iterations (1-5)
  • AI agents automatically enhance content for clarity and engagement
  • Multiple review cycles ensure high-quality output
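The shape of the multi-iteration loop looks roughly like this. The real pipeline calls the Gemini API at each pass; here `enhance` is a stand-in stub so only the control flow is shown:

```python
# Control-flow sketch of iterative enhancement. `enhance` is a placeholder
# for one writer/reviewer pass via the Gemini API.

def enhance(text, iteration):
    """Stub for one AI enhancement pass."""
    return f"{text} [pass {iteration}]"

def process(content, iterations=3):
    """Run the content through the requested number of enhancement passes."""
    history = [content]
    for i in range(1, iterations + 1):
        content = enhance(content, i)
        history.append(content)  # keep every version for review/rollback
    return content, history

final, history = process("Draft chapter", iterations=2)
print(final)         # Draft chapter [pass 1] [pass 2]
print(len(history))  # 3
```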

3. Human Editing

  • Manual review and editing interface
  • Add editor notes and comments
  • Real-time content editing with version tracking

4. Final Processing

  • AI-powered final polish
  • Publication-ready formatting
  • Export in multiple formats
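Multi-format export can be as simple as one dispatcher over the supported formats. A minimal sketch (field names are illustrative; the project's exporter may differ):

```python
# Sketch of exporting the same content as TXT, MD, or JSON.
import json

def export(title, body, fmt):
    if fmt == "txt":
        return f"{title}\n\n{body}"
    if fmt == "md":
        return f"# {title}\n\n{body}"
    if fmt == "json":
        return json.dumps({"title": title, "body": body})
    raise ValueError(f"unsupported format: {fmt}")

md = export("Chapter 1", "It began.", "md")
print(md.splitlines()[0])  # # Chapter 1
```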

5. Search & Analytics

  • Search through all content versions
  • Filter by processing status
  • Provide quality feedback for continuous improvement

πŸ—οΈ Architecture

Core Components

πŸ“¦ Project Structure
β”œβ”€β”€ 🌐 Web Scraping (Playwright)
β”œβ”€β”€ πŸ€– AI Processing (Google Gemini)
β”œβ”€β”€ πŸ” Search Engine (ChromaDB)
β”œβ”€β”€ πŸ“Š Analytics (Custom RL Algorithm)
β”œβ”€β”€ πŸ–₯️ User Interface (Gradio)
└── πŸ’Ύ Data Management (JSON + Vector DB)

AI Agent System

  • Writer Agent: Content enhancement and rewriting
  • Reviewer Agent: Quality assessment and improvement suggestions
  • Editor Agent: Final polish and formatting
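The three-agent hand-off amounts to running the same content through role-specific prompts in sequence. A sketch with placeholder prompts and a stubbed model call (the real agents call Gemini):

```python
# Sketch of the writer -> reviewer -> editor hand-off. Prompts are
# illustrative placeholders; `run_agent` stands in for a Gemini call.

AGENT_PROMPTS = {
    "writer":   "Rewrite the text for clarity and engagement.",
    "reviewer": "Assess quality and suggest concrete improvements.",
    "editor":   "Apply final polish and consistent formatting.",
}

def run_agent(role, text):
    """Stub for a model call using the role's system prompt."""
    return f"[{role}] {text}"

def pipeline(text):
    for role in ("writer", "reviewer", "editor"):
        text = run_agent(role, text)
    return text

result = pipeline("draft")
print(result)  # [editor] [reviewer] [writer] draft
```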

Technology Stack

  • Backend: Python, AsyncIO
  • AI/ML: Google Gemini API, ChromaDB, Sentence Transformers
  • Web Scraping: Playwright, BeautifulSoup
  • Frontend: Gradio
  • Database: ChromaDB (Vector), JSON (Metadata)
  • Processing: NumPy, scikit-learn

πŸ“Š Performance Metrics

The system tracks comprehensive performance metrics:

  • Processing Speed: Average processing time per stage
  • Quality Scores: Automated content quality assessment
  • Success Rates: Operation success tracking
  • Version Analytics: Content version evolution tracking
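A per-stage metrics tracker of this kind can be sketched in a few lines: record durations and outcomes, then report averages. A simplified stand-in for the project's analytics:

```python
# Sketch of per-stage performance tracking (average time and success rate).
from collections import defaultdict

class Metrics:
    def __init__(self):
        self.durations = defaultdict(list)
        self.outcomes = defaultdict(list)

    def record(self, stage, seconds, success):
        self.durations[stage].append(seconds)
        self.outcomes[stage].append(success)

    def summary(self, stage):
        d, o = self.durations[stage], self.outcomes[stage]
        return {
            "avg_seconds": sum(d) / len(d),
            "success_rate": sum(o) / len(o),
        }

m = Metrics()
m.record("scraping", 2.0, True)
m.record("scraping", 4.0, False)
print(m.summary("scraping"))  # {'avg_seconds': 3.0, 'success_rate': 0.5}
```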

πŸ”§ Configuration

Environment Variables

GEMINI_API_KEY=your_api_key_here
CHROMA_DB_PATH=./data/chroma_db
SCREENSHOTS_PATH=./screenshots
EXPORTS_PATH=./exports
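The settings above can be read with sensible fallbacks using only the standard library (loading the `.env` file itself, e.g. via python-dotenv, is omitted here; the key names below are exactly those in the template):

```python
# Reading the workflow's environment variables with defaults matching the
# paths shown above.
import os

CONFIG = {
    "gemini_api_key": os.environ.get("GEMINI_API_KEY", ""),
    "chroma_db_path": os.environ.get("CHROMA_DB_PATH", "./data/chroma_db"),
    "screenshots_path": os.environ.get("SCREENSHOTS_PATH", "./screenshots"),
    "exports_path": os.environ.get("EXPORTS_PATH", "./exports"),
}

if not CONFIG["gemini_api_key"]:
    print("Warning: GEMINI_API_KEY is not set")
```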

Customization Options

  • AI processing iterations
  • Quality scoring algorithms
  • Search result rankings
  • Export format templates

πŸ“ Project Structure

automated-book-publication-workflow/
β”œβ”€β”€ .gradio/                    # Gradio cache files
β”œβ”€β”€ content/                    # Processed content storage
β”œβ”€β”€ data/                       # Database and persistent data
β”œβ”€β”€ exports/                    # Exported final content
β”œβ”€β”€ logs/                       # Application logs
β”œβ”€β”€ models/                     # AI model cache
β”œβ”€β”€ screenshots/                # Web scraping screenshots
β”œβ”€β”€ .env                        # Environment variables
β”œβ”€β”€ .env.template              # Environment template
β”œβ”€β”€ .gitignore                 # Git ignore rules
β”œβ”€β”€ config.yaml                # Configuration file
β”œβ”€β”€ human_edit.txt             # Human editing templates
β”œβ”€β”€ init_db.py                 # Database initialization
β”œβ”€β”€ install_dependencies.py    # Dependency installer
β”œβ”€β”€ main_file.py              # Main application file
β”œβ”€β”€ requirements.txt           # Python dependencies
β”œβ”€β”€ sample_workflow.py         # Usage examples
β”œβ”€β”€ setup.py                   # Package setup
└── README.md                  # This file

πŸ”„ Workflow Process

graph TD
    A[Web URL Input] --> B[Content Scraping]
    B --> C[AI Writing Agent]
    C --> D[AI Review Agent]
    D --> E[Human Editing]
    E --> F[Final AI Polish]
    F --> G[Export & Publish]
    
    H[Search & Analytics] --> I[Quality Feedback]
    I --> J[RL Algorithm Update]
    J --> C

🀝 Contributing

We welcome contributions! Please follow these steps:

  1. Fork the repository
  2. Create a feature branch (git checkout -b feature/amazing-feature)
  3. Commit your changes (git commit -m 'Add amazing feature')
  4. Push to the branch (git push origin feature/amazing-feature)
  5. Open a Pull Request

Development Setup

# Install development dependencies
pip install -r requirements-dev.txt

# Run tests
python -m pytest tests/

# Format code
black main_file.py

# Type checking
mypy main_file.py

Acknowledgments

  • Google Gemini AI for powerful language processing
  • Gradio for the amazing interface framework
  • ChromaDB for vector database capabilities
  • Playwright for reliable web scraping
  • Open Source Community for the incredible tools and libraries

πŸ“ž Contact & Support

πŸ‘¨β€πŸ’» Developer

Sohan Maity

Issues & Feature Requests

🌟 Show Your Support

If this project helped you, please consider:

  • ⭐ Starring the repository
  • πŸ› Reporting bugs or suggesting features
  • 🀝 Contributing to the codebase
  • πŸ“’ Sharing with others who might benefit

Built using Python, AI, and Open Source Technologies

