AlphaExtract - AI-powered PDF Summarizer

AlphaExtract is a cutting-edge PDF summarization tool that leverages state-of-the-art AI models to extract and synthesize information from PDF documents. Built with Meta's LLaMA 4 MOE Maverick model and powered by Groq's inference engine, it provides blazing-fast, high-precision summaries for any PDF document.

Features

Intelligent PDF Processing: Convert PDFs to images and extract detailed information
Advanced Summarization: Generate comprehensive, well-structured summaries using LLaMA 4 MOE Maverick
Professional PDF Export: Download summaries as beautifully formatted PDF documents
Modern Web Interface: Clean, responsive UI built with Streamlit
Parallel Processing: Multi-threaded extraction for improved performance
Docker Support: Easy deployment with containerization
CI/CD Integration: Automated Docker image builds and pushes

Architecture

AlphaExtract follows a pipeline architecture with three main components:

PDF Processing: Converts PDF documents to images for processing
Detail Extraction: Uses LLaMA 4 MOE Maverick to extract detailed information from each page
Summary Generation: Synthesizes extracted information into a coherent, analytical summary

The pipeline is optimized for parallel processing and handles documents of varying lengths efficiently.

Technical Stack

Language: Python 3.10
Web Framework: Streamlit
AI Models: Meta's LLaMA 4 MOE Maverick
Inference Engine: Groq
PDF Processing: pdf2image, ReportLab
Package Management: uv
Containerization: Docker
CI/CD: GitHub Actions

Requirements

Python 3.10 or higher
Dependencies listed in pyproject.toml
Groq API key for inference
Poppler utils for PDF processing

Installation

Clone the repository:

git clone https://github.com/yourusername/AlphaExtract.git
cd AlphaExtract

Install dependencies using uv:

curl -LsSf https://astral.sh/uv/install.sh | sh
uv sync

Set up environment variables:
```
export GROQ_API_KEY=your_api_key_here
```
Run the application:
```
streamlit run main.py
```

Usage

Access the web interface at http://localhost:7860
Upload your PDF document using the sidebar
Wait for the processing to complete
View the generated summary
Download the summary as a PDF document

Docker Deployment

Build the Docker image:
```
docker build -t alphaextract .
```

Run the container:

docker run -p 7860:7860 -e GROQ_API_KEY=your_api_key_here alphaextract

The application will be available at http://localhost:7860.

Project Structure

AlphaExtract/
├── .github/
│   └── workflows/
│       └── dockerhubPush.yaml
├── src/
│   ├── components/
│   │   ├── extractPdfDetails.py
│   │   └── summaryEngine.py
│   ├── pipelines/
│   │   └── pipeline.py
│   └── utils/
│       ├── functions.py
│       └── logger.py
├── config.ini
├── Dockerfile
├── main.py
├── prompts.yaml
└── pyproject.toml

Key Components

main.py: Streamlit web application entry point
src/components/: Core processing modules
src/pipelines/: Pipeline orchestration
config.ini: Configuration settings
prompts.yaml: LLM system prompts
Dockerfile: Container configuration
.github/workflows/: CI/CD configuration

Screenshots

Project Demo Complete demonstration of PDF upload, processing, and summary generation
Application Interface The clean and intuitive application interface

License

This project is licensed under the MIT License.

Author

Created with ❤️ by Rauhan Ahmed Siddiqui.

For questions or support, please open an issue on the GitHub repository.

Name		Name	Last commit message	Last commit date
Latest commit History 3 Commits
.github/workflows		.github/workflows
demo		demo
src		src
.gitattributes		.gitattributes
.gitignore		.gitignore
.python-version		.python-version
Dockerfile		Dockerfile
LICENSE		LICENSE
README.md		README.md
config.ini		config.ini
main.py		main.py
prompts.yaml		prompts.yaml
pyproject.toml		pyproject.toml
uv.lock		uv.lock

Provide feedback

Saved searches

Use saved searches to filter your results more quickly

Uh oh!

Repository files navigation

AlphaExtract - AI-powered PDF Summarizer

Features

Table of Contents

Architecture

Technical Stack

Requirements

Installation

Usage

Docker Deployment

Project Structure

Key Components

Screenshots

License

Author

About

Uh oh!

Releases

Packages

Uh oh!

Languages

License

RauhanAhmed/AlphaExtract

Folders and files

Latest commit

History

Repository files navigation

AlphaExtract - AI-powered PDF Summarizer

Features

Table of Contents

Architecture

Technical Stack

Requirements

Installation

Usage

Docker Deployment

Project Structure

Key Components

Screenshots

License

Author

About

Topics

Resources

License

Uh oh!

Stars

Watchers

Forks

Releases

Packages 0

Uh oh!

Languages

Packages