OpenAI Document Analyzer


A powerful Python application for analyzing text and PDF files using OpenAI's latest chat completion models. This tool allows you to ask questions about documents using customizable prompts and examples.

Features

  • 📄 Support for both text (.txt) and PDF (.pdf) files
  • 🤖 Compatible with OpenAI chat completion models, including GPT-4o, GPT-4o-mini, GPT-4 Turbo, and GPT-3.5 Turbo
  • 🔄 Dynamic model switching during runtime
  • 📝 Customizable prompts and examples for context
  • 🛡️ Robust error handling and input validation
  • 🎯 PEP8 compliant code with type hints

Requirements

  • Python 3.8+
  • OpenAI API key

Installation

Quick Setup

  1. Clone the repository:
git clone https://github.com/ddtdanilo/OpenAI-Document-Analyzer.git
cd OpenAI-Document-Analyzer
  2. Run the setup script:
python3 setup.py

This will:

  • Create a virtual environment (optional)
  • Install all dependencies
  • Create a .env file from the template
  3. Add your OpenAI API key to the .env file:
OPENAI_API_KEY=your_api_key_here

Manual Setup

  1. Create a virtual environment:
python3 -m venv venv
source venv/bin/activate  # On Windows: venv\Scripts\activate
  2. Install dependencies:
pip install -r requirements.txt
  3. Configure environment:
cp env.example .env
# Edit .env and add your OpenAI API key
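
To confirm the environment is set up correctly, you can run the bundled installation verification script (it lives at the repository root; see Project Structure below):

python test_setup.py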

Usage

Basic Usage

python scripts/text_analysis.py examples/example_prompt.txt examples/example_response.txt examples/example_text_to_analyze.txt
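
The same command works for PDF documents; the last argument simply points at a .pdf file instead of a .txt file. Based on the example file names, the three arguments are an example prompt, an example response, and the document to analyze, in that order. The PDF path below is only a placeholder:

python scripts/text_analysis.py examples/example_prompt.txt examples/example_response.txt path/to/your_document.pdf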

Using Different Models

You can specify the default model in your .env file:

OPENAI_MODEL=gpt-4o-mini  # or gpt-4-turbo, gpt-3.5-turbo, etc.

Or switch models interactively during runtime by typing model at the prompt.

Available Commands

  • Type your question to analyze the text
  • Type model to change the AI model
  • Type exit to quit

Available Models

Model                 Description
gpt-4o                Latest and most capable model (default)
gpt-4o-mini           Smaller, faster, and more affordable GPT-4o
gpt-4-turbo           High performance model with vision capabilities
gpt-4-turbo-preview   Preview version of GPT-4 Turbo
gpt-3.5-turbo         Fast and efficient for most tasks
gpt-3.5-turbo-16k     Extended context window version

Project Structure

OpenAI-Document-Analyzer/
├── scripts/
│   ├── __init__.py             # Package initialization
│   ├── text_analysis.py       # Main application script
│   └── document_analyzer.py   # DocumentAnalyzer class
├── tests/
│   ├── __init__.py             # Tests package initialization
│   ├── conftest.py             # Pytest configuration and fixtures
│   ├── test_document_analyzer.py  # Unit tests
│   └── test_data/              # Test data directory
├── examples/
│   ├── example_prompt.txt      # Sample prompt
│   ├── example_response.txt    # Sample response
│   └── example_text_to_analyze.txt  # Sample text to analyze
├── .github/
│   └── workflows/
│       ├── tests.yml           # Main CI/CD pipeline
│       ├── coverage-badge.yml  # Auto-generated coverage badge
│       └── release.yml         # Automated releases
├── requirements.txt            # Python dependencies
├── package.json               # Semantic release configuration
├── setup.py                   # Setup script
├── test_setup.py              # Installation verification script
├── run_tests.py               # Test runner script
├── env.example                # Environment template
├── CHANGELOG.md               # Auto-generated changelog
└── README.md                  # This file

Dependencies

  • openai>=1.0.0 - OpenAI Python client library
  • pypdf>=3.17.0 - PDF processing library
  • python-dotenv>=1.0.0 - Environment variable management
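
For orientation, here is a minimal sketch of how these three libraries are typically wired together. This is an illustration of the dependencies, not the project's actual code, and the PDF path is hypothetical:

import os

from dotenv import load_dotenv   # python-dotenv: load OPENAI_API_KEY from .env
from openai import OpenAI        # openai: chat completions client
from pypdf import PdfReader      # pypdf: extract text from PDF files

load_dotenv()                                            # reads .env into the environment
client = OpenAI(api_key=os.environ["OPENAI_API_KEY"])    # authenticate the client

reader = PdfReader("path/to/your_document.pdf")          # hypothetical PDF path
text = "\n".join(page.extract_text() or "" for page in reader.pages)

response = client.chat.completions.create(
    model=os.environ.get("OPENAI_MODEL", "gpt-4o"),      # default model from the table above
    messages=[{"role": "user", "content": f"Summarize this document:\n\n{text}"}],
)
print(response.choices[0].message.content)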

API Reference

load_text(filepath: str) -> str

Load text content from a file (supports .txt and .pdf formats).

ask_questions(prompt: str, example_prompt: str, example_response: str, text_to_analyze: str, model: Optional[str] = None) -> str

Generate AI responses based on the provided context and prompt.
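
A minimal usage sketch, assuming both methods are exposed by the DocumentAnalyzer class in scripts/document_analyzer.py and that the class picks up OPENAI_API_KEY and OPENAI_MODEL from the environment (the constructor and import path are assumptions, not documented API):

from scripts.document_analyzer import DocumentAnalyzer  # assumed import path

analyzer = DocumentAnalyzer()  # assumed to read OPENAI_API_KEY / OPENAI_MODEL from the environment

# Load the example context and the document to analyze (.txt or .pdf)
example_prompt = analyzer.load_text("examples/example_prompt.txt")
example_response = analyzer.load_text("examples/example_response.txt")
text = analyzer.load_text("examples/example_text_to_analyze.txt")

# Ask a question; `model` is optional and falls back to the configured default
answer = analyzer.ask_questions(
    prompt="What is the main topic of this document?",
    example_prompt=example_prompt,
    example_response=example_response,
    text_to_analyze=text,
    model="gpt-4o-mini",
)
print(answer)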

Error Handling

The application includes comprehensive error handling for:

  • Missing or invalid files
  • Unsupported file formats
  • API errors and rate limits
  • Invalid model selections
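
If you call the analyzer from your own code, a defensive wrapper might look like the sketch below. The exact exception types raised for unsupported formats and invalid models are assumptions; openai.OpenAIError is the base exception of the OpenAI v1 client and covers API errors, including rate limits:

from openai import OpenAIError                           # base exception of the openai>=1.0.0 client
from scripts.document_analyzer import DocumentAnalyzer   # assumed import path (see API Reference)

analyzer = DocumentAnalyzer()
try:
    text = analyzer.load_text("path/to/document.pdf")    # hypothetical path; may be missing or unsupported
    answer = analyzer.ask_questions(
        prompt="Summarize the document.",
        example_prompt="Q: What is this text about?",    # illustrative example context
        example_response="A: It describes ...",
        text_to_analyze=text,
    )
    print(answer)
except FileNotFoundError:
    print("File not found or path is invalid.")
except ValueError as exc:
    print(f"Invalid input: {exc}")                       # assumed: unsupported format or invalid model
except OpenAIError as exc:
    print(f"OpenAI API error (possibly a rate limit): {exc}")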

Contributing

Contributions are welcome! Please feel free to submit a Pull Request. For major changes, please open an issue first to discuss what you would like to change.

Development Guidelines

  1. Follow PEP8 style guide
  2. Add type hints to all functions
  3. Include docstrings for all modules and functions
  4. Write tests for new features
  5. Update documentation as needed

License

This project is licensed under the MIT License - see the LICENSE file for details.

Acknowledgments

  • OpenAI for providing the powerful language models
  • Contributors and users of this project

Support

For issues, questions, or contributions, please visit the GitHub repository.

Running Tests

The project includes a comprehensive test suite built on pytest. Several ways to run it are available:

Quick Test Run

If you created the virtual environment during setup:

./venv/bin/python3 run_tests.py

Or using pytest directly:

pytest tests/ -v

Manual Test Commands

  1. Install test dependencies:
pip install -r requirements.txt
  2. Run basic tests:
pytest tests/
  3. Run tests with detailed output:
pytest tests/ -v
  4. Run tests with coverage report:
pytest tests/ --cov=scripts/ --cov-report=term-missing
  5. Run specific test file:
pytest tests/test_document_analyzer.py -v

Continuous Integration

🚀 Automated Workflows:

  • Tests - Run automatically on push/PR across Python 3.8, 3.9, 3.10
  • 📊 Coverage - Auto-generated badge + Coveralls integration
  • 🏷️ Releases - Semantic versioning with automatic changelog generation

Coverage Tracking:

  • 🏷️ Auto-generated badge - Updates automatically on each push
  • 📊 Coveralls integration - Detailed coverage reports and trends

Test Coverage

Current test coverage includes:

  • DocumentAnalyzer class initialization
  • Text analysis functionality (mocked)
  • File loading (both .txt and .pdf)
  • Error handling for invalid files
  • PDF text extraction (mocked)
  • Complete document analysis workflow

Releases

This project uses semantic versioning with automated releases:

Commit Convention

Use conventional commits for automatic version bumping:

feat: add new document analysis feature    # → Minor version bump (1.1.0)
fix: resolve PDF parsing issue            # → Patch version bump (1.0.1)  
docs: update README                       # → No version bump
chore: update dependencies                # → No version bump

BREAKING CHANGE: remove deprecated API    # → Major version bump (2.0.0)

Automatic Release Process

  1. Push to main → Triggers release workflow
  2. Analyze commits → Determines version bump type
  3. Generate changelog → Based on commit messages
  4. Create release → Automatic GitHub release with notes
  5. Update badges → Coverage and release status

Release Outputs

  • 📋 CHANGELOG.md - Automatically generated and maintained
  • 🏷️ Git tags - Semantic version tags (v1.0.0, v1.1.0, etc.)
  • 📦 GitHub Releases - With auto-generated release notes
