Skip to content

mod-construction/AECHachathon_MOD_ParseThat

 
 

Repository files navigation

Python Service

README.md

# Python Service: PDF Text and Table Extraction  

This repository contains a Python service that processes PDF files to extract text and tables, then converts the extracted content into structured data using OpenAI's models.  

## Features  
- Extracts text and tables from PDF files using `pdfplumber` or a similar library.  
- Utilizes OpenAI APIs for prompting and converting extracted content into structured data.  
- Receives PDF file paths from a web application and returns structured data in JSON format.  

## Technologies Used  
- Python 3.11.10  
- Libraries: `marker`, OpenAI API, `Flask`  

## Installation  

### Prerequisites  
- Python 3.11.10 or later  
- OpenAI API key  

### Steps 

1. **Clone the Repository**
   Clone this repository to your local machine:
   ```bash
   git clone <repository-url>
   cd python-service
  1. Create a Virtual Environment Set up a Python virtual environment to isolate dependencies:

    • On macOS/Linux:

      python3 -m venv venv
      source venv/bin/activate
    • On Windows:

      python -m venv venv
      venv\Scripts\activate
  2. Install Dependencies Install the required Python packages listed in the requirements.txt file:

    pip install -r requirements.txt
  3. Set Up API Key Update the secret.py file in the root of the project to store your OpenAI API key:

    echo "OPENAI_API_KEY=your-openai-api-key" > 

    Replace your-openai-api-key with your actual API key from OpenAI.

  4. Run the Service Start the Flask server to make the service accessible:

    python run.py

    The service will be available at http://localhost:5000.

Usage

API Endpoint

GET /api/parseit

Description: Processes a PDF file, extracts its text and tables, and returns structured data in JSON format.

  • Request Body:
    { "input": "path/to/pdf/file.pdf" }

License

This project is licensed under the MIT License.

About

No description, website, or topics provided.

Resources

Stars

Watchers

Forks

Releases

No releases published

Packages

No packages published

Languages

  • Python 100.0%