GitHub - mod-construction/AECHachathon_MOD_ParseThat

# Python Service: PDF Text and Table Extraction  

This repository contains a Python service that processes PDF files to extract text and tables, then converts the extracted content into structured data using OpenAI's models.  

## Features  
- Extracts text and tables from PDF files using `pdfplumber` or a similar library.  
- Utilizes OpenAI APIs for prompting and converting extracted content into structured data.  
- Receives PDF file paths from a web application and returns structured data in JSON format.  
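
The features above describe a two-stage flow: extract content from the PDF, then prompt a model with it. As a minimal sketch of the hand-off between the two stages, the snippet below turns extracted tables into Markdown and combines them with the page text into one prompt. It assumes tables arrive as lists of rows with `None` for empty cells (the shape returned by `pdfplumber`'s `page.extract_tables()`). The function names and the prompt wording are hypothetical and are not taken from this repository.

```python
from typing import List, Optional

# One extracted table: rows of cells, where None marks an empty cell.
Table = List[List[Optional[str]]]


def table_to_markdown(table: Table) -> str:
    """Render one extracted table as a Markdown table (first row is the header)."""
    if not table:
        return ""
    # Normalize cells: replace None with "", trim, and flatten embedded newlines.
    rows = [[(cell or "").strip().replace("\n", " ") for cell in row] for row in table]
    header, body = rows[0], rows[1:]
    lines = [
        "| " + " | ".join(header) + " |",
        "| " + " | ".join("---" for _ in header) + " |",
    ]
    lines += ["| " + " | ".join(row) + " |" for row in body]
    return "\n".join(lines)


def build_prompt(text: str, tables: List[Table]) -> str:
    """Combine page text and tables into a single prompt (hypothetical wording)."""
    parts = [
        "Convert the following PDF content into structured JSON.",
        "",
        "TEXT:",
        text.strip(),
    ]
    for i, table in enumerate(tables, start=1):
        parts += ["", f"TABLE {i}:", table_to_markdown(table)]
    return "\n".join(parts)
```

Keeping prompt construction in a pure function like this makes it easy to test without a PDF or an API key; the resulting string would then be sent to the OpenAI API by whatever client code the service uses.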

## Technologies Used  
- Python 3.11.10  
- Libraries: `marker`, OpenAI API, `Flask`  

## Installation  

### Prerequisites  
- Python 3.11.10 or later  
- OpenAI API key  

### Steps 

1. **Clone the Repository**
   Clone this repository to your local machine:
   ```bash
   git clone <repository-url>
   cd python-service
   ```

2. **Create a Virtual Environment**
   Set up a Python virtual environment to isolate dependencies:
   - On macOS/Linux:
     ```
     python3 -m venv venv
     source venv/bin/activate
     ```
   - On Windows:
     ```
     python -m venv venv
     venv\Scripts\activate
     ```

3. **Install Dependencies**
   Install the required Python packages listed in the `requirements.txt` file:
   ```
   pip install -r requirements.txt
   ```

4. **Set Up API Key**
   Update the `secret.py` file in the root of the project to store your OpenAI API key:
   ```
   echo "OPENAI_API_KEY=your-openai-api-key" > 
   ```
   Replace `your-openai-api-key` with your actual API key from OpenAI.

5. **Run the Service**
   Start the Flask server to make the service accessible:
   ```
   python run.py
   ```
   The service will be available at http://localhost:5000.
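
To confirm that the server actually came up, a quick reachability check can help before sending real requests. The following is a minimal sketch using only the standard library; the helper name `is_listening` is hypothetical, and the default host and port are taken from the address stated above.

```python
import socket


def is_listening(host: str = "localhost", port: int = 5000, timeout: float = 1.0) -> bool:
    """Return True if a TCP connection to host:port succeeds within the timeout."""
    try:
        with socket.create_connection((host, port), timeout=timeout):
            return True
    except OSError:
        # Covers connection refused, timeouts, and name-resolution failures.
        return False
```

Calling `is_listening()` after `python run.py` should return `True` once Flask has bound the port. Note that this only shows that something is accepting connections on that port, not that the endpoint itself works.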

## Usage

### API Endpoint

#### GET `/api/parseit`

**Description:** Processes a PDF file, extracts its text and tables, and returns structured data in JSON format.

**Request Body:**
```
{ "input": "path/to/pdf/file.pdf" }
```
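
A request with this body could be sent from Python roughly as follows. This is a sketch under stated assumptions: the base URL is the local address from the installation steps, the `Content-Type: application/json` header is assumed rather than documented, and the helper names `build_parse_request` and `parse_pdf` are hypothetical. The response schema is not described here, so the sketch simply returns whatever JSON the service sends back.

```python
import json
import urllib.request


def build_parse_request(
    pdf_path: str, base_url: str = "http://localhost:5000"
) -> urllib.request.Request:
    """Build a GET request to /api/parseit carrying the JSON body shown above."""
    body = json.dumps({"input": pdf_path}).encode("utf-8")
    return urllib.request.Request(
        url=f"{base_url}/api/parseit",
        data=body,
        headers={"Content-Type": "application/json"},  # assumed, not documented
        method="GET",  # the endpoint is documented as GET, even though it takes a body
    )


def parse_pdf(pdf_path: str, base_url: str = "http://localhost:5000") -> dict:
    """Send the request and decode the JSON response (requires a running service)."""
    request = build_parse_request(pdf_path, base_url)
    with urllib.request.urlopen(request, timeout=120) as response:
        return json.loads(response.read().decode("utf-8"))
```

Because the service receives a file path rather than file contents, the path presumably has to be readable from the machine where the service runs. Some HTTP clients and proxies drop bodies on GET requests, which is worth keeping in mind if the service is not called directly.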

## License

This project is licensed under the MIT License.

## Repository Files

The repository (22 commits) contains the following files:

- `.gitignore`
- `ParseThat.py`
- `ParseThatLocal.py`
- `ParseThat_by_llm.py`
- `ParseThat_by_llm_local.py`
- `ParseThat_pdf.py`
- `ParseThat_to_dlm.py`
- `README.md`
- `README_LOCAL.md`
- `markup.md`
- `mod_dlm_schema.py`
- `openapi.json`
- `our_schema.json`
- `requirements.txt`
- `run.py`
- `run_local.py`
- `secret.py`
