
The goal is to convert the unstructured project technical documents to structured JSON schema using the MoD-DLM


mod-construction/2024-Hackathon---PDFtoDLM

 
 


PDFtoDLM: Multi-File PDF to JSON Extraction Tool

AEC Hackathon Munich: MOD Smart Prefab challenge, Team MOD-2.

Our goal is to extract structured data from unstructured PDFs containing information about prefabricated elements. The main challenge is to produce reliable results in JSON format, which can be used for further applications.

The strategy we chose uses two models acting as agents, one based on OpenAI and the other on Claude, to improve each other's output. One model generates the initial JSON from the prompt, and the other checks it and corrects any mistakes.
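The generate-then-check idea can be sketched as a small loop. This is only an illustration under stated assumptions, not the project's actual code: `extractWithReview`, `generate`, `review`, and the `{ changed, json }` return shape are hypothetical names, and the two models are replaced by stubs so the sketch runs without API keys.

```javascript
// Sketch of a generate-then-review loop between two models.
// `generate` and `review` are hypothetical async wrappers around the two
// LLM backends; they are passed in so the loop can be exercised with stubs.
async function extractWithReview(pdfText, generate, review, maxRounds = 3) {
  // The first model drafts the JSON from the raw PDF text.
  let draft = await generate(pdfText);

  for (let round = 1; round <= maxRounds; round++) {
    // The second model checks the draft and returns a corrected version.
    const { changed, json } = await review(pdfText, draft);
    draft = json;
    // Stop as soon as the reviewer has nothing left to correct.
    if (!changed) return { json: draft, rounds: round, stable: true };
  }
  // Round budget used up while the reviewer was still making corrections.
  return { json: draft, rounds: maxRounds, stable: false };
}

// Stub models (assumptions for illustration only).
const fakeGenerate = async (_text) => ({ element: "wall" });
const fakeReview = async (_text, draft) =>
  "width_mm" in draft
    ? { changed: false, json: draft }
    : { changed: true, json: { ...draft, width_mm: 2400 } };

extractWithReview("Wall W1, width 2400 mm", fakeGenerate, fakeReview)
  .then((result) => console.log(JSON.stringify(result)));
```

Capping the number of rounds keeps cost bounded when the two models never fully agree; the `stable` flag lets the caller decide whether the result needs a human look.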

Several approaches are common for this kind of task. Prompt engineering means designing effective prompts that instruct the LLM to complete the task. Retrieval-augmented generation (RAG) uses semantic embeddings to retrieve the relevant chunks of the documents, and the LLM then uses the retrieved content to generate better-informed, more accurate answers. Fine-tuning customizes an LLM on a specific dataset to adjust its behavior or optimize it for specific tasks. In this project we did not fine-tune a model. Instead, we tried a strategy called Verbal Reinforcement Learning, which uses feedback from human evaluators to iteratively improve how the LLM responds.
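One way to picture feedback-driven improvement without fine-tuning is to carry the evaluators' comments forward as plain text in the next prompt. The snippet below is a minimal sketch under that assumption; `buildPrompt` and the example feedback strings are hypothetical and not taken from the project.

```javascript
// Build a prompt that carries earlier evaluator feedback forward as text.
// No model weights change; only the prompt grows with each round of feedback.
function buildPrompt(task, feedbackHistory) {
  const lines = [task];
  if (feedbackHistory.length > 0) {
    lines.push("", "Feedback on earlier attempts (apply all of it):");
    feedbackHistory.forEach((note, i) => lines.push(`${i + 1}. ${note}`));
  }
  return lines.join("\n");
}

// Hypothetical task and feedback, for illustration only.
const task = "Extract the prefab element data as JSON.";
const feedbackHistory = [];
feedbackHistory.push("Report dimensions in millimetres, not metres.");
feedbackHistory.push("Use null for values that are missing from the PDF.");

console.log(buildPrompt(task, feedbackHistory));
```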

PDFtoDLM is composed of two backends for LLMs, a frontend user interface, and some additional tools.

Features

  • Upload multiple PDF files via the web interface.
  • Immediate visualization of each uploaded PDF.
  • Asynchronous generation of structured JSON data from each PDF.
  • Interactive JSON schema editor with live syntax highlighting.
  • Options to save edited JSON and download it locally.
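Asynchronous generation for several uploads could look roughly like the following. It is a sketch, not the project's implementation: `processAll`, `extractJson`, and the result shape are hypothetical, and a stub stands in for the real PDF-to-JSON call.

```javascript
// Process several uploaded PDFs concurrently; one failure does not block the rest.
// `extractJson` is a hypothetical async function: file -> parsed JSON.
async function processAll(files, extractJson) {
  // allSettled waits for every file and records success or failure per file.
  const settled = await Promise.allSettled(files.map((f) => extractJson(f)));
  return settled.map((outcome, i) =>
    outcome.status === "fulfilled"
      ? { file: files[i].name, ok: true, json: outcome.value }
      : {
          file: files[i].name,
          ok: false,
          error:
            outcome.reason instanceof Error
              ? outcome.reason.message
              : String(outcome.reason),
        }
  );
}

// Stub extractor (assumption): rejects anything that is not named *.pdf.
const fakeExtract = async (file) => {
  if (!file.name.endsWith(".pdf")) throw new Error("not a PDF");
  return { source: file.name, elements: [] };
};

processAll([{ name: "wall.pdf" }, { name: "notes.txt" }], fakeExtract)
  .then((results) => console.log(JSON.stringify(results, null, 2)));
```

`Promise.allSettled` is used instead of `Promise.all` so that a single unreadable PDF yields an error entry for that file while the remaining files still produce JSON.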

Tech Stack

OpenAI

  • Frontend: React.js
  • Backend: Node.js with Express
  • PDF Parsing: pdf-parse, pdf-lib
  • AI Integration: OpenAI API

Claude

Installation

OpenAI backend

  • Node.js (v14 or higher)
  • npm or yarn
  • OpenAI API Key (requires a valid API key from OpenAI)
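A quick way to confirm these prerequisites before starting the backend is a small check script. This is a sketch under assumptions: `checkPrerequisites` is a hypothetical helper, and reading the key from an `OPENAI_API_KEY` environment variable is assumed here rather than confirmed by this README.

```javascript
// Return a list of unmet prerequisites (empty list means ready to go).
// Assumption: the API key is supplied via the OPENAI_API_KEY environment variable.
function checkPrerequisites(nodeVersion, env) {
  const problems = [];
  // process.versions.node looks like "18.17.0"; take the major version.
  const major = Number.parseInt(nodeVersion.split(".")[0], 10);
  if (Number.isNaN(major) || major < 14) {
    problems.push(`Node.js v14 or higher is required (found ${nodeVersion}).`);
  }
  if (!env.OPENAI_API_KEY) {
    problems.push("OPENAI_API_KEY is not set.");
  }
  return problems;
}

const problems = checkPrerequisites(process.versions.node, process.env);
if (problems.length > 0) {
  problems.forEach((p) => console.warn(p));
} else {
  console.log("Prerequisites look fine.");
}
```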

Claude backend

Team


Languages

  • JavaScript 57.7%
  • Python 29.3%
  • Shell 4.6%
  • HTML 3.7%
  • CSS 2.7%
  • Makefile 2.0%