DocModel: A Document Understanding Model

DocModel is a document understanding model designed to use both the textual content of a document and the 2D spatial relationships between its text elements. Built on the RoBERTa architecture, DocModel is fine-tuned for tasks such as form understanding, entity extraction, and layout-based document analysis.

Key Features

2D Spatial Modeling: Captures text layout and spatial relationships within documents, ideal for complex document structures such as forms, tables, and scans.
RoBERTa-based Architecture: Built on a robust architecture for token-level tasks with powerful self-attention mechanisms.
Fine-tuned for Document Understanding: Specifically trained on datasets like FUNSD to handle noisy and complex document layouts.
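To make the idea of 2D spatial input more concrete, here is a minimal sketch of one common way layout-aware models prepare word positions: scaling pixel bounding boxes onto a fixed integer grid so that pages of different sizes are comparable. The 0 to 1000 grid, the `normalize_box` helper, and the sample words and boxes are all assumptions made for illustration. This README does not say how DocModel encodes positions, so treat this as a sketch of the general technique rather than DocModel's actual preprocessing.

```python
from typing import List, Tuple

# (x0, y0, x1, y1) in pixels; an assumed input format, not DocModel's API
Box = Tuple[float, float, float, float]


def normalize_box(box: Box, page_width: float, page_height: float,
                  scale: int = 1000) -> List[int]:
    """Map a pixel-space box onto a 0..scale integer grid (hypothetical helper)."""
    if page_width <= 0 or page_height <= 0:
        raise ValueError("page dimensions must be positive")

    def clamp(value: float) -> int:
        # Keep coordinates inside the grid even if OCR boxes spill off the page.
        return max(0, min(scale, int(value)))

    x0, y0, x1, y1 = box
    return [
        clamp(scale * x0 / page_width),
        clamp(scale * y0 / page_height),
        clamp(scale * x1 / page_width),
        clamp(scale * y1 / page_height),
    ]


# Illustrative OCR output for a 1000 x 2000 pixel page.
words = ["Name:", "Ada"]
boxes = [(50, 100, 150, 120), (160, 100, 220, 120)]
normalized = [normalize_box(b, page_width=1000, page_height=2000) for b in boxes]
```

Each word then carries a four-integer position alongside its token IDs, which is what lets a model relate a form label to the value printed next to it.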

Model Performance

DocModel has been evaluated on the FUNSD dataset, a benchmark of noisy scanned forms, with the following results:

Evaluation Loss: 1.36752

F1-Score: 0.84126
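For readers unfamiliar with the metric, the sketch below shows how an F1 score is derived from precision and recall over predicted labels. It is a token-level illustration only. This README does not say whether the reported score is token-level or entity-level, and the `precision_recall_f1` helper and the sample labels are hypothetical, so the code does not reproduce the figure above.

```python
from typing import Sequence, Tuple


def precision_recall_f1(true_labels: Sequence[str], pred_labels: Sequence[str],
                        ignore: str = "O") -> Tuple[float, float, float]:
    """Token-level precision, recall, and F1, skipping the `ignore` label."""
    if len(true_labels) != len(pred_labels):
        raise ValueError("label sequences must have the same length")

    # A true positive is a non-ignored prediction that matches the gold label.
    tp = sum(1 for t, p in zip(true_labels, pred_labels) if p != ignore and p == t)
    predicted = sum(1 for p in pred_labels if p != ignore)
    actual = sum(1 for t in true_labels if t != ignore)

    precision = tp / predicted if predicted else 0.0
    recall = tp / actual if actual else 0.0
    total = precision + recall
    f1 = 2 * precision * recall / total if total else 0.0
    return precision, recall, f1


# Made-up labels for four tokens; not taken from FUNSD.
gold = ["QUESTION", "ANSWER", "O", "ANSWER"]
pred = ["QUESTION", "ANSWER", "ANSWER", "O"]
precision, recall, f1 = precision_recall_f1(gold, pred)
```

Because F1 is the harmonic mean of precision and recall, it penalizes a model that does well on one while neglecting the other.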

Installation

To install and use DocModel, follow these steps:

Clone the repository:

git clone https://github.com/tobiadefami/docmodel.git
cd docmodel

Install the package using setup.py:

python setup.py install

You can also install the model dependencies using pip:

pip install -r requirements.txt

Applications

DocModel can be used for a variety of document understanding tasks, including:

Form understanding: Extracting key-value pairs from structured forms.
Entity extraction: Identifying important information from documents with diverse layouts.
Layout-based analysis: Handling complex layouts involving tables, scanned images, and multi-column formats.
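As an illustration of the form-understanding use case, the sketch below turns per-word tags into key-value pairs. It assumes the model's predictions have already been decoded into BIO tags over FUNSD-style labels (QUESTION, ANSWER), and it pairs each question with the answer that immediately follows it. Both `group_entities` and `pair_adjacent` are hypothetical helpers, not part of DocModel, and adjacency pairing is a deliberate simplification. FUNSD itself annotates question-answer links explicitly.

```python
from typing import Dict, List, Sequence, Tuple

Entity = Tuple[str, str]  # (label, text)


def group_entities(words: Sequence[str], tags: Sequence[str]) -> List[Entity]:
    """Merge BIO-tagged words into labeled text spans."""
    entities: List[Entity] = []
    label, chunk = None, []
    for word, tag in zip(words, tags):
        prefix, _, name = tag.partition("-")
        if prefix == "I" and name == label:
            chunk.append(word)  # continue the current span
            continue
        if label is not None:
            entities.append((label, " ".join(chunk)))  # close the previous span
        if prefix in ("B", "I"):
            label, chunk = name, [word]  # start a new span
        else:
            label, chunk = None, []  # "O" tag: outside any span
    if label is not None:
        entities.append((label, " ".join(chunk)))
    return entities


def pair_adjacent(entities: Sequence[Entity]) -> Dict[str, str]:
    """Pair each QUESTION with an ANSWER that directly follows it."""
    pairs: Dict[str, str] = {}
    for (l1, t1), (l2, t2) in zip(entities, entities[1:]):
        if l1 == "QUESTION" and l2 == "ANSWER":
            pairs[t1] = t2
    return pairs


# Made-up words and tags for illustration.
words = ["Date:", "12", "May", "Name:", "Ada", "Lovelace", "page"]
tags = ["B-QUESTION", "B-ANSWER", "I-ANSWER",
        "B-QUESTION", "B-ANSWER", "I-ANSWER", "O"]
entities = group_entities(words, tags)
fields = pair_adjacent(entities)
```

On real forms, reading order does not always place an answer directly after its question, so a production pipeline would likely need the spatial positions or a dedicated linking step.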

Model Availability

Model Hub: DocModel on Hugging Face Hub

License

This project is licensed under the Mozilla Public License 2.0. See the LICENSE file for details.

Contact

For any questions or inquiries, feel free to reach out!
