Reviving Cultural Heritage: A Novel Approach for Comprehensive Historical Document Restoration

🌟 Highlights

We propose AutoHDR, a novel fully automated solution for historical document restoration (HDR) whose design mirrors the workflow of expert historians.
We introduce FPHDR, a pioneering Full-Page HDR dataset that supports comprehensive training and evaluation of HDR models.
Extensive experiments demonstrate the superior performance of our method on both text and appearance restoration.
The modular design enables flexible adjustments, allowing AutoHDR to collaborate effectively with historians.

📅 News

2025.07.21: 📢 Released the FPHDR dataset!
2025.07.17: 🚀 The pretrained model has been released!
2025.07.13: 🔥🎉 The 💻 demo is now live! Welcome to try it out!
2025.07.09: Released the inference code.
2025.07.08: Our paper is now available on arXiv.
2025.05.15: 🎉🎉 Our paper has been accepted to the ACL 2025 main conference.

🚧 TODO List

🔥 Model Zoo

Model	Checkpoint	Status
AutoHDR-Qwen2-1.5B	BaiduYun:W2wq	Released
AutoHDR-Qwen2-7B	BaiduYun:6o84	Released
DiffHDR	BaiduYun:63a3	Released
Damage Localization Model	BaiduYun:2QC7	Released
OCR Model	BaiduYun:1X88	Released

🔥 FPHDR Dataset

Dataset	Link	Status
Real data	BaiduYun:983A	Released
Synthetic data	-	Coming soon

Note:

The FPHDR dataset may be used for non-commercial research purposes only. Scholars or organizations who wish to use it should first fill in this Application Form, sign the Legal Commitment, and email both to us (eelwjin@scut.edu.cn, cc: yuyi.zhang11@foxmail.com). With your application, please list or attach one or two of your publications from the past 6 years showing that you (or your team) work in a related field, such as OCR, historical document analysis and restoration, or document image processing.
We will send you the decompression password once your application has been received and approved.
All users must comply with all conditions of use; otherwise, authorization will be revoked.

Dataset File Structure

images/
  ├── FS_2_2_1.jpg
  ├── FS_2_9_1.jpg
  ├── ...
labels/
  ├── FS_2_2_1.json
  ├── FS_2_9_1.json
  ├── ...
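
Because each image in images/ shares its file stem with a label in labels/ (e.g. FS_2_2_1.jpg and FS_2_2_1.json), the two folders can be paired by name. The following is a minimal Python sketch, assuming the archive has been extracted into one root folder containing exactly these two subfolders and that every image uses the .jpg extension shown above; `pair_images_and_labels` is a hypothetical helper, not part of the AutoHDR codebase.

```python
from pathlib import Path


def pair_images_and_labels(root):
    """Match images/<stem>.jpg with labels/<stem>.json under `root`.

    Hypothetical helper. Returns (pairs, missing): `pairs` is a list of
    (image_path, label_path) tuples sorted by image path, and `missing`
    lists the stems of images that have no matching label file.
    """
    root = Path(root)
    # Index label files by stem so each image can be looked up by name.
    labels = {p.stem: p for p in (root / "labels").glob("*.json")}
    pairs, missing = [], []
    for img in sorted((root / "images").glob("*.jpg")):
        if img.stem in labels:
            pairs.append((img, labels[img.stem]))
        else:
            missing.append(img.stem)
    return pairs, missing
```

Reporting unmatched stems instead of raising makes it easy to spot an incomplete extraction before training or evaluation.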

Label Annotation Format

{
  "columns": [
    {
      "x": ...,
      "y": ...,
      "w": ...,
      "h": ...,
      "column_id": "...",
      "idx": ...
    },
    ...
  ],
  "chars": [
    {
      "x": ...,
      "y": ...,
      "w": ...,
      "h": ...,
      "txt": "...",
      "cid": ...,
      "char_id": "...",
      "idx": ...,
      "grade": "light|medium|severe|null"
    },
    ...
  ]
}

columns: Column bounding boxes (x, y, w, h)
chars: Character annotations (txt, x, y, w, h, grade)
grade: Damage level (light, medium, or severe; null or empty for undamaged characters)
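
As a rough illustration of how these fields fit together, the sketch below loads one label file, counts characters per damage grade, and collects the boxes of damaged characters. It is a hypothetical helper, not part of the released code, and it assumes that (x, y) is the top-left corner of a box and (w, h) its width and height in image coordinates, and that the grades are ordered light < medium < severe. The meanings of cid, idx, column_id, and char_id are not documented here, so the sketch does not use them.

```python
import json
from collections import Counter

# ASSUMPTION: grades are ordered from least to most damaged.
DAMAGE_GRADES = ("light", "medium", "severe")


def load_label(path):
    """Read one FPHDR label file (UTF-8 JSON)."""
    with open(path, encoding="utf-8") as f:
        return json.load(f)


def damage_summary(label):
    """Count characters per damage grade.

    A null or empty "grade" is treated as undamaged and counted as "none".
    """
    counts = Counter((ch.get("grade") or "none") for ch in label.get("chars", []))
    return dict(counts)


def damaged_boxes(label, min_grade="light"):
    """Return (txt, (x1, y1, x2, y2)) for characters at or above `min_grade`.

    ASSUMPTION: (x, y) is the top-left corner and (w, h) the box size.
    """
    threshold = DAMAGE_GRADES.index(min_grade)
    out = []
    for ch in label.get("chars", []):
        grade = ch.get("grade")
        if grade in DAMAGE_GRADES and DAMAGE_GRADES.index(grade) >= threshold:
            x, y, w, h = ch["x"], ch["y"], ch["w"], ch["h"]
            out.append((ch["txt"], (x, y, x + w, y + h)))
    return out
```

For example, damage_summary(load_label("labels/FS_2_2_1.json")) returns a dictionary mapping each grade present in that page (plus "none") to a character count.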

🚧 Installation

Prerequisites

Ubuntu 20.04 (required)
Linux
Python 3.10
PyTorch 2.3.0
CUDA 11.8

Environment Setup

Clone this repo:

git clone https://github.com/SCUT-DLVCLab/AutoHDR.git
cd AutoHDR

Step 0: Download and install Miniconda from the official website.

Step 1: Create a conda environment and activate it.

conda create -n autohdr python=3.10 -y
conda activate autohdr

Step 2: Install the required packages.

pip install -r requirements.txt
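
After installing the requirements, it can help to confirm that the active environment matches the prerequisites listed above. The script below is an illustrative sketch, not shipped with the repository; it only compares version strings (Python major.minor, the PyTorch release without any local suffix such as +cu118, and the CUDA version PyTorch was built with) and does not test whether a GPU is actually usable.

```python
import platform
import sys

# Versions taken from the Prerequisites list above.
EXPECTED = {"python": "3.10", "torch": "2.3.0", "cuda": "11.8"}


def check_environment():
    """Compare the running environment with the README's prerequisites.

    Returns a dict mapping each component to (found, expected, ok).
    PyTorch-related entries report None when torch is not installed.
    """
    found = {"python": f"{sys.version_info.major}.{sys.version_info.minor}"}
    try:
        import torch
        # Drop a local version tag such as "+cu118" before comparing.
        found["torch"] = torch.__version__.split("+")[0]
        # CUDA version PyTorch was built with; None for CPU-only builds.
        found["cuda"] = torch.version.cuda
    except ImportError:
        found["torch"] = found["cuda"] = None
    return {k: (found[k], v, found[k] == v) for k, v in EXPECTED.items()}


if __name__ == "__main__":
    print(platform.platform())
    for name, (got, want, ok) in check_environment().items():
        print(f"{name:7s} found={got} expected={want} {'OK' if ok else 'MISMATCH'}")
```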

📺 Inference

Step 0: Download all model files (except the OCR model) from the Model Zoo and put them in the ckpt folder.

Step 1: Download the OCR model files from the Model Zoo, unzip the package, and move the extracted files into the dist folder.

Step 2: Run AutoHDR to restore damaged historical documents:

CUDA_VISIBLE_DEVICES=<gpu_id> python infer_pipeline.py
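
The CUDA_VISIBLE_DEVICES=<gpu_id> prefix restricts the script to a single GPU. If you prefer launching from Python, the sketch below does the same through subprocess, after a simple check that the ckpt and dist folders from Steps 0 and 1 exist and are non-empty. The helper names are hypothetical, the check does not verify that the correct checkpoint files are present, and the script is run without extra arguments because none are documented here.

```python
import os
import subprocess
import sys
from pathlib import Path


def missing_model_dirs(repo_root=".", dirs=("ckpt", "dist")):
    """Return the model folders that are absent or empty under `repo_root`."""
    root = Path(repo_root)
    return [d for d in dirs if not (root / d).is_dir() or not any((root / d).iterdir())]


def launch(script, gpu_id, repo_root="."):
    """Run a repo script on one GPU.

    Equivalent to `CUDA_VISIBLE_DEVICES=<gpu_id> python <script>` executed
    from `repo_root`. Raises FileNotFoundError if model folders are missing
    and CalledProcessError if the script exits with a non-zero status.
    """
    missing = missing_model_dirs(repo_root)
    if missing:
        raise FileNotFoundError(f"populate these folders first: {', '.join(missing)}")
    # Copy the current environment and expose only the chosen GPU.
    env = dict(os.environ, CUDA_VISIBLE_DEVICES=str(gpu_id))
    return subprocess.run([sys.executable, script], cwd=repo_root, env=env, check=True)
```

For example, launch("infer_pipeline.py", gpu_id=0) from the repository root corresponds to the command above, and launch("demo_gradio.py", gpu_id=0) to the local WebUI command.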

🚀 RUN WebUI

We provide two convenient ways to run the WebUI demo:

(1) Visit our deployed online demo directly: demo

(2) Run the demo locally:

CUDA_VISIBLE_DEVICES=<gpu_id> python demo_gradio.py


☎️ Contact

If you have any questions, feel free to contact Yuyi Zhang at yuyi.zhang11@foxmail.com

🌄 Gallery

💙 Acknowledgement

📜 License

The code and dataset are released under the CC BY-NC-ND 4.0 license and may be used and distributed for non-commercial research purposes only.

⛔️ Copyright

This repository can only be used for non-commercial research purposes.
For commercial use, please contact Prof. Lianwen Jin (eelwjin@scut.edu.cn).
Copyright 2025, Deep Learning and Vision Computing Lab (DLVC-Lab), South China University of Technology.

✒️Citation

If you find AutoHDR helpful, please consider giving this repo a ⭐ and citing:

@inproceedings{Zhang2025autohdr,
      title={Reviving Cultural Heritage: A Novel Approach for Comprehensive Historical Document Restoration},
      author={Yuyi Zhang and Peirong Zhang and Zhenhua Yang and Pengyu Yan and Yongxin Shi and Pengwei Liu and Fengjun Guo and Lianwen Jin},
      booktitle={Proceedings of the 63rd Annual Meeting of the Association for Computational Linguistics},
      year={2025},
}

Thanks for your support!
