MCCD: A Multi-Attribute Chinese Calligraphy Character Dataset Annotated with Script Styles, Dynasties, and Calligraphers
- We introduce the Multi-Attribute Chinese Calligraphy Character Dataset (MCCD), an isolated Chinese character dataset with rich annotations including character, script style, dynasty, and calligrapher.
- Extensive Multi-Attribute Collection: The MCCD dataset is a meticulously curated collection of nearly 330,000 calligraphic character images, covering a diverse range of annotation categories for the characters and their attributes (script style, dynasty, and calligrapher).
- Multi-Attribute Subset Construction: MCCD contains labels for 7,765 character categories. In addition, three subsets are extracted from the dataset according to each character's attribute annotations, covering 10 calligraphy styles, 15 major historical dynasties, and 142 famous calligraphers, so that the attribute information can be used in task-specific ways.
- Benchmark Establishment: We establish benchmark results for single-task recognition (character and each attribute independently) and multi-task recognition (character combined with other attributes simultaneously) using MCCD and all its subsets.
✅ Status: Released
✅ Dataset link: Baiduyun:8x7d / OneDrive
✅ Data format: PNG / lmdb
- Clone this repo:
git clone https://github.com/SCUT-DLVCLab/MCCD.git
- The data_loader folder contains the dataset reader files for lmdb data with single-attribute labels, 2-attribute labels, and 4-attribute labels.
| Read File | Corresponding Dataset |
|---|---|
| lmdb_dataset.py | MCCD-Character / Style / Dynasty / Calligrapher |
| 2task_MTL_lmdb_dataset.py | dual_task |
| 4task_MTL_lmdb_dataset.py | four_task |
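The three readers above differ in how many labels they return per image. The sketch below illustrates that idea only; it does not reproduce the code in data_loader, and the actual label layout stored in the lmdb files may differ. The `Annotation` class, the `build_vocab` and `encode` helpers, and the toy records are all hypothetical names and values introduced for illustration.

```python
from dataclasses import dataclass
from typing import Dict, Sequence, Tuple


# Hypothetical record layout (an assumption, not the real lmdb format):
# one annotation per character image, carrying all four attributes.
@dataclass(frozen=True)
class Annotation:
    character: str
    style: str
    dynasty: str
    calligrapher: str


ATTRIBUTES = ("character", "style", "dynasty", "calligrapher")


def build_vocab(records: Sequence[Annotation], attribute: str) -> Dict[str, int]:
    """Map each distinct value of one attribute to an integer class index."""
    values = sorted({getattr(r, attribute) for r in records})
    return {value: index for index, value in enumerate(values)}


def encode(
    record: Annotation,
    vocabs: Dict[str, Dict[str, int]],
    tasks: Sequence[str],
) -> Tuple[int, ...]:
    """Return one class index per requested task, in the order given."""
    return tuple(vocabs[task][getattr(record, task)] for task in tasks)


# Toy records made up for this example; they are not taken from MCCD.
records = [
    Annotation("永", "regular", "Tang", "Yan Zhenqing"),
    Annotation("永", "running", "Jin", "Wang Xizhi"),
    Annotation("和", "running", "Jin", "Wang Xizhi"),
]
vocabs = {attribute: build_vocab(records, attribute) for attribute in ATTRIBUTES}

# Single-task target: one label (here, the character class).
single = encode(records[0], vocabs, ("character",))
# Dual-task target: character plus one other attribute.
dual = encode(records[0], vocabs, ("character", "style"))
# Four-task target: character, style, dynasty, and calligrapher together.
four = encode(records[0], vocabs, ATTRIBUTES)
```

In this sketch a single-task target is a 1-tuple, a dual-task target a 2-tuple, and a four-task target a 4-tuple, so the same annotation can feed any of the three training setups.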
❗Note:
- The MCCD dataset can only be used for non-commercial research purposes. Scholars or organizations who want to use the MCCD dataset should first fill in this Application Form, sign the Legal Commitment, and email both to us (eelwjin@scut.edu.cn, cc: yixin_zhao01@126.com). When submitting the application form, please list or attach 1-2 of your publications from the past 6 years to show that you (or your team) do research in related fields such as OCR, handwriting verification, handwriting analysis and recognition, and document image processing.
- We will give you the decompression password after your application has been received and approved.
- All users must follow all use conditions; otherwise, the authorization will be revoked.
MCCD should be used and distributed under the Creative Commons Attribution-NonCommercial-NoDerivatives 4.0 International (CC BY-NC-ND 4.0) License for non-commercial research purposes.
- This repository can only be used for non-commercial research purposes.
- Copyright 2025, Deep Learning and Vision Computing Lab (DLVC-Lab), South China University of Technology.