MultiLexNorm 2021 competition system from ÚFAL
This repository contains a number of experiments with multilingual Transformer models (multilingual BERT, DistilBERT, XLM-RoBERTa, mT5, and ByT5) focused on the Dutch language.
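As a rough illustration of the kind of experiment this covers, the sketch below runs masked-token prediction on a Dutch sentence with XLM-RoBERTa, one of the models named above, via the Hugging Face `transformers` pipeline. The checkpoint and the fill-mask task are illustrative assumptions, not the repository's actual scripts.

```python
# A minimal sketch, assuming the Hugging Face `transformers` library.
# The checkpoint and task are illustrative; the repository's own
# experiments may differ.
from transformers import pipeline

# XLM-RoBERTa is one of the multilingual models listed above.
fill_mask = pipeline("fill-mask", model="xlm-roberta-base")

# Predict the masked token in a Dutch sentence
# ("Amsterdam is the capital of <mask>.").
for prediction in fill_mask("Amsterdam is de hoofdstad van <mask>."):
    print(f"{prediction['token_str']}\t{prediction['score']:.3f}")
```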
Automated pipeline for expanding medieval Latin abbreviations encoded in TEI, using a fine-tuned ByT5 model. Drop in your TEI files, run five scripts, and get a Hugging Face dataset plus a lightweight LoRA adapter for ByT5 that turns graphemic ATR output into expanded text.
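The description implies a standard PEFT inference workflow: load the ByT5 base model, attach the LoRA adapter, and generate expansions. A minimal sketch follows, assuming the `transformers` and `peft` libraries; the adapter path and the sample input line are hypothetical placeholders, not names taken from the repository.

```python
# A minimal sketch, assuming `transformers` and `peft` are installed.
# "path/to/byt5-lora-adapter" is a hypothetical placeholder for the
# repository's published adapter; the input line is illustrative.
from transformers import AutoTokenizer, T5ForConditionalGeneration
from peft import PeftModel

base = T5ForConditionalGeneration.from_pretrained("google/byt5-small")
tokenizer = AutoTokenizer.from_pretrained("google/byt5-small")
model = PeftModel.from_pretrained(base, "path/to/byt5-lora-adapter")

# Expand an abbreviated graphemic ATR line into full Latin text.
inputs = tokenizer("dns noster", return_tensors="pt")
outputs = model.generate(**inputs, max_new_tokens=64)
print(tokenizer.decode(outputs[0], skip_special_tokens=True))
```

A byte-level model like ByT5 is a natural fit here, since medieval abbreviation marks map onto character-level edits rather than whole-word substitutions.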