Short description of the record: ground truth for the manuscript ÖNB Cod. 3891, plus pages read automatically with Transkribus.
ATTENTION: To clone this repo you need to have Git LFS installed and then clone the repository like this:
git lfs clone git@github.com:htr-school-vienna/[your_repository_name].git
Sermones by Thomas Ebendorfer (1388-1464) as found in MS Vienna, Austrian National Library (ÖNB), Cod. 3891. Wolfgang Chranekker, an organist in St. Wolfgang, finished writing it in 1441. See the description of the manuscript at Manuscripta.at.
Writing: Latin, Bastarda, mid-15th century.
Number of files:
Number of lines:
- Source of images: Austrian National Library
- Description or citation of transcription guidelines:
  - expanded abbreviations
  - preserved the original punctuation
  - used "/" for virgula
  - didn't add "." at the end of sentences
  - used "¬" at the end of the line if a word is divided
  - used "v" for the consonant and "u" for the vowel
  - used "i" for i/j
  - used "s" for ſ/s
  - used c/t as in the manuscript
  - no capitalization of letters
  - preserved "ll" in place of "L", "ff" in place of "F", etc.
  - separated prepositions from words
  - wrote words together that ought to be written together
  - preserved numbers
See Google docs
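The purely mechanical parts of the transcription guidelines above can be sketched in code, for example to bring other text (such as raw HTR output) closer to these conventions before comparing it with the ground truth. This is a minimal sketch, not tooling from this repository: the function names `normalize_line` and `join_divided_words` are hypothetical, and it covers only lowercasing, ſ/s, i/j and the "¬" line-end marker. The u/v rule and the expansion of abbreviations need linguistic judgment and are deliberately left out.

```python
def normalize_line(text: str) -> str:
    """Apply the mechanical conventions to one line (hypothetical helper)."""
    text = text.lower()            # no capitalization of letters
    text = text.replace("ſ", "s")  # "s" for ſ/s
    text = text.replace("j", "i")  # "i" for i/j
    return text


def join_divided_words(lines: list[str]) -> str:
    """Rejoin words divided at a line end, marked with a trailing "¬"."""
    out = ""
    for line in lines:
        line = line.strip()
        if line.endswith("¬"):
            out += line[:-1]       # drop the marker, add no space
        else:
            out += line + " "      # ordinary line end: separate with a space
    return out.strip()


if __name__ == "__main__":
    print(normalize_line("Juſtitia"))
    print(join_divided_words(["sermo¬", "nes de", "sanctis"]))
```

Lowercasing runs first so that an upper-case "J" is also mapped to "i". The virgula "/" and the original punctuation are left untouched, in line with the guidelines.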
This dataset was created by Cehuľová Viktória, Ciuntu Mara-Elena, Engelmaier Leonhard, Kohn Albert, Lukáč Kováčová Magdaléna, Lukáč Labancová Ivana, Mihaljević Ana, Odstrčilík Jan, Roček Martin, Rokpelne Liene, Scalia Andrea, Šaldová Zuzana, Vašíček Andrej, Yücel Fatih, Zelenková Adéla.
The digitisation is not copyright free, but the transcription is. However, properly annotating a corpus takes time, and that work should be recognised. If you use any item from this corpus as ground truth, cite the dataset using the following information:
Copy citation BibTeX from Zenodo
This dataset was created as part of the Winter School of Handwritten Text Recognition of Medieval Manuscripts 2023/2024 in Vienna, at the Österreichische Akademie der Wissenschaften, Institut für Mittelalterforschung. All transcriptions are licensed under the Creative Commons 4 licence. Images were provided by the Austrian National Library (ÖNB).