The text of the treebank was transcribed with Whisper (trained on Cretan) from 9 tapes containing folklore narratives by a single speaker, Ioannis Anagnostakis, who also composed them. The narratives are radio broadcasts in digital format, used with permission from the Audiovisual Department of the Vikelaia Municipal Library of Heraklion, Crete (1998-2001). The data were split into training (70%), dev (10%) and test (20%) sets.
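The 70/10/20 proportions can be illustrated with a minimal sketch. This README does not state whether the split was random, sequential, or made per tape, so the seeded sentence-level shuffle below is an assumption, and `split_sentences` is a hypothetical helper rather than the procedure actually used for this treebank.

```python
import random


def split_sentences(sentences, percents=(70, 10, 20), seed=0):
    """Split items into train/dev/test parts by integer percentages.

    Assumption: a seeded random shuffle at sentence level. The real
    split of the treebank may have been made differently.
    """
    if sum(percents) != 100:
        raise ValueError("percentages must sum to 100")
    items = list(sentences)
    random.Random(seed).shuffle(items)  # reproducible shuffle
    n = len(items)
    n_train = n * percents[0] // 100
    n_dev = n * percents[1] // 100
    return {
        "train": items[:n_train],
        "dev": items[n_train:n_train + n_dev],
        # the test part takes the remainder, so no item is lost to rounding
        "test": items[n_train + n_dev:],
    }


# Placeholder sentence ids, not real treebank data.
parts = split_sentences([f"sent-{i}" for i in range(1, 101)])
print({name: len(part) for name, part in parts.items()})
# {'train': 70, 'dev': 10, 'test': 20}
```

Integer percentages are used instead of floating-point ratios so that the part sizes do not depend on rounding behaviour.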
This is the first treebank for the living (but under-resourced) dialect of East Crete. The dialect diverges from Standard Modern Greek at all levels. The treebank is annotated for euphonics and voicing, two phonological phenomena that affect the orthography of the dialect. Active annotation was used for knowledge transfer from GUD, a UD treebank of Standard Modern Greek, and the results were edited manually by a native speaker.
We thank Yannis Kazos for his contribution.
Socrates Vakirtzian, Vivian Stamou, Yannis Kazos, Stella Markantonatou. 2025. Dialectal treebanks and their relation with the standard variety: The case of East Cretan and Standard Modern Greek. The Joint 25th Nordic Conference on Computational Linguistics and 11th Baltic Conference on Human Language Technologies (NoDaLiDa/Baltic-HLT 2025), Tallinn (Estonia), March 2–5, 2025.
- 2024-05-15 v2.14
  - Initial release in Universal Dependencies.
=== Machine-readable metadata (DO NOT REMOVE!) ================================
Data available since: UD v2.16
License: CC BY-SA 4.0
Includes text: yes
Genre: fiction
Lemmas: manual native
UPOS: manual native
XPOS: manual native
Features: manual native
Relations: manual native
Contributors: Vakirtzian, Socrates; Markantonatou, Stella; Stamou, Vivian
Contributing: here
Contact: stiliani.markantonatou@gmail.com
===============================================================================