UD_Khoekhoe-KDT is a Universal Dependencies (UD) treebank for the Khoekhoe language. The annotation was performed manually based on glosses. This treebank includes texts from various sources: fiction, grammar, and spoken conversation. The treebank contains 27k tokens, distributed as follows:
- Training set: 15k tokens
- Development set: 2k tokens
- Test set: 10k tokens
The UD_Khoekhoe-KDT treebank consists of various texts translated into Khoekhoe by native speakers, then glossed and annotated. The included texts are:
- grammar_Cairo: 20 examples from the Cairo Cicling Corpus)
- grammar_BivalTyp: BivalTyp dataset.
- film_Bridge: Subtitles from Bridge of Spies (2015).
- film_Titanic: A section of subtitles from Titanic (1997).
- book_Khomai: Chapters from the school book Khomai (1971), with shuffled sentences.
- conversation_Windhoek5: A recorded conversation between two friends, anonymized by shuffling sentences.
- Fiction: Sentence IDs start with film/book.
- Grammar: Sentence IDs start with grammar.
- Spoken: Sentence IDs start with conversation.
- Training set: Full grammar_Cairo, sections of grammar_BivalTyp, film_Bridge, book_Khomai, conversation_Windhoek5
- Development set: Sections of grammar_BivalTyp, film_Bridge, conversation_Windhoek5
- Test set: Full film_Titanic, sections of grammar_BivalTyp, book_Khomai, conversation_Windhoek5
Roswitha Gases, Mauricius Gariseb, Mildred Howoses, Petronella Nauises, Dorothea Tases, Zarina Molochieva, Meidad Ovadia, Adam Rapoport, Naama Tiroshi, Anna Veselovsky, Margaret Zellers
- Witzlack-Makarevich, Alena and Sylvanus Job. 2024. Bivalent patterns in Khoekhoe. In: Say, Sergey (ed.). BivalTyp: Typological database of bivalent verbs and their encoding frames. (Data first published on January 23, 2024; last revised on December 10, 2024.) (Available online at https://www.bivaltyp.info, Accessed on 18 February 2025.)
- 2025-05-15 v2.16
- Initial release in Universal Dependencies.
=== Machine-readable metadata (DO NOT REMOVE!) ================================ Data available since: UD v2.16 License: CC BY-SA 4.0 Includes text: yes Genre: fiction grammar-examples spoken Lemmas: manual native UPOS: manual native XPOS: manual native Features: manual native Relations: manual native Contributors: Tulchynska, Kira; Witzlack-Makarevich, Alena; Job, Sylvanus; Hahn, Michael Contributing: here Contact: kira.tulchynska@mail.huji.ac.il, witzlack@gmail.com ===============================================================================