GerTE (KONVENS 2025)

This repository contains the code and dataset for the content zone prediction experiment on German source-dependent essays (Textgebundene Erörterung, GerTE), to be presented at KONVENS 2025. The accompanying paper, "Predicting Functional Content Zones in German Source-Dependent Argumentative Essays: Experiments on a Novel Dataset", will be published in the Proceedings of KONVENS 2025.

About

The dataset consists of 117 short argumentative essays written in response to 3 different news articles. The essays are therefore source-dependent, and each essay deals with the discussion topic of one article. Each essay has been segmented into sentences, and each sentence is labelled with exactly 1 out of 7 possible content zone labels. Details on the data collection and annotation process are provided in the paper.

The sentence-segmented and content-zone-labelled essays are provided in `data/gerte_full.tsv`. It is a tab-separated file with the following columns:

Item	Description
`essay_id`	ID of the essay
`sent_id`	ID of the sentence within the essay
`sent_text`	Sentence text
`sent_label`	Content zone label for the sentence
`topic_id`	ID of the topic that the essay deals with
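
As a minimal sketch of how the columns listed above could be read, the following uses only the Python standard library. The function names `read_gerte_tsv` and `group_by_essay` are hypothetical helpers, not part of this repo, and the sample rows (including the ID formats) are made up for illustration. It also assumes that the file starts with a header row containing the five column names and that it does not use CSV-style quoting.

```python
import csv
import io
from collections import defaultdict

# Column names as listed in the table above.
EXPECTED_COLUMNS = ["essay_id", "sent_id", "sent_text", "sent_label", "topic_id"]


def read_gerte_tsv(handle):
    """Read tab-separated rows into a list of dicts, one per sentence."""
    # QUOTE_NONE keeps quotation marks inside sentences as literal characters
    # (assumption: the file does not use CSV-style quoting).
    reader = csv.DictReader(handle, delimiter="\t", quoting=csv.QUOTE_NONE)
    missing = set(EXPECTED_COLUMNS) - set(reader.fieldnames or [])
    if missing:
        raise ValueError(f"missing columns: {sorted(missing)}")
    return list(reader)


def group_by_essay(rows):
    """Map each essay_id to its sentence rows, in file order."""
    essays = defaultdict(list)
    for row in rows:
        essays[row["essay_id"]].append(row)
    return dict(essays)


# Made-up sample rows for illustration only; not taken from the dataset.
SAMPLE = (
    "essay_id\tsent_id\tsent_text\tsent_label\ttopic_id\n"
    "e1\t1\tErster Satz.\tinfo_intro\t2\n"
    "e1\t2\tZweiter Satz.\town\t2\n"
    "e2\t1\tDritter Satz.\tmeta\t3\n"
)

rows = read_gerte_tsv(io.StringIO(SAMPLE))
essays = group_by_essay(rows)
print(len(rows), len(essays))
```

To read the real file, the same function could be called on an open file handle, for example `open("data/gerte_full.tsv", encoding="utf-8", newline="")`; the UTF-8 encoding is an assumption. All values are returned as strings, so IDs need an explicit conversion if numeric comparison is wanted.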

The same data are also available as a pickled Python object, `data/full_cz_data.p`, which stores the essays as a list. The Python scripts in this repo read this pickle file; its contents match the .tsv file.
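
The pickled file can be opened with the standard `pickle` module. The sketch below is an assumption-laden illustration: `load_pickled_essays` and `describe` are hypothetical helpers, and because the layout of the list elements is not documented here, the example only inspects the loaded object rather than assuming a structure. A stand-in file with a guessed element layout is used so that the snippet runs on its own.

```python
import os
import pickle
import tempfile


def load_pickled_essays(path):
    """Unpickle the file at `path` and return the stored object."""
    # Only unpickle files from a trusted source: pickle can execute code on load.
    with open(path, "rb") as handle:
        return pickle.load(handle)


def describe(obj):
    """Summarise a sequence: container type, length, type of the first item."""
    first = type(obj[0]).__name__ if len(obj) > 0 else None
    return {"type": type(obj).__name__, "length": len(obj), "first_item_type": first}


# Stand-in file for illustration; the real path would be data/full_cz_data.p.
# The element layout here (a dict per essay) is a guess, not the repo's format.
stand_in = [{"essay_id": "e1", "sentences": ["Erster Satz."]}]
with tempfile.TemporaryDirectory() as tmp:
    demo_path = os.path.join(tmp, "demo.p")
    with open(demo_path, "wb") as handle:
        pickle.dump(stand_in, handle)
    loaded = load_pickled_essays(demo_path)

summary = describe(loaded)
print(summary)
```

Running `describe(load_pickled_essays("data/full_cz_data.p"))` on the real file would show what each list element actually is before any further processing is written against it.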

The three topics and their IDs are as follows. The news articles that served as source texts are linked from the discussion topics:

Topic ID	Discussion Topic
1	Should social media like Twitter/X be integrated into schools for learning?
2	Should school start later in the morning than 8 am?
3	Should climate change be taught in a school subject of its own?

Some Dataset Statistics

Topic Distribution

Topic	Number of Essays
1	50
2	50
3	17

Content zone distribution based on 7 classes

Label	Number of sentence instances
own	551
article_pro	460
info_intro	281
article_con	243
off_topic	77
other	71
meta	30
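
The counts above sum to 1713 labelled sentences, and the classes are clearly imbalanced. As a small worked example, the following sketch turns the table into relative shares and identifies the most frequent label; the names `LABEL_COUNTS` and `label_shares` are introduced here for illustration only.

```python
# Sentence counts per label, copied from the table above.
LABEL_COUNTS = {
    "own": 551,
    "article_pro": 460,
    "info_intro": 281,
    "article_con": 243,
    "off_topic": 77,
    "other": 71,
    "meta": 30,
}


def label_shares(counts):
    """Return each label's fraction of the total number of sentences."""
    total = sum(counts.values())
    return {label: n / total for label, n in counts.items()}


total = sum(LABEL_COUNTS.values())
shares = label_shares(LABEL_COUNTS)
# The most frequent label is what a trivial majority-class predictor would output.
majority_label = max(LABEL_COUNTS, key=LABEL_COUNTS.get)

for label, share in shares.items():
    print(f"{label:<12}{LABEL_COUNTS[label]:>5}{share:>8.1%}")
print("total", total, "majority", majority_label)
```

A predictor that always outputs the most frequent label (`own`) would be correct on roughly 32% of these sentences, which is a useful reference point when reading classification results on this dataset.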

License

TBD

Reference

TBD
