DFKI-NLP/subjective_text_complexity_corpus

Subjective Text Complexity Corpus for German [Paper]

A corpus consisting of German sentences, annotated with subjective complexity ratings by two target groups.

The corpus contains 322 sentences annotated with complexity ratings from two groups, (1) experts and (2) non-experts, on a 5-point Likert scale (1 = very easy to 5 = very complex).
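
A common way to work with such ratings is to aggregate them per sentence and per rater group, for example as a mean score. The sketch below is a minimal illustration under stated assumptions. This README does not describe the corpus file format, so the record layout `(sentence_id, group, rating)`, the group labels `"expert"` and `"non-expert"`, and the function `mean_ratings` are hypothetical. The toy ratings are invented for illustration and are not taken from the corpus.

```python
from collections import defaultdict
from statistics import mean

# Hypothetical record layout: (sentence_id, rater_group, rating).
# The corpus file format is not documented here; adapt the loader as needed.
LIKERT_MIN, LIKERT_MAX = 1, 5  # 1 = very easy, 5 = very complex


def mean_ratings(records):
    """Return {(sentence_id, group): mean rating} for 5-point Likert ratings."""
    buckets = defaultdict(list)
    for sentence_id, group, rating in records:
        # Reject values outside the 1-5 Likert range.
        if not LIKERT_MIN <= rating <= LIKERT_MAX:
            raise ValueError(f"rating {rating} outside {LIKERT_MIN}-{LIKERT_MAX}")
        buckets[(sentence_id, group)].append(rating)
    # Average all ratings collected for each (sentence, group) pair.
    return {key: mean(values) for key, values in buckets.items()}


if __name__ == "__main__":
    # Toy ratings for illustration only; not taken from the corpus.
    toy = [
        (1, "expert", 2), (1, "expert", 3),
        (1, "non-expert", 4), (1, "non-expert", 5),
    ]
    for (sid, group), score in sorted(mean_ratings(toy).items()):
        print(sid, group, score)
```

Keeping the two groups as separate keys makes it straightforward to compare how experts and non-experts rate the same sentence.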

The data comes from DATEV, a German IT service provider serving tax consultants, auditors, and lawyers. The sentences were extracted from 232 documents (instructions, commentaries, and descriptions) addressed to employees of the service provider as well as to external users of the system. These documents often describe technical solutions for the company's products or give more detailed descriptions of legal regulations affecting the company's clients.

Citation

If you find the code or dataset helpful, please cite the following paper:

@inproceedings{seiffe-etal-2022-subjective,
    title = "Subjective Text Complexity Assessment for {G}erman",
    author = {Seiffe, Laura  and
      Kallel, Fares  and
      M{\"o}ller, Sebastian  and
      Naderi, Babak  and
      Roller, Roland},
    editor = "Calzolari, Nicoletta  and
      B{\'e}chet, Fr{\'e}d{\'e}ric  and
      Blache, Philippe  and
      Choukri, Khalid  and
      Cieri, Christopher  and
      Declerck, Thierry  and
      Goggi, Sara  and
      Isahara, Hitoshi  and
      Maegaard, Bente  and
      Mariani, Joseph  and
      Mazo, H{\'e}l{\`e}ne  and
      Odijk, Jan  and
      Piperidis, Stelios",
    booktitle = "Proceedings of the Thirteenth Language Resources and Evaluation Conference",
    month = jun,
    year = "2022",
    address = "Marseille, France",
    publisher = "European Language Resources Association",
    url = "https://aclanthology.org/2022.lrec-1.74/",
    pages = "707--714"
}

License

The code is released under the terms of the CC-BY-4.0 license.