This folder contains the dataset presented in the paper "A Dataset for Physical and Abstract Plausibility and Sources of Human Disagreement", published at LAW-XVII 2023, co-located with ACL 2023.
We present two dataset versions. (a) The file dataset.tsv includes label aggregations based on the annotation post-processing described in §4.2: strict majority votes, including a disagreement label ('unsure'), for both multi-class and binary setups, covering 1,733 events. (b) The folder mace_aggregations provides probabilistic aggregations using MACE (Hovy et al., 2013), i.e., aggregated silver labels based on raw annotations without post-processing, covering 2,160 events.
The file dataset.tsv contains the following information (a minimal loading sketch follows the list):
- event: the event target as presented to human annotators
- original_label: the original label based on the dataset construction described in §3 of the paper; events can be either plausible ('plausible') or implausible ('implausible')
- abstractness_combination: the abstractness combination an event belongs to
- rating: raw, i.e., non-aggregated but post-processed plausibility ratings as provided by the annotators
- majority_multiclass: strict majority votes (>=70%) for a multi-class setup on a scale from 1 to 5, with 1 denoting implausibility, 5 denoting plausibility, and ratings of 3 used for filtering out invalid submissions; in case of no clear majority, a disagreement label ('unsure') is assigned to reflect conflicting annotator perspectives
- distribution_multiclass: class distributions for a multi-class setup on a scale from 1 to 5, with 1 denoting implausibility, 5 denoting plausibility, and ratings of 3 used for filtering out invalid submissions
- majority_binary: strict majority votes (>=70%) for a binary setup where 0 denotes implausibility and 1 denotes plausibility; in case of no clear majority, a disagreement label ('unsure') is assigned to reflect conflicting annotator perspectives
- distribution_binary: class distributions for a binary setup where 0 denotes implausibility and 1 denotes plausibility
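For illustration, the following is a minimal sketch for loading dataset.tsv and separating clear-majority events from disagreement cases; it assumes pandas is available and that the column names match the list above.

    import pandas as pd

    # Load the tab-separated dataset; column names as documented above.
    df = pd.read_csv("dataset.tsv", sep="\t")

    # Inspect the binary label distribution, including the 'unsure' disagreement label.
    print(df["majority_binary"].value_counts())

    # Keep only events with a clear binary majority, i.e., drop 'unsure' labels.
    # The column is cast to str because it mixes numeric labels with 'unsure'.
    clear = df[df["majority_binary"].astype(str) != "unsure"]
    print(f"{len(clear)} of {len(df)} events have a clear binary majority")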
The folder mace_aggregations contains the following files and information.
The file aggregated_predictions.tsv contains the following information (see the comparison sketch after the list):
- event: the event target as presented to human annotators
- original_label: the original label based on dataset construction as described in §3 in the paper; events can be either plausible ('plausible') or implausible ('implausible')
- multi-class_prediction: multi-class labels predicted with MACE (Hovy et al., 2013) using standard parameters; labels range from 1 to 5, with 1 denoting implausibility, 5 denoting plausibility, and ratings of 3 used for filtering out invalid submissions
- binary_prediction: binary labels predicted with the MACE implementation by Hovy et al. (2013) using standard parameters; 0 denotes implausibility and 1 denotes plausibility
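As a hedged illustration, the MACE binary labels can be compared against the original (pseudo-)labels as sketched below; the file path assumes the folder layout described in this README, and the label mapping ('implausible' -> 0, 'plausible' -> 1) follows the column descriptions above.

    import pandas as pd

    # Load the MACE-aggregated predictions (tab-separated, columns as above).
    preds = pd.read_csv("mace_aggregations/aggregated_predictions.tsv", sep="\t")

    # Map the original labels onto the binary prediction space and measure
    # how often the MACE binary label recovers the original label.
    gold = preds["original_label"].map({"implausible": 0, "plausible": 1})
    agreement = (preds["binary_prediction"] == gold).mean()
    print(f"MACE binary labels match original labels for {agreement:.1%} of events")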
The files 4-class-dist.predictions and 2-class-dist.predictions contain label distributions for the multi-class and binary setups, respectively. Distributions are provided per label, with each row referring to the distributions for one event. Events appear in the same order as in aggregated_predictions.tsv. An example row is shown below, with each column containing a predicted label and its probability, separated by whitespace; a parsing sketch follows the example.
5.0 0.935000961509369 2.0 0.023182269455259166 4.0 0.02100019482697004 1.0 0.02081657420840185
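Since each row alternates between a label and its probability, rows can be parsed as in the following sketch (the file path is an assumption based on the file names above):

    # Parse one row of a *-dist.predictions file into a label -> probability map.
    def parse_distribution(row):
        tokens = row.split()
        # Tokens alternate between a predicted label and its probability.
        return {float(label): float(prob)
                for label, prob in zip(tokens[::2], tokens[1::2])}

    with open("mace_aggregations/4-class-dist.predictions") as f:
        for line in f:
            dist = parse_distribution(line)
            best = max(dist, key=dist.get)  # most probable label per event
            print(best, dist[best])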
The file Amendment.pdf documents a small case study on a PAP sample comparing original Amazon Mechanical Turk (AMT) annotations with non-AMT annotations. We collect off-AMT annotations for a small subset (~100 instances), focusing on originally pseudo-implausible events that were rated plausible by AMT annotators. In the amendment, we outline the task setup and annotation process, present differences and commonalities between the two annotation settings and their results, and discuss our findings at both the annotation and meta-finding levels. We conclude that each annotation setup has (dis)advantages, which, however, do not strongly influence meta-level findings. Finally, we discuss aggregation recommendations.
To construct this dataset, the creator Annerose Eichel used text data from the English Wikipedia. Wikipedia content is licensed under the Creative Commons Attribution-ShareAlike 4.0 International License and, unless stated otherwise, the GNU Free Documentation License. For more details on Wikipedia licensing, we refer to Wikipedia's Information on Copyrights.
Accordingly, the PAP Dataset © 2023 by Annerose Eichel is licensed under the Creative Commons Attribution-ShareAlike 4.0 International License (CC BY-SA 4.0).