Hi there,
I'm reaching out to ask a few questions about the SciDTB dataset that I couldn't find answers to in this repository or the paper:
Firstly, the dev and test directories clearly distinguish the gold annotations from the second annotations, while the train directory seems to have everything lumped together: you can see file names like P14-1024_anno1.edu.txt.dep, P14-1024_anno2.edu.txt.dep, and P14-1024_anno3.edu.txt.dep. Could you please clarify which of the multiple annotated files is considered the gold annotation file?
Secondly, just wanted to double check: are same-unit, joint, and comparison the only discourse relations that are considered multinuclear (i.e. symmetric) in SciDTB?
Thirdly, there are 4 files that contain textual EDUs whose parent head is -1, as exemplified below by eduid=1 (dev file: D14-1080) in the figure. Could this be an error from the original annotation file? The file names for these are as follows:
train: P16-1069_anno2
dev: D14-1080; D14-1099
test: D14_1042
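In case it helps with reproducing this, here is a minimal sketch of a check that would surface such EDUs. It rests on assumptions about the file layout that should be verified against the data: that each .dep file is JSON with a top-level "root" list, that each entry carries "id" and "parent" keys, and that id 0 is the artificial ROOT node, which legitimately has parent -1. The function names are hypothetical.

```python
import json
from pathlib import Path


def find_orphan_edus(dep_path):
    """Return the ids of textual EDUs whose parent is -1 in one .dep file."""
    # utf-8-sig tolerates a leading byte-order mark if one is present
    with open(dep_path, encoding="utf-8-sig") as f:
        data = json.load(f)
    return [
        edu["id"]
        for edu in data["root"]  # assumed top-level key
        # id 0 is assumed to be the artificial ROOT, so it is skipped
        if edu["parent"] == -1 and edu["id"] != 0
    ]


def scan(directory):
    """Map file name -> offending EDU ids for every .dep file under directory."""
    hits = {}
    for path in sorted(Path(directory).rglob("*.dep")):
        ids = find_orphan_edus(path)
        if ids:
            hits[path.name] = ids
    return hits
```

Calling scan() on the train, dev, and test directories in turn would list every file with a non-root EDU attached to -1, which can then be compared against the four files named above.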
Lastly, is there an annotation manual and/or detailed documentation of this dataset that is publicly available for reference regarding various aspects of the data?
Looking forward to hearing from you soon!
Cheers,
Janet