Preprocessing ERE #16

@zhou6140919

I ran the script preprocessing/process_ere.py and found that the number of sentences in train.w1.oneie.json (12977) does not match the number reported in the paper (14736). As a result, I cannot reproduce the F1 score on the ERE-EN dataset.
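For anyone who wants to check the count on their own output, here is a minimal sketch. It assumes the file is in JSON Lines form, with one sentence object per non-empty line; that layout is my assumption, not something confirmed in the script's documentation. The helper name `count_sentences` is made up for illustration.

```python
import json
from pathlib import Path


def count_sentences(path):
    """Count the non-empty lines of a JSON Lines file.

    Assumption: each non-empty line holds exactly one sentence object.
    Every line is parsed, so a malformed line raises an error instead of
    being counted silently.
    """
    count = 0
    with Path(path).open(encoding="utf-8") as f:
        for line in f:
            line = line.strip()
            if not line:
                continue
            json.loads(line)  # raises json.JSONDecodeError on a bad line
            count += 1
    return count
```

Calling `count_sentences("train.w1.oneie.json")` on the preprocessed file gives the number to compare against the paper's figure.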

So I looked into the script. At line 1336, it ignores all the data in the 'normal' dataset. However, if I change the pattern to os.path.join(input_dir, 'source', 'cmptxt', '*', '*.txt'), an error occurs at the line entity.char_offsets_to_token_offsets(tokens), though only for a few documents. Ignoring all of these errors, I got 18895 sentences, which still does not match.
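To see which documents are affected rather than swallowing the errors, something like the sketch below might help. It is not the script's own implementation: `char_span_to_token_span` and `tally_failures` are hypothetical helpers, and the token layout (`(char_start, char_end, text)` tuples with an exclusive end) is an assumption. The idea is only to show a conversion that fails loudly when an annotation does not line up with token boundaries, plus a per-document tally of those failures.

```python
from collections import Counter


def char_span_to_token_span(tokens, start, end):
    """Map a character span to a token span (end exclusive).

    tokens: list of (char_start, char_end, text), char_end exclusive.
    Raises ValueError if the span does not start and end on token
    boundaries. Hypothetical helper, not the one in process_ere.py.
    """
    tok_start = tok_end = None
    for i, (t_start, t_end, _text) in enumerate(tokens):
        if t_start == start:
            tok_start = i
        if t_end == end:
            tok_end = i + 1
    if tok_start is None or tok_end is None or tok_start >= tok_end:
        raise ValueError(f"span ({start}, {end}) is not aligned with tokens")
    return tok_start, tok_end


def tally_failures(docs):
    """Count misaligned spans per document.

    docs: dict mapping doc_id -> (tokens, list of (start, end) spans).
    Returns a Counter holding only the documents with failures.
    """
    failures = Counter()
    for doc_id, (tokens, spans) in docs.items():
        for start, end in spans:
            try:
                char_span_to_token_span(tokens, start, end)
            except ValueError:
                failures[doc_id] += 1
    return failures
```

A tally like this shows whether the failures are concentrated in a handful of files, which would help narrow down where the sentence counts diverge.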
