LogicST: A Logical Self-Training Framework for Document-Level Relation Extraction with Incomplete Annotations
This repository contains the code for the paper "LogicST: A Logical Self-Training Framework for Document-Level Relation Extraction with Incomplete Annotations," accepted to the EMNLP 2024 main conference as a long paper.
To run this code, you will need the following Python packages:
```
apex==0.1
bibtexparser==1.4.1
dill==0.3.4
matplotlib==2.2.3
numpy==1.19.5
opt_einsum==3.3.0
pandas==1.1.5
pyecharts==2.0.3
scipy==1.5.4
torch==1.7.1+cu101
tqdm==4.62.1
transformers==4.18.0
ujson==4.0.2
```
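Before downloading data, you can optionally verify the environment. This is a minimal sanity check, not part of the repository; it only assumes the pinned `torch` and `transformers` versions above and a CUDA-capable GPU (the `+cu101` build targets CUDA 10.1):

```python
# Optional sanity check: confirm the pinned versions are installed
# and that torch can see a CUDA device (torch==1.7.1+cu101 targets CUDA 10.1).
import torch
import transformers

print("torch:", torch.__version__)                # expect 1.7.1+cu101
print("transformers:", transformers.__version__)  # expect 4.18.0
print("CUDA available:", torch.cuda.is_available())
if torch.cuda.is_available():
    print("device:", torch.cuda.get_device_name(0))
```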
The datasets used in this project can be downloaded from the following links:
- The DocRED dataset can be downloaded following the instructions here.
- The Re-DocRED dataset can be downloaded following the instructions here.
- The DocRED_ext dataset can be downloaded following the instructions here.
- The DocGNRE dataset can be downloaded following the instructions here.
- The original DWIE dataset can be downloaded following the instructions here. The pre-processing is the same as in LogiRE; more details can be found here. We also provide a script, `./dataset/dwie/build_incomplete_dataset.py`, to generate the incompletely labeled datasets (a sketch of the idea follows below).
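The snippet below is a minimal sketch of what such a generation step can look like, not the actual script: it assumes a DocRED-style format where each document carries a `labels` list of gold facts, and that the file suffix (e.g. `0.4`) is the fraction of facts kept; check `build_incomplete_dataset.py` for the real behavior.

```python
# Hypothetical sketch of incomplete-dataset generation (not the repo's script):
# independently keep each gold relation fact with probability `ratio`.
import json
import random

def build_incomplete(src_path, dst_path, ratio, seed=42):
    random.seed(seed)
    with open(src_path) as f:
        docs = json.load(f)
    for doc in docs:
        # assumes DocRED-style "labels"; whether the file suffix means the
        # kept or the dropped fraction is an assumption here
        doc["labels"] = [l for l in doc["labels"] if random.random() < ratio]
    with open(dst_path, "w") as f:
        json.dump(docs, f)

# e.g. build_incomplete("train_annotated.json", "train_incomplete_0.4.json", 0.4)
```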
The repository expects the following directory structure:
```
LogicST
|-- dataset
|   |-- docred
|   |   |-- rel_info.json
|   |   |-- rel2id.json
|   |   |-- train_annotated.json (DocRED)
|   |   |-- train_ext.json (DocRED_ext)
|   |   |-- dev_revised.json (Re-DocRED)
|   |   |-- test_revised.json (Re-DocRED)
|   |   |-- re_docred_test_data_enhancement_human.json (DocGNRE)
|   |-- dwie
|   |   |-- train_annotated.json
|   |   |-- train_incomplete_0.2.json
|   |   |-- train_incomplete_0.4.json
|   |   |-- train_incomplete_0.6.json
|   |   |-- train_incomplete_0.8.json
|   |   |-- meta
|   |   |   |-- ner2id.json
|   |   |   |-- rel2id.json
|   |   |   |-- word2id.json
|   |   |   |-- vec.npy
```
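The DocRED files follow the public DocRED JSON schema, so a quick inspection is easy. The sketch below only assumes that format (each document has `sents`, `vertexSet`, and a `labels` list of facts with head index `h`, tail index `t`, and relation `r`):

```python
# Inspect a DocRED-style training file and count relation facts.
import json
from collections import Counter

with open("dataset/docred/train_annotated.json") as f:
    docs = json.load(f)

rel_counts = Counter(l["r"] for doc in docs for l in doc.get("labels", []))
print(len(docs), "documents,", sum(rel_counts.values()), "relation facts")
print("most frequent relations:", rel_counts.most_common(5))
```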
We use the rule miner from MILR; more details can be found in the link and in the Python file `./mine_rule.py`.
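To give a flavor of what rule mining over gold facts means, here is a toy, hedged illustration, not the MILR algorithm itself: it estimates the confidence of simple same-pair implication rules `r_body(x, y) => r_head(x, y)` by counting co-occurrence over entity pairs. See `./mine_rule.py` for the actual miner.

```python
# Toy rule-confidence estimation over gold facts (illustrative only;
# the actual miner follows MILR and lives in ./mine_rule.py).
import json
from collections import defaultdict

with open("dataset/docred/train_annotated.json") as f:
    docs = json.load(f)

# Group the relations that hold for each (document, head, tail) pair.
pair2rels = defaultdict(set)
for i, doc in enumerate(docs):
    for l in doc.get("labels", []):
        pair2rels[(i, l["h"], l["t"])].add(l["r"])

support = defaultdict(int)  # pairs on which r_body holds
hits = defaultdict(int)     # pairs on which both r_body and r_head hold
for rels in pair2rels.values():
    for rb in rels:
        support[rb] += 1
        for rh in rels:
            if rh != rb:
                hits[(rb, rh)] += 1

rules = sorted(
    ((c / support[rb], rb, rh) for (rb, rh), c in hits.items() if support[rb] >= 20),
    reverse=True,
)
for conf, rb, rh in rules[:10]:
    print(f"{rb}(x,y) => {rh}(x,y)  confidence={conf:.2f}")
```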
Train the BERT / RoBERTa models with the following commands:
```
>> sh scripts/run_bert_docred.sh $i     # BERT trained on DocRED on cuda:i
>> sh scripts/run_roberta_docred.sh $i  # RoBERTa trained on DocRED on cuda:i
>> sh scripts/run_bert_dwie.sh $i $j    # BERT trained on DWIE with positive sampling ratio $j on cuda:i
```
The save_path is recorded in the training log file. You can load a saved model with the `--load_path` argument; the code will then skip training and evaluate the saved model on the benchmarks.
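Conceptually, evaluation-only loading just restores a saved state dict and switches the model to eval mode. The snippet below is a hedged sketch with a placeholder checkpoint path; the repository's model class presumably wraps the encoder with task-specific heads, so the names here are illustrative:

```python
# Hedged sketch of what --load_path does conceptually (placeholder path).
import torch
from transformers import AutoModel

model = AutoModel.from_pretrained("bert-base-uncased")
state = torch.load("checkpoints/docred_bert.pt", map_location="cpu")  # placeholder
# strict=False because a full checkpoint may carry extra task-head weights
model.load_state_dict(state, strict=False)
model.eval()  # evaluate on the benchmarks without any further training
```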
To facilitate reproduction, we provide two BERT-base-uncased checkpoints trained on DocRED and DWIE with 40% positive sampling.
We provide the predictions of various frameworks on DocRED's test set in `./results`, including vanilla ATLOP, negative sampling, CAST, P3M, and LogicST.
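If you want to score such a predictions file yourself, the sketch below assumes the official DocRED submission format (a JSON list of `{"title", "h_idx", "t_idx", "r"}` records) and example file names; it computes micro precision/recall/F1 against a revised gold file:

```python
# Score a DocRED-format predictions file against gold (example file names).
import json

with open("results/logicst_result.json") as f:  # example name, adjust as needed
    pred = {(p["title"], p["h_idx"], p["t_idx"], p["r"]) for p in json.load(f)}

with open("dataset/docred/test_revised.json") as f:
    gold = {
        (doc["title"], l["h"], l["t"], l["r"])
        for doc in json.load(f)
        for l in doc.get("labels", [])
    }

tp = len(pred & gold)
p = tp / len(pred) if pred else 0.0
r = tp / len(gold) if gold else 0.0
f1 = 2 * p * r / (p + r) if p + r else 0.0
print(f"P={p:.4f}  R={r:.4f}  F1={f1:.4f}")
```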
We also provide the logs of training on DocRED and DWIE with a 40% sampling ratio in `./logs`.