
GutBrainIE @ BioASQ2025

GutBrainIE: Official Page


0. Place the raw datasets provided by the organisers in data/raw

NER

1. Set Configs

Edit your configuration files in conf/conf_gutbrain
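The train and inference commands below use Hydra-style "+exp=..." overrides, so the configuration files are assumed to be Hydra/OmegaConf YAML files. A minimal sketch for inspecting a config before launching a run (the file name exp/train.yaml is an assumption):

# Sketch: print the resolved settings of a config before launching a run.
# Assumes Hydra/OmegaConf YAML configs; the exact file name is hypothetical.
from omegaconf import OmegaConf

cfg = OmegaConf.load("conf/conf_gutbrain/exp/train.yaml")  # hypothetical config path
print(OmegaConf.to_yaml(cfg))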

2. Preprocess Data

Run the following scripts to prepare the dataset:


# For train/dev set
python src/ner/utils/get_bio2.py  
python src/ner/utils/preprocess.py

# For test set
python src/ner/utils/create_dummy_annot_test.py
python src/ner/utils/get_bio2.py  
python src/ner/utils/preprocess.py

Processed data is saved to: data/preprocessed
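The name get_bio2.py suggests the annotations are converted into token-level BIO2 tags. A minimal sketch of that idea, assuming whitespace tokenization, (start, end, label) character spans, and illustrative label names (the actual script may differ):

# Sketch: convert character-level entity spans to BIO2 tags.
# Assumes whitespace tokenization and spans given as (start, end, label);
# the real get_bio2.py may tokenize and name labels differently.
def to_bio2(text, spans):
    tags, offset = [], 0
    for token in text.split():
        start = text.index(token, offset)
        end = start + len(token)
        offset = end
        tag = "O"
        for s, e, label in spans:
            if start >= s and end <= e:
                tag = ("B-" if start == s else "I-") + label
                break
        tags.append((token, tag))
    return tags

print(to_bio2("Gut microbiota affects anxiety", [(0, 14, "microbiome"), (23, 30, "DDF")]))
# [('Gut', 'B-microbiome'), ('microbiota', 'I-microbiome'), ('affects', 'O'), ('anxiety', 'B-DDF')]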

3. Run train/inference

# Train
python src/ner/train/main.py +exp=train  

# Run prediction on dev set
python src/ner/train/main.py +exp=predict

# Run inference on test set
python src/ner/train/test.py +exp=ner/test model.model_name_or_path={trained_model_path}

Trained models are saved in output/ner/train_res
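If the trained checkpoint is a Hugging Face token-classification model (an assumption about the repo's setup), it can be sanity-checked before postprocessing:

# Sketch: quick sanity check of a trained NER checkpoint with the transformers pipeline.
# Assumes a Hugging Face token-classification model; the directory name is a placeholder.
from transformers import pipeline

ner = pipeline(
    "token-classification",
    model="output/ner/train_res/your_run",  # placeholder checkpoint path
    aggregation_strategy="simple",
)
print(ner("Lactobacillus supplementation reduced anxiety-like behaviour in mice."))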

4. Post-process for eval

python src/utils/postprocess.py

# For submission format, see:
eval/NER_get_devset_submission_file_and_evaluate.ipynb
eval/NER_get_testset_submission_file.ipynb
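Postprocessing maps token-level predictions back to span-based entities for the submission files. A minimal sketch of merging BIO2 tags into character spans, assuming each prediction carries its character offsets (the actual postprocess.py and submission schema may differ):

# Sketch: merge BIO2-tagged tokens back into character-level entity spans.
# Assumes predictions of the form (token, tag, start, end); the real
# postprocess.py and the submission schema may differ.
def bio2_to_spans(predictions):
    spans, current = [], None
    for token, tag, start, end in predictions:
        if tag.startswith("B-"):
            if current:
                spans.append(current)
            current = {"label": tag[2:], "start": start, "end": end}
        elif tag.startswith("I-") and current and tag[2:] == current["label"]:
            current["end"] = end
        else:
            if current:
                spans.append(current)
            current = None
    if current:
        spans.append(current)
    return spans

preds = [("Gut", "B-microbiome", 0, 3), ("microbiota", "I-microbiome", 4, 14), ("affects", "O", 15, 22)]
print(bio2_to_spans(preds))  # [{'label': 'microbiome', 'start': 0, 'end': 14}]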

RE

1. Set Configs

Edit your configuration files in conf/conf_gutbrain

2. Preprocess Data

Run the following scripts to prepare the dataset:

# For train/dev set
python src/re/utils/preprocess_w_negatives.py

# For test set
python src/re/utils/preprocess_w_negatives_testset.py

Processed data is saved to: data/preprocessed
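The "_w_negatives" scripts suggest that co-occurring entity pairs without a gold relation are kept as negative examples. A minimal sketch of that idea (field names and the "no_relation" label are assumptions; the actual scripts may sample or filter pairs differently):

# Sketch: build positive and negative relation candidates for one document.
# Assumes entities as (id, label) and gold relations as (subject_id, object_id, predicate);
# the real preprocess_w_negatives.py may filter pairs by entity type or distance.
from itertools import permutations

def build_candidates(entities, gold_relations):
    gold = {(s, o): p for s, o, p in gold_relations}
    examples = []
    for (subj_id, subj_label), (obj_id, obj_label) in permutations(entities, 2):
        examples.append({
            "subj": subj_id, "subj_label": subj_label,
            "obj": obj_id, "obj_label": obj_label,
            "label": gold.get((subj_id, obj_id), "no_relation"),  # negatives get "no_relation"
        })
    return examples

entities = [("e1", "bacteria"), ("e2", "DDF"), ("e3", "chemical")]
gold_relations = [("e1", "e2", "influence")]
print(build_candidates(entities, gold_relations))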

3. Run train/inference

# Train
python src/re/train/main.py +exp=train  

# Run prediction on dev set
python src/re/train/test.py +exp=predict

# Run inference on test set
python src/re/train/test.py +exp=test

Trained models are saved in output/re/train_res
Prediction results are saved in submission format in output/re/test_res
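If the RE model is a Hugging Face sequence classifier over entity-marked sentences (an assumption about the repo's setup), scoring one candidate pair might look like this:

# Sketch: score one candidate entity pair with a fine-tuned sequence classifier.
# The entity markers and the checkpoint path are assumptions about the actual setup.
import torch
from transformers import AutoModelForSequenceClassification, AutoTokenizer

model_path = "output/re/train_res/your_run"  # placeholder checkpoint path
tokenizer = AutoTokenizer.from_pretrained(model_path)
model = AutoModelForSequenceClassification.from_pretrained(model_path)

text = "[SUBJ] Lactobacillus [/SUBJ] supplementation reduced [OBJ] anxiety [/OBJ] in mice."
inputs = tokenizer(text, return_tensors="pt")
with torch.no_grad():
    logits = model(**inputs).logits
print(model.config.id2label[logits.argmax(dim=-1).item()])  # e.g. "influence" or "no_relation"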


RE Baselines

1. Simple baseline based on train corpus stats

The train corpus stats are generated in ./00_Data_Overview.ipynb.

We extract relation and co-occurrence statistics to support relation prediction:

  • frequency: Number of times a (subject_label, object_label) pair appears in labeled relations.
  • cooccurrence_frequency: Number of times the same pair co-occurs in entity annotations (regardless of relation).
  • relation_likelihood: Computed as frequency / cooccurrence_frequency, estimating the probability of a relation when the pair co-occurs.
  • avg_char_distance, median_distance, min/max_distance_percentile: Character-based distance metrics between subject and object spans.
  • predicate_counts: Counts of different predicates assigned to the pair across annotations.
  • annotators: List of annotators who labeled the relation, used to filter weak or distant-only annotations.

These statistics are saved in ./data/ds_stats/train_binary_rel_stats.csv.

The prediction code based on them is in ./src/baseline_RE.py; a short sketch of the idea follows below.
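A minimal sketch of how these statistics can drive the baseline: predict the most frequent predicate for a (subject_label, object_label) pair whenever its relation_likelihood clears a threshold (the 0.5 threshold, the column access, and the tie-breaking are assumptions about what baseline_RE.py actually does):

# Sketch: co-occurrence baseline driven by train_binary_rel_stats.csv.
# The threshold and the "most frequent predicate" rule are assumptions
# about baseline_RE.py; column names follow the list above.
import ast
import pandas as pd

stats = pd.read_csv("data/ds_stats/train_binary_rel_stats.csv")

def predict_relation(subject_label, object_label, threshold=0.5):
    row = stats[(stats["subject_label"] == subject_label) &
                (stats["object_label"] == object_label)]
    if row.empty or row.iloc[0]["relation_likelihood"] < threshold:
        return None  # pair is rarely (or never) related in the train corpus
    predicate_counts = ast.literal_eval(row.iloc[0]["predicate_counts"])  # assumes a dict stored as text
    return max(predicate_counts, key=predicate_counts.get)  # most frequent predicate

print(predict_relation("bacteria", "DDF"))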

2. REBEL

Experiments with https://github.com/Babelscape/rebel.

2.1. Model fine-tuning

New files were added and several existing REBEL files were adapted to include the new relation and entity types (see the rebel_RE directory).

Run on server: ./rebel_RE/src/train_job.sh.

2.2. Model predictions and eval

Run on server: ./rebel_RE/src/test_job.sh.

This saves the predictions to a file such as ./rebel_RE/predictions/preds_gutbrainie.jsonl.

This output can then be converted into the challenge format for evaluation via ./rebel_RE/predictions/convert_to_format.py.
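REBEL generates linearized triplets marked with <triplet>, <subj>, and <obj> special tokens. A minimal sketch of parsing one generated string into (head, relation, tail) triples, roughly the first step such a conversion needs (simplified to one relation per <triplet> marker; the relation name shown is a placeholder for the fine-tuned labels):

# Sketch: parse REBEL's linearized output into (head, relation, tail) triples.
# Simplified to one relation per <triplet> marker; the real convert_to_format.py
# must also map the triples onto the challenge's submission schema.
def parse_rebel(generated):
    triples = []
    for chunk in generated.split("<triplet>"):
        if "<subj>" not in chunk or "<obj>" not in chunk:
            continue
        head, rest = chunk.split("<subj>", 1)
        tail, relation = rest.split("<obj>", 1)
        triples.append((head.strip(), relation.strip(), tail.strip()))
    return triples

print(parse_rebel("<triplet> Lactobacillus <subj> anxiety <obj> influence"))
# [('Lactobacillus', 'influence', 'anxiety')]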
