SOMD 2025: Finetuning ModernBERT for In- and Out-of-Distribution NER and Relation Extraction of Software Mentions in Scientific Texts
- Overview
- Project Structure
- Installation
- Steps to Run
- Results for NER and RE in Each Phase
- Findings
- Limitations
In this project, we use the dataset and evaluation criteria defined by the Software Mention Detection (SOMD 2025) competition to tackle Named Entity Recognition and Relation Extraction on sentences drawn from scientific texts. By fine-tuning ModernBERT and building a joint model on top of it, we achieved a best SOMD F1 score of 0.89 in Phase I (in-distribution) and 0.55 in Phase II (out-of-distribution), later improved to 0.60 in the open submission.
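Conceptually, the joint approach uses a single ModernBERT encoder shared by a token-classification (NER) head and a relation-classification head. The snippet below is a minimal illustrative sketch of that idea, not the repository's actual implementation: the class name, label counts, checkpoint id, and the way entity-pair representations are pooled are all assumptions.

```python
# Illustrative sketch of a joint NER + RE model on top of ModernBERT.
# Names, label counts, and pooling strategy are assumptions, not the
# exact code in src/phase_2/model.py.
import torch
import torch.nn as nn
from transformers import AutoModel

class JointNERREModel(nn.Module):
    def __init__(self, num_entity_labels: int, num_relation_labels: int,
                 encoder_name: str = "answerdotai/ModernBERT-base"):
        super().__init__()
        self.encoder = AutoModel.from_pretrained(encoder_name)
        hidden = self.encoder.config.hidden_size
        # Token-level head for BIO entity tags
        self.ner_head = nn.Linear(hidden, num_entity_labels)
        # Pair-level head over concatenated head/tail entity representations
        self.re_head = nn.Linear(2 * hidden, num_relation_labels)

    def forward(self, input_ids, attention_mask, head_idx=None, tail_idx=None):
        hidden_states = self.encoder(
            input_ids=input_ids, attention_mask=attention_mask
        ).last_hidden_state                       # (batch, seq_len, hidden)
        ner_logits = self.ner_head(hidden_states)  # per-token entity logits

        re_logits = None
        if head_idx is not None and tail_idx is not None:
            batch = torch.arange(hidden_states.size(0), device=hidden_states.device)
            head_repr = hidden_states[batch, head_idx]  # first token of head entity
            tail_repr = hidden_states[batch, tail_idx]  # first token of tail entity
            re_logits = self.re_head(torch.cat([head_repr, tail_repr], dim=-1))
        return ner_logits, re_logits
```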
.
├── EntityModel_checkpoint/
├── FewShot_checkpoint/
├── JointModel_checkpoint/
├── ModernBERT_checkpoint/
├── data/
│   ├── phase_1/
│   │   ├── test_texts.txt
│   │   ├── train_entities.txt
│   │   ├── train_relations.txt
│   │   └── train_texts.txt
│   ├── phase_2/
│   │   ├── entities.txt
│   │   ├── relations.txt
│   │   ├── test_texts.txt
│   │   └── texts.txt
│   └── predictions/
│       ├── phase_1/
│       └── phase_2/
├── results/
│   ├── phase_1.zip
│   ├── phase_2_0.55.zip
│   └── phase_2_0.6.zip
├── src/
│   ├── __init__.py
│   ├── phase_1/
│   │   ├── __init__.py
│   │   ├── config_.py
│   │   ├── infer.py
│   │   └── train.py
│   └── phase_2/
│       ├── __init__.py
│       ├── adapter_weighted_inference.py
│       ├── config.py
│       ├── dataloader.py
│       ├── model.py
│       ├── relation_adapter_weighted.py
│       ├── relation_dataset.py
│       ├── relation_model.py
│       └── utils.py
├── requirements.txt
├── LICENSE
└── README.md
Note: The model files are available at the following link: SOMD-2025-models. After cloning the repository, download all the model files into their respective checkpoint directories before running training or inference.
git clone https://github.com/ekbanasolutions/somd-2025
cd somd-2025
# Create and activate a virtual environment (Linux/macOS)
python -m venv venv
source venv/bin/activate

# Create and activate a virtual environment (Windows)
python -m venv venv
venv\Scripts\activate

# Install the dependencies
pip install -r requirements.txt
- You can modify the parameters for Phase I in the config file (src/phase_1/config_.py). An illustrative fine-tuning sketch is shown after the commands below.
# Train the Phase I model (run from the repository root)
cd src/phase_1/
python3 train.py

# Run Phase I inference
cd src/phase_1/
python3 infer.py
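If you only want to reproduce the general fine-tuning setup rather than the exact training script, a minimal ModernBERT token-classification recipe with the Hugging Face Trainer looks like the sketch below. The label list, output path, and hyperparameters are placeholders; the real settings live in src/phase_1/config_.py and train.py.

```python
# Minimal illustration of fine-tuning ModernBERT for NER with the HF Trainer.
# Labels, paths, and hyperparameters are placeholders, not the repo's values.
from transformers import (AutoTokenizer, AutoModelForTokenClassification,
                          TrainingArguments, Trainer)

MODEL_NAME = "answerdotai/ModernBERT-base"   # assumed base checkpoint
LABELS = ["O", "B-Software", "I-Software"]   # placeholder subset of the SOMD tag set

tokenizer = AutoTokenizer.from_pretrained(MODEL_NAME)
model = AutoModelForTokenClassification.from_pretrained(
    MODEL_NAME, num_labels=len(LABELS)
)

args = TrainingArguments(
    output_dir="ModernBERT_checkpoint",
    learning_rate=2e-5,                      # placeholder hyperparameters
    per_device_train_batch_size=16,
    num_train_epochs=5,
)

# train_dataset / eval_dataset would be built from data/phase_1/train_texts.txt
# and train_entities.txt (tokenized and aligned to BIO labels), then:
# trainer = Trainer(model=model, args=args,
#                   train_dataset=train_dataset, eval_dataset=eval_dataset)
# trainer.train()
```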
- You can modify the parameters for Phase II in the config file (src/phase_2/config.py). A sketch of the weighting idea follows the commands below.
# Train the Phase II relation model (run from the repository root)
cd src/phase_2/
python3 relation_adapter_weighted.py

# Run Phase II inference
cd src/phase_2/
python3 adapter_weighted_inference.py
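The Phase II scripts (relation_adapter_weighted.py and adapter_weighted_inference.py) train and apply a relation model with some form of weighting; the exact mechanism is defined in the repository code. Purely as an illustration, the sketch below shows one common variant, class-weighted cross-entropy over relation logits to counter label imbalance. All names and counts here are hypothetical.

```python
# Illustrative only: class-weighted cross-entropy for relation classification.
# The actual weighting scheme in src/phase_2/relation_adapter_weighted.py
# may differ from this sketch.
import torch
import torch.nn as nn

def build_weighted_loss(label_counts: list[int]) -> nn.CrossEntropyLoss:
    """Give rare relation types a proportionally larger weight."""
    counts = torch.tensor(label_counts, dtype=torch.float)
    weights = counts.sum() / (len(counts) * counts)   # inverse-frequency weights
    return nn.CrossEntropyLoss(weight=weights)

# Example: three hypothetical relation types with imbalanced counts.
loss_fn = build_weighted_loss([500, 120, 30])
logits = torch.randn(8, 3)            # (batch, num_relation_labels)
targets = torch.randint(0, 3, (8,))   # gold relation labels
loss = loss_fn(logits, targets)
print(loss.item())
```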
| Phase | SOMD F1 | NER F1 | NER Precision | NER Recall | RE F1 | RE Precision | RE Recall |
|---|---|---|---|---|---|---|---|
| Phase I | 0.89 | 0.93 | 0.93 | 0.95 | 0.84 | 0.85 | 0.86 |
| Phase I (Modified Joint Model) | 0.92 | 0.95 | 0.95 | 0.96 | 0.89 | 0.95 | 0.85 |
| Phase II | 0.55 | 0.64 | 0.67 | 0.65 | 0.46 | 0.69 | 0.39 |
| Open Submission | 0.60 | 0.69 | 0.74 | 0.69 | 0.51 | 0.71 | 0.42 |
- During Phase I, the Joint Model built on ModernBERT achieved the highest overall performance with a SOMD F1 score of 0.89.
- After Phase I, a refined approach, the Modified Joint Model, improved the SOMD F1 score to 0.92.
- The model failed to generalize well to the Out-of-Distribution (OOD) dataset in Phase II, resulting in a significant performance drop to a SOMD F1 score of 0.55.
- Following multiple post-Phase II experiments, the best SOMD F1 score achieved was 0.60 (open submission).
- Poor generalization to the Out-of-Distribution (OOD) dataset.
- Relation Extraction depends heavily on accurate Entity Extraction.