
fonshartendorp/dutch_biomedical_entity_linking


Biomedical Entity Linking for Dutch: Fine-tuning a Self-alignment BERT Model on an Automatically Generated Wikipedia Corpus

This repository contains the code for generating the training data, and for training and evaluating the sapBERT + fine-tuned Dutch biomedical entity linking model presented in the paper.

Summary

  • A RoBERTa-based base model trained from scratch on Dutch hospital notes (medRoBERTa.nl),
  • which is further pretrained in a second phase using self-alignment on a UMLS-derived Dutch biomedical ontology,
  • and finally fine-tuned on an automatically generated, weakly labelled corpus from Wikipedia (WALVIS).
  • Evaluation results on the Mantra GSC corpus can be found in the report.
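
A sapBERT-style model links a mention by embedding it with the encoder and selecting the ontology name whose embedding lies closest. The sketch below shows only that nearest-neighbour step in plain Python. The three-dimensional vectors, the `CUI-1`/`CUI-2` identifiers and the function names are hypothetical stand-ins for real encoder outputs and UMLS concept IDs. This is not code from this repository.

```python
import math


def cosine(u, v):
    """Cosine similarity between two equal-length vectors."""
    dot = sum(a * b for a, b in zip(u, v))
    norm_u = math.sqrt(sum(a * a for a in u))
    norm_v = math.sqrt(sum(b * b for b in v))
    return dot / (norm_u * norm_v)


def link(mention_vec, dictionary, k=1):
    """Rank dictionary names by cosine similarity to the mention embedding.

    dictionary: list of (cui, name, vector) tuples, one per synonym.
    Returns the top-k (cui, name, score) triples, best first.
    """
    scored = [(cui, name, cosine(mention_vec, vec)) for cui, name, vec in dictionary]
    scored.sort(key=lambda t: t[2], reverse=True)
    return scored[:k]


# Hypothetical toy embeddings; a real model would produce these with the encoder.
dictionary = [
    ("CUI-1", "hypertensie", [0.9, 0.1, 0.0]),
    ("CUI-1", "hoge bloeddruk", [0.8, 0.2, 0.1]),
    ("CUI-2", "suikerziekte", [0.1, 0.9, 0.2]),
]
mention = [0.85, 0.15, 0.05]  # stand-in for the embedding of a mention such as "verhoogde bloeddruk"

best_cui = link(mention, dictionary)[0][0]  # → "CUI-1"
```

In practice the dictionary embeddings would be computed once and searched with a matrix product or a vector index. The linear scan here only keeps the example dependency-free.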

Overview

The code for enhancing the UMLS and creating a biomedical ontology for biomedical entity linking (1_enhance_UMLS) is forked from the Dutch-medical-concepts repository from the UMCU. The code for self-alignment pretraining and fine-tuning largely reuses the code base of the original sapBERT paper.
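
Self-alignment pretraining in the original sapBERT work uses a multi-similarity loss. This loss pulls together the embeddings of synonyms that share a UMLS concept and pushes apart the names of different concepts. Below is a minimal pure-Python sketch of that loss. It assumes cosine similarity and uses illustrative hyperparameters; `alpha`, `beta` and `lam` are not necessarily the values used in this repository. It also omits the online hard-pair mining that the original applies.

```python
import math


def cosine(u, v):
    """Cosine similarity between two equal-length vectors."""
    dot = sum(a * b for a, b in zip(u, v))
    return dot / (math.sqrt(sum(a * a for a in u)) * math.sqrt(sum(b * b for b in v)))


def multi_similarity_loss(embeddings, labels, alpha=2.0, beta=50.0, lam=0.5):
    """Multi-similarity loss over one batch of name embeddings.

    Names with the same label (CUI) are positives for each other; all
    others are negatives. No hard-pair mining is applied in this sketch.
    """
    n = len(embeddings)
    total = 0.0
    for i in range(n):
        pos_sum = 0.0
        neg_sum = 0.0
        for k in range(n):
            if k == i:
                continue
            s = cosine(embeddings[i], embeddings[k])
            if labels[k] == labels[i]:
                # Positive term grows as a synonym's similarity drops below lam.
                pos_sum += math.exp(-alpha * (s - lam))
            else:
                # Negative term grows as another concept's similarity rises above lam.
                neg_sum += math.exp(beta * (s - lam))
        total += math.log1p(pos_sum) / alpha + math.log1p(neg_sum) / beta
    return total / n
```

With well-aligned embeddings (synonyms close, other concepts far apart), the loss is lower than when the same vectors carry mismatched labels. Minimising this loss is the signal that self-alignment pretraining provides.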

Enhancing the UMLS requires both a UMLS license and a SNOMED NL license.
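
A UMLS license gives access to the Metathesaurus release files, including MRCONSO.RRF, the pipe-separated table of concept names. The following sketch illustrates one hedged way to derive Dutch synonym pairs for self-alignment from this file. It keeps the rows whose language field is `DUT` and pairs up the names that belong to the same CUI. It assumes the standard MRCONSO column layout, with the CUI in field 0, the language in field 1 and the name in field 14. The ontology produced by 1_enhance_UMLS, which also adds SNOMED NL names, may be organised differently. `example_row` is a hypothetical helper for building test input.

```python
from collections import defaultdict
from itertools import combinations

# Column positions in the standard UMLS MRCONSO.RRF layout (pipe-separated).
CUI_COL, LAT_COL, STR_COL = 0, 1, 14


def dutch_synonyms(lines, language="DUT"):
    """Group concept names by CUI, keeping only rows in the given language."""
    synonyms = defaultdict(set)
    for line in lines:
        fields = line.rstrip("\n").split("|")
        if len(fields) <= STR_COL or fields[LAT_COL] != language:
            continue
        synonyms[fields[CUI_COL]].add(fields[STR_COL].strip())
    return synonyms


def synonym_pairs(synonyms):
    """Yield (cui, name_a, name_b) for every pair of distinct names of one concept."""
    for cui, names in sorted(synonyms.items()):
        for a, b in combinations(sorted(names), 2):
            yield cui, a, b


def example_row(cui, lat, name):
    """Hypothetical helper: build an 18-field MRCONSO-style line with a trailing '|'."""
    return "|".join([cui, lat] + [""] * 12 + [name] + [""] * 3) + "|"


lines = [
    example_row("CUI-1", "DUT", "hypertensie"),
    example_row("CUI-1", "DUT", "hoge bloeddruk"),
    example_row("CUI-1", "ENG", "hypertension"),
    example_row("CUI-2", "DUT", "suikerziekte"),
]
pairs = list(synonym_pairs(dutch_synonyms(lines)))
```

Concepts with a single Dutch name yield no pair. Such concepts can still serve as linking targets, but in this scheme they provide no positive pair for self-alignment.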

ONTOLOGY-browser

The ONTOLOGY-browser is a minimal Flask-based browser tool for comparing UMLS entries.
