NatLibFi/Annif-LLMs4Subjects


Annif for LLMs4Subjects

Here we use extreme multilabel text classification (XMTC) approaches implemented in Annif, together with LLM-based approaches for data preparation, for shared Task 5 of SemEval-2025. We also pay attention to hyperparameter optimization. DVC is used to manage the data sets and to coordinate the training and evaluation processes.

Please see our system description preprint:

Suominen, O., Inkinen, J., & Lehtinen, M. (2025). Annif at SemEval-2025 Task 5: Traditional XMTC augmented by LLMs. https://arxiv.org/abs/2504.19675 (Pre-print)

See also the task description preprint:

D'Souza, J., Sadruddin, S., Israel, H., Begoin, M., & Slawig, D. (2025). SemEval-2025 Task 5: LLMs4Subjects -- LLM-based Automated Subject Tagging for a National Technical Library's Open-Access Catalog. https://arxiv.org/abs/2504.07199 (Pre-print)

The Annif models trained for this task are available here:

https://huggingface.co/NatLibFi/Annif-LLMs4Subjects-data
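
One possible way to fetch the models programmatically is sketched below. It assumes the `huggingface_hub` Python package is installed, which this README does not state as a requirement. The helper names `repo_id_from_url` and `download_models` and the target directory `annif-models` are hypothetical and chosen only for illustration.

```python
from urllib.parse import urlparse

# URL of the model repository, as given above.
MODELS_URL = "https://huggingface.co/NatLibFi/Annif-LLMs4Subjects-data"


def repo_id_from_url(url):
    """Return the '<owner>/<name>' repository id from a Hugging Face URL."""
    return urlparse(url).path.strip("/")


def download_models(local_dir="annif-models"):
    """Download a snapshot of the repository into local_dir (hypothetical path).

    Assumption: huggingface_hub is installed. It is imported lazily so that
    the helper above can be used without it.
    """
    from huggingface_hub import snapshot_download

    return snapshot_download(
        repo_id=repo_id_from_url(MODELS_URL),
        local_dir=local_dir,
    )
```

Calling `download_models()` needs network access and may transfer a large amount of data, depending on the size of the repository.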
