Here we use extreme multilabel text classification (XMTC) approaches implemented in Annif, together with LLM-based approaches for data preparation, for Task 5 of SemEval-2025. Attention is also paid to hyperparameter optimization. DVC is used to manage the datasets and to coordinate the training and evaluation processes.
Please see our system description preprint:
Suominen, O., Inkinen, J., & Lehtinen, M. (2025). Annif at SemEval-2025 Task 5: Traditional XMTC augmented by LLMs. https://arxiv.org/abs/2504.19675 (Pre-print)
See also the task description preprint:
D'Souza, J., Sadruddin, S., Israel, H., Begoin, M. & Slawig, D. (2025). SemEval-2025 Task 5: LLMs4Subjects -- LLM-based Automated Subject Tagging for a National Technical Library's Open-Access Catalog. https://arxiv.org/abs/2504.07199 (Pre-print)
The Annif models trained for this task are available here: