A specialized translation model built on MarianMT and fine-tuned on the United Nations Parallel Corpus, designed to deliver precise, context-aware translations of United Nations documents. The model supports translation across the UN's six official languages, helping human translators and researchers process large volumes of content with ease and accuracy.
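Because the model is a MarianMT checkpoint hosted on Hugging Face, it can presumably be loaded with the standard `transformers` Marian classes. The sketch below assumes so. This README does not list the fine-tuned model's repository ID, so `MODEL_ID` uses the English-to-Spanish `Helsinki-NLP/opus-mt-en-es` checkpoint from the references as a stand-in; replace it with the Translate4Good repo ID. The `batched` helper and the default batch size of 8 are illustrative choices, not part of the project.

```python
from typing import Iterator, List

# Stand-in checkpoint (English -> Spanish). Replace with the fine-tuned
# Translate4Good repository ID, which is not listed in this README.
MODEL_ID = "Helsinki-NLP/opus-mt-en-es"


def batched(texts: List[str], batch_size: int) -> Iterator[List[str]]:
    """Yield consecutive slices of at most `batch_size` texts."""
    for start in range(0, len(texts), batch_size):
        yield texts[start:start + batch_size]


def translate(texts: List[str], model_id: str = MODEL_ID, batch_size: int = 8) -> List[str]:
    """Translate `texts` with a MarianMT checkpoint, one batch at a time."""
    # Imported lazily so the helper above can be used without transformers installed.
    from transformers import MarianMTModel, MarianTokenizer

    tokenizer = MarianTokenizer.from_pretrained(model_id)
    model = MarianMTModel.from_pretrained(model_id)
    outputs: List[str] = []
    for batch in batched(texts, batch_size):
        # Pad to the longest sentence in the batch; truncate inputs beyond the model's max length.
        inputs = tokenizer(batch, return_tensors="pt", padding=True, truncation=True)
        generated = model.generate(**inputs)
        outputs.extend(tokenizer.batch_decode(generated, skip_special_tokens=True))
    return outputs
```

Calling `translate(["The General Assembly adopted the resolution without a vote."])` downloads the checkpoint on first use and returns a list containing one translated string. This requires `transformers`, `torch`, and `sentencepiece`.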
- Model: Translate4Good on Hugging Face
- Demo: Translate4Good on Devpost
- GitHub Repository for the full application: UN Document Translator App
- See it in motion: YouTube Demo
- Context-Aware Translations: Optimized for formal, technical, and nuanced language found in UN documents.
- Adaptable for UN-Specific Terminology: Handles specialized terms precisely, using a glossary drawn from UN resources.
- Integrated with Hugging Face: Easily accessible for developers, linguists, and international organizations via the Hugging Face platform.
- MarianMT: A transformer-based, multilingual translation model.
- Hugging Face: Hosting for the model to facilitate easy access and real-time translation demos.
- FastAPI: Lightweight, fast backend framework for building and serving API endpoints.
- >93% cosine similarity between human (native-speaker) and model translations on unseen data.
- Outperforms human translations in blind studies.
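The README does not say which sentence embeddings the >93% cosine-similarity figure was computed on. Assuming each human and model translation has already been embedded as a fixed-length vector (by some sentence-embedding model, not shown), the metric reduces to the cosine of the angle between paired vectors. Averaging per-pair scores over the evaluation set is an assumption about how the figure was aggregated, and the toy vectors below are made-up illustrative values.

```python
import math
from typing import List, Sequence, Tuple


def cosine_similarity(u: Sequence[float], v: Sequence[float]) -> float:
    """cos(u, v) = (u . v) / (||u|| * ||v||)."""
    if len(u) != len(v):
        raise ValueError("vectors must have the same dimension")
    dot = sum(a * b for a, b in zip(u, v))
    norm_u = math.sqrt(sum(a * a for a in u))
    norm_v = math.sqrt(sum(b * b for b in v))
    if norm_u == 0 or norm_v == 0:
        raise ValueError("cosine similarity is undefined for zero vectors")
    return dot / (norm_u * norm_v)


def mean_pairwise_similarity(pairs: List[Tuple[Sequence[float], Sequence[float]]]) -> float:
    """Average cosine similarity over (human_embedding, model_embedding) pairs."""
    if not pairs:
        raise ValueError("need at least one pair")
    return sum(cosine_similarity(h, m) for h, m in pairs) / len(pairs)


# Toy 3-dimensional "embeddings" of a human and a model translation (illustrative only).
human = [0.2, 0.8, 0.1]
model = [0.25, 0.75, 0.05]
score = cosine_similarity(human, model)
```

Cosine similarity ignores vector magnitude, so two translations whose embeddings point in the same direction score 1.0 regardless of sentence length effects on the norm.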
The model has been fine-tuned on the United Nations Parallel Corpus (Ziemski et al., 2016), a large collection of UN documents with high-quality, sentence-aligned translations across the six official UN languages.
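Fine-tuning a MarianMT model requires sentence-aligned (source, target) pairs. The sketch below assumes the bilingual sub-corpus is available as two line-aligned plain-text files, one sentence per line in each language, and shows one common way to pair the lines and drop empty or overly long segments before tokenization. The function name `read_parallel` and the 200-word cap are hypothetical illustrative choices, not the project's actual preprocessing.

```python
from itertools import zip_longest
from pathlib import Path
from typing import Iterator, Tuple


def read_parallel(src_path: Path, tgt_path: Path, max_words: int = 200) -> Iterator[Tuple[str, str]]:
    """Yield (source, target) sentence pairs from two line-aligned text files.

    Pairs where either side is empty or longer than `max_words` words are skipped.
    Raises ValueError if the files have different numbers of lines.
    """
    with open(src_path, encoding="utf-8") as src, open(tgt_path, encoding="utf-8") as tgt:
        # zip_longest (not zip) so a misaligned pair of files fails loudly.
        for line_no, (s, t) in enumerate(zip_longest(src, tgt), start=1):
            if s is None or t is None:
                raise ValueError(f"files are not aligned: length mismatch at line {line_no}")
            s, t = s.strip(), t.strip()
            if not s or not t:
                continue  # drop pairs with an empty side
            if len(s.split()) > max_words or len(t.split()) > max_words:
                continue  # drop very long segments
            yield s, t
```

For example, `list(read_parallel(Path("corpus.en"), Path("corpus.es")))` (hypothetical file names) returns the filtered pairs, ready to be tokenized for training.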
This project is part of Translate4Good, a winning entry 🏆 at the Hack for Impact hackathon that aims to simplify translation workflows for UN agencies, NGOs, and researchers by providing adaptive, high-quality translations. My contribution was building this model end-to-end.
If you're interested in contributing to UN Document Translator, please check out the repository and submit a pull request.
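- Helsinki-NLP, opus-mt-en-es, Hugging Face model card: https://huggingface.co/Helsinki-NLP/opus-mt-en-es
- Ziemski, M., Junczys-Dowmunt, M., and Pouliquen, B. (2016). The United Nations Parallel Corpus v1.0. In Proceedings of the Tenth International Conference on Language Resources and Evaluation (LREC’16), Portorož, Slovenia, May 2016.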