
A MarianMT model fine-tuned on the UN Parallel Corpora dataset to produce accurate, contextual translations of UN documents.


almagashi/UN_Document_Translator


UN Document Translator

A specialized translation model powered by MarianMT and fine-tuned with the UN Parallel Corpora dataset, built to deliver precise, contextually accurate translations for United Nations documents. This model supports multilingual translations across the UN's six official languages, helping human translators and researchers handle large volumes of content with ease and accuracy.


🌍 Try It Out

💻 Full Application


🔧 Features

  • Context-Aware Translations: Optimized for formal, technical, and nuanced language found in UN documents.
  • Adaptable for UN-Specific Terminology: Renders UN terminology precisely and consistently, using a glossary drawn from UN resources.
  • Integrated with Hugging Face: Easily accessible for developers, linguists, and international organizations via the Hugging Face platform.
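The README does not say how the UN glossary is applied during or after translation. One simple, hypothetical way to use such a glossary is a post-translation check that flags source terms whose required target rendering is missing. In the sketch below, the glossary entries and the function name `missing_glossary_terms` are illustrative assumptions, not the project's actual glossary or code.

```python
from typing import Dict, List

# Hypothetical EN->FR entries for illustration; the project's real glossary is not published here.
EN_FR_GLOSSARY: Dict[str, str] = {
    "General Assembly": "Assemblée générale",
    "Security Council": "Conseil de sécurité",
    "Member States": "États Membres",
}


def missing_glossary_terms(source: str, translation: str, glossary: Dict[str, str]) -> List[str]:
    """Return glossary terms found in the source whose required rendering is absent from the translation.

    Matching is case-insensitive substring search, which is deliberately crude:
    it ignores inflection, word boundaries, and overlapping terms.
    """
    src = source.casefold()
    tgt = translation.casefold()
    return [
        term
        for term, required in glossary.items()
        if term.casefold() in src and required.casefold() not in tgt
    ]
```

For example, checking "The Security Council met today." against "Le Conseil a siégé aujourd'hui." flags "Security Council", because "Conseil de sécurité" does not appear in the output. A production check would more likely work on tokenized, lemmatized text.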

🚀 Technology Stack

  • MarianMT: A family of transformer-based neural machine translation models covering many language pairs.
  • Hugging Face: Hosting for the model to facilitate easy access and real-time translation demos.
  • FastAPI: Lightweight, fast backend framework for building and serving API endpoints.
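The stack above (a MarianMT model loaded through Hugging Face and served with FastAPI) can be sketched roughly as follows. This is a minimal illustration under stated assumptions, not the project's code: the fine-tuned model's Hub ID is not given in the README, so the sketch assumes public Opus-MT checkpoints named `Helsinki-NLP/opus-mt-{src}-{tgt}` (not every pair of UN languages is guaranteed to exist), and the names `model_id_for`, `load_model`, `translate`, `create_app`, and the `/translate` route are hypothetical. `MarianTokenizer`, `MarianMTModel`, `generate`, and `batch_decode` are real Transformers APIs; running the model requires `transformers`, `torch`, and `sentencepiece`.

```python
from functools import lru_cache
from typing import List

# ISO 639-1 codes for the six official UN languages.
UN_LANGUAGES = {"ar", "zh", "en", "fr", "ru", "es"}

# Assumption: public Opus-MT naming stands in for the (unpublished) fine-tuned model ID.
MODEL_ID_TEMPLATE = "Helsinki-NLP/opus-mt-{src}-{tgt}"


def model_id_for(src: str, tgt: str) -> str:
    """Validate a language pair and return the checkpoint ID to load."""
    if src not in UN_LANGUAGES or tgt not in UN_LANGUAGES:
        raise ValueError(f"unsupported language pair: {src}->{tgt}")
    if src == tgt:
        raise ValueError("source and target languages must differ")
    return MODEL_ID_TEMPLATE.format(src=src, tgt=tgt)


@lru_cache(maxsize=4)
def load_model(model_id: str):
    """Load a MarianMT tokenizer/model pair once and cache it for later requests."""
    # Imported lazily so the routing logic above works without the heavy dependencies.
    from transformers import MarianMTModel, MarianTokenizer

    tokenizer = MarianTokenizer.from_pretrained(model_id)
    model = MarianMTModel.from_pretrained(model_id)
    return tokenizer, model


def translate(texts: List[str], src: str, tgt: str) -> List[str]:
    """Translate a batch of sentences from src to tgt."""
    tokenizer, model = load_model(model_id_for(src, tgt))
    batch = tokenizer(texts, return_tensors="pt", padding=True, truncation=True)
    generated = model.generate(**batch)
    return tokenizer.batch_decode(generated, skip_special_tokens=True)


def create_app():
    """FastAPI app factory exposing a single POST /translate endpoint."""
    from fastapi import FastAPI, HTTPException
    from pydantic import BaseModel

    class TranslationRequest(BaseModel):
        texts: List[str]
        source: str
        target: str

    app = FastAPI(title="UN Document Translator (sketch)")

    @app.post("/translate")
    def translate_endpoint(req: TranslationRequest):
        # Reject unsupported pairs with a 400 before touching the model.
        try:
            model_id_for(req.source, req.target)
        except ValueError as exc:
            raise HTTPException(status_code=400, detail=str(exc)) from exc
        return {"translations": translate(req.texts, req.source, req.target)}

    return app
```

Saved as a hypothetical `translator_api.py`, the app could be started with `uvicorn translator_api:create_app --factory`. The endpoint is a plain `def`, so FastAPI runs the blocking model call in its thread pool rather than on the event loop, and `lru_cache` keeps each loaded model in memory across requests.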

📈 Performance

  • Over 93% cosine similarity between human (native-speaker) and model translations on unseen data.
  • Outperformed human translations in blind evaluations.
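The README does not specify which vector representation the cosine similarity score is computed over. A common choice is sentence embeddings from a multilingual encoder, but that is an assumption here. The sketch below shows only the metric itself: cosine similarity per human/model translation pair, averaged over a test set, with the embeddings taken as already computed. The names `cosine_similarity` and `mean_similarity` are illustrative.

```python
import math
from typing import Sequence


def cosine_similarity(a: Sequence[float], b: Sequence[float]) -> float:
    """Cosine of the angle between two equal-length vectors: dot(a, b) / (|a| * |b|)."""
    if len(a) != len(b):
        raise ValueError("vectors must have the same dimension")
    dot = sum(x * y for x, y in zip(a, b))
    norm_a = math.sqrt(sum(x * x for x in a))
    norm_b = math.sqrt(sum(y * y for y in b))
    if norm_a == 0 or norm_b == 0:
        raise ValueError("cosine similarity is undefined for zero vectors")
    return dot / (norm_a * norm_b)


def mean_similarity(
    human: Sequence[Sequence[float]], model: Sequence[Sequence[float]]
) -> float:
    """Average per-pair similarity between human and model translation embeddings."""
    if len(human) != len(model) or not human:
        raise ValueError("need the same, non-zero number of human and model embeddings")
    scores = [cosine_similarity(h, m) for h, m in zip(human, model)]
    return sum(scores) / len(scores)
```

A reported score like ">93%" would then correspond to `mean_similarity(...) > 0.93` over the held-out pairs. How high that is depends strongly on the embedding model used.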

📄 Dataset

The model has been fine-tuned using the UN Parallel Corpora, a large collection of multilingual UN documents that provide high-quality, parallel translations across multiple languages.
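The README does not describe how the corpus was prepared for fine-tuning. As a hedged sketch, assuming the data arrives as line-aligned plain-text files (line i in one language translates line i in the other), a typical first step is to pair lines, drop empty ones, and filter pairs whose lengths differ wildly, which often signals misalignment. The function name `aligned_pairs` and the default ratio of 3.0 are illustrative assumptions.

```python
from itertools import zip_longest
from typing import Iterable, Iterator, Tuple

_MISSING = object()  # sentinel marking that one side ran out of lines


def aligned_pairs(
    src_lines: Iterable[str], tgt_lines: Iterable[str], max_len_ratio: float = 3.0
) -> Iterator[Tuple[str, str]]:
    """Yield cleaned (source, target) sentence pairs from line-aligned corpora."""
    for i, (src, tgt) in enumerate(zip_longest(src_lines, tgt_lines, fillvalue=_MISSING)):
        if src is _MISSING or tgt is _MISSING:
            raise ValueError(f"corpora are not line-aligned: length mismatch at line {i + 1}")
        src, tgt = src.strip(), tgt.strip()
        # Skip pairs where either side is empty.
        if not src or not tgt:
            continue
        # Skip pairs whose character lengths differ by more than max_len_ratio.
        if max(len(src), len(tgt)) / min(len(src), len(tgt)) > max_len_ratio:
            continue
        yield src, tgt
```

Character-length ratios behave very differently across scripts (Chinese text is typically much shorter in characters than its English equivalent), so the threshold would need tuning per language pair. Because the function is a generator, it can stream large files opened with `open(path, encoding="utf-8")` without loading them into memory.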

💡 About the Project

This project is part of Translate4Good, a Hack for Impact hackathon winner 🏆. Translate4Good aims to simplify translation workflows for UN agencies, NGOs, and researchers by providing adaptive, high-quality translations. My contribution was building this model end-to-end.


🤝 Contributing

If you're interested in contributing to UN Document Translator, please check out the repository and submit a pull request.

Citations
