LangResourceAtlas: A Comprehensive Map of Language Resource Categorization

About

LangResourceAtlas is a curated repository that categorizes over 500 languages into "high," "medium," and "low" resource groups based on the availability of their digital linguistic data. It is intended as a foundational resource for researchers, developers, and practitioners working on multilingual Natural Language Processing (NLP) tasks, especially those focusing on less-resourced languages. Our aims are:

  • Standardizing Resource Levels: Offering a shared reference for understanding language resource availability across a wide spectrum of languages, with languages identified by ISO 639-3 codes and writing systems by ISO 15924 codes.
  • Consolidating Information: Bringing together insights from various prominent datasets and research efforts into a single, accessible location.
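
As a rough illustration of the standardization described above, the sketch below parses a combined language and script tag and checks a resource level against the three groups. The `eng_Latn`-style tag layout and the names `LangTag`, `parse_tag`, and `check_entry` are assumptions made for this example; they are not taken from the repository's actual data format.

```python
import re
from typing import NamedTuple, Tuple


class LangTag(NamedTuple):
    language: str  # ISO 639-3 code: three lowercase letters
    script: str    # ISO 15924 code: four letters, first one uppercase


# The three resource groups named in this README.
VALID_LEVELS = {"high", "medium", "low"}

# Assumed tag layout "<ISO 639-3>_<ISO 15924>", e.g. "eng_Latn" (hypothetical).
_TAG_PATTERN = re.compile(r"([a-z]{3})_([A-Z][a-z]{3})")


def parse_tag(tag: str) -> LangTag:
    """Split a tag into language and script, checking only the code shapes."""
    match = _TAG_PATTERN.fullmatch(tag)
    if match is None:
        raise ValueError(f"not a '<ISO 639-3>_<ISO 15924>' tag: {tag!r}")
    return LangTag(match.group(1), match.group(2))


def check_entry(tag: str, level: str) -> Tuple[LangTag, str]:
    """Validate one (tag, level) pair and return it in parsed form."""
    if level not in VALID_LEVELS:
        raise ValueError(f"unknown resource level: {level!r}")
    return parse_tag(tag), level
```

This only checks that the codes are well-formed; it does not confirm that a code is actually registered in ISO 639-3 or ISO 15924, which would require the official code tables.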

Data Sources

The categorization in LangResourceAtlas is informed by a careful analysis and synthesis of information derived from, but not limited to, the following critical data sources and research initiatives:

Contributing

We welcome contributions to improve the accuracy and coverage of LangResourceAtlas! If you have:

  • New data sources that can inform resource categorization.
  • Corrections to existing categorizations.
  • Suggestions for improving the methodology or data format.

Please open an issue or submit a pull request.

Contact

For any questions or inquiries, please open an issue on this GitHub repository or join our Discord server MaLA-LM.

Citation

If you use LangResourceAtlas in your research or work, please cite it using the following BibTeX entry:

@misc{LangResourceAtlas,
  author = {Li, Zihao and Ji, Shaoxiong},
  title = {{LangResourceAtlas: A Comprehensive Map of Language Resource Categorization}},
  year = {2025},
  publisher = {GitHub},
  journal = {GitHub repository},
  howpublished = {\url{https://github.com/MaLA-LM/LangResourceAtlas}}
}
