AI4Africa/african-ai-datasets

🌍 Awesome African AI Datasets

A curated, community-maintained list of free and open datasets for Artificial Intelligence and Machine Learning projects focused on Africa.

Each dataset is tagged by domain, described with Kaggle-style metadata, and checked to be free for research or commercial use, subject to its license.

✅ Always check individual dataset licenses before use.


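📂 Table of Contents

  • Natural Language Processing (NLP)
  • Speech / Voice
  • Computer Vision & Wildlife
  • Geospatial & Agriculture
  • Climate & Weather
  • Health & Demographics
  • Contribution Guide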


🗣 Natural Language Processing (NLP)

MasakhaNER

  • Description: Named Entity Recognition datasets for multiple African languages.
  • Languages: Yoruba, Hausa, Igbo, Swahili, Amharic, Wolof, Kinyarwanda, and more.
  • Size: ~20K annotated sentences.
  • Samples: Named entities tagged in context.
  • Tasks: NER model training, evaluation, transfer learning.
  • Source: Masakhane Project.
  • Link: https://github.com/masakhane-io/masakhaner
  • License: Mixed permissive licenses.
  • Last Updated: 2021-05
  • Best For: Low-resource NLP research, multilingual NER.
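
For the NER training and evaluation tasks listed above, here is a minimal sketch of reading token-level annotations. It assumes a CoNLL-style layout (one `token tag` pair per line, a blank line between sentences) with BIO tags. The function name `read_conll` and the sample sentences are illustrative and not taken from the dataset, so check the repository for the actual file format before relying on it.

```python
from typing import Iterable, List, Tuple

Sentence = List[Tuple[str, str]]  # (token, tag) pairs


def read_conll(lines: Iterable[str]) -> List[Sentence]:
    """Group 'token tag' lines into sentences; blank lines end a sentence."""
    sentences: List[Sentence] = []
    current: Sentence = []
    for raw in lines:
        line = raw.strip()
        if not line:
            # A blank line closes the sentence collected so far.
            if current:
                sentences.append(current)
                current = []
            continue
        parts = line.rsplit(maxsplit=1)
        if len(parts) != 2:
            continue  # skip lines that carry no tag column
        token, tag = parts
        current.append((token, tag))
    if current:
        sentences.append(current)
    return sentences


# Hypothetical two-sentence sample (assumed format, invented content).
sample = """Amina B-PER
lives O
in O
Kano B-LOC
. O

Kigali B-LOC
is O
growing O
""".splitlines()

sentences = read_conll(sample)
entities = [tok for sent in sentences for tok, tag in sent if tag != "O"]
```

Reading from a file works the same way: pass an open file handle to `read_conll`, since it only iterates over lines.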

MasakhaPOS

  • Description: POS-tagged datasets for African languages.
  • Languages: Yoruba, Hausa, Igbo, Swahili, Wolof, etc.
  • Size: ~10K sentences.
  • Samples: Tokenized and tagged sentences.
  • Tasks: POS tagging model development.
  • Source: Masakhane Project.
  • Link: https://github.com/masakhane-io/masakhapos
  • License: CC BY-SA 4.0.
  • Last Updated: 2020-11
  • Best For: Linguistic modeling & POS benchmarking.

African Storybooks Corpus

  • Description: Children's storybooks in multiple African languages.
  • Languages: Zulu, Xhosa, Swahili, Amharic, Hausa, etc.
  • Size: 3,000+ books.
  • Samples: Parallel text in multiple languages.
  • Tasks: Machine translation, text generation.
  • Source: African Storybook Project.
  • Link: https://www.africanstorybook.org
  • License: CC BY 4.0.
  • Last Updated: 2023-04
  • Best For: Multilingual MT, literacy applications.

🎙 Speech / Voice

Mozilla Common Voice - African Languages

ALFFA Public Yoruba, Hausa & Wolof Speech Corpora

OpenSLR African Corpora


📸 Computer Vision & Wildlife

Snapshot Serengeti (LILA)

African Wildlife Dataset (Kaggle)


🛰 Geospatial & Agriculture

AfriCultuReS Crop Type Dataset

Africapolis Urban Data


🌦 Climate & Weather

CHIRPS

TAHMO Weather Stations

FEWS NET Africa Rainfall Estimates


🏥 Health & Demographics

DHS Program - African Countries

WHO African Health Observatory Data

Global Health Observatory - Africa


🀝 Contribution Guide

We welcome pull requests to add, update, or improve dataset entries.

Steps:

  1. Fork this repository.
  2. Add your dataset entry in the correct section using the template below.

Dataset Entry Template

### Dataset Name
![Domain](https://img.shields.io/badge/Domain-DOMAIN-COLOR) ![License](https://img.shields.io/badge/License-LICENSETYPE-green) ![Year](https://img.shields.io/badge/Year-YYYY-orange)
- **Description**: Short description of the dataset.
- **Languages / Geography**: List languages or regions covered.
- **Size**: Approximate size in MB/GB or number of samples.
- **Samples**: Brief description of sample type.
- **Tasks**: List AI/ML tasks supported.
- **Source**: Organization or project name.
- **Link**: [Dataset link](https://example.com)
- **License**: License type.
- **Last Updated**: YYYY-MM.
- **Best For**: Suggested research/application areas.
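
Before opening a pull request, it can help to check a draft entry against the template. The sketch below is one possible helper, not part of this repository: `missing_fields` reports template fields that are absent from an entry, and `badge_segment` escapes a value such as a license name for a shields.io static badge URL (dashes doubled, underscores doubled, spaces turned into underscores). The function names and the sample entry are hypothetical.

```python
import re
from typing import List

# Field labels from the entry template. "Languages / Geography" is matched
# by prefix, so an entry that writes just "Languages" also passes.
REQUIRED_FIELDS = [
    "Description", "Languages", "Size", "Samples", "Tasks",
    "Source", "Link", "License", "Last Updated", "Best For",
]


def missing_fields(entry_markdown: str) -> List[str]:
    """Return template fields with no '- **Field**:' bullet in the entry."""
    found = re.findall(r"^- \*\*([^*]+)\*\*:", entry_markdown, flags=re.MULTILINE)
    labels = [label.strip() for label in found]
    return [
        field for field in REQUIRED_FIELDS
        if not any(label.startswith(field) for label in labels)
    ]


def badge_segment(text: str) -> str:
    """Escape text for one segment of a shields.io static badge path."""
    # Order matters: double the literal dashes and underscores first,
    # then turn spaces into single underscores.
    return text.replace("-", "--").replace("_", "__").replace(" ", "_")


# Hypothetical, deliberately incomplete draft entry.
entry = """### Example Dataset
- **Description**: Placeholder entry for illustration.
- **Languages / Geography**: Swahili
- **Size**: ~1K sentences
- **License**: CC BY-SA 4.0
"""

print(missing_fields(entry))
print(badge_segment("CC BY-SA 4.0"))  # CC_BY--SA_4.0
```

An empty list from `missing_fields` means every template field is present; it does not check that the values themselves are accurate, so the license and link still need a manual review.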
