dmatekenya/AIMS-DSCBI

Data Science Capacity Building Initiative (DSCBI)

This repository provides information about the capacity building initiative, a collaboration between the African Institute for Mathematical Sciences (AIMS), the National Institute of Statistics of Rwanda (NISR), and Cenfri. The initiative aims to deliver data science training to staff from over 20 institutions that are part of Rwanda's National Statistical System (NSS).

Course Modules

The training program is structured around the following core modules:

  1. Python Foundations for Data Science
    Covers core and advanced Python programming, object-oriented design, development tools, Git/GitHub workflows, Python packaging, and best practices for maintainable code.

  2. Data Analysis with Python
    Focuses on cleaning, transforming, and analyzing structured data using pandas and polars, with applications to real-world datasets such as census and survey data.

  3. Working with Spatial Data in Python
    Introduces geospatial data handling using GeoPandas, shapely, and rasterio; includes spatial joins, projections, and mapping of access to services.

  4. Working with Time Series Data in Python
    Covers handling temporal data, resampling, rolling windows, time series decomposition, and forecasting using statsmodels and Prophet.

  5. Databases and APIs
    Introduces SQL and relational databases, extracting and analyzing data from APIs, and building REST APIs with FastAPI.

  6. Introduction to Machine Learning with Scikit-learn
    Provides foundations in supervised and unsupervised learning, including models like logistic regression, decision trees, and PCA, along with model evaluation.

  7. Natural Language Processing (NLP) and Large Language Models (LLMs)
    Covers foundational NLP, LLMs, embeddings, vector search, and building AI-powered applications using frameworks like Hugging Face and LangChain.

  8. Advanced Topics in Data Science
    Explores interactive dashboards, data integration from unstructured sources, advanced database techniques, and cloud storage for analytics.

  9. Capstone Project
    Teams develop and present real-world data projects scoped from their institutional needs, applying techniques learned across modules.

Course Structure

The course is divided into self-contained modules, sequenced so that each builds on skills from earlier ones. Each module includes the following components:

  • Lecture
    Each lecture covers key conceptual knowledge for the topic at hand.

  • Practical Labs
    Programming activities provide learners with practical skills to implement solutions discussed in lectures. These labs include adaptable recipes for various use cases.

  • Case Studies
    Case studies present larger, end-to-end projects that demonstrate real-world applications.

  • Assessment
    Each module assessment combines theory quizzes and programming questions to evaluate learners' understanding of the concepts and skills covered in the module.

Repository Structure and Contents

This repository serves as the primary resource for accessing course content, including slides, Python programming labs, example applications using LLMs, and additional materials to support learning about Generative AI and building applications with LLMs. For easy navigation, use the link and contents outlined below.

Contents

Please visit the documentation website for a complete table of contents.

License

The template is licensed under the Mozilla Public License. Remember to replace the license if necessary. If open source, choose an open source license.
