This repository provides information about the capacity building initiative, a collaboration between the African Institute for Mathematical Sciences (AIMS), the National Institute of Statistics of Rwanda (NISR), and Cenfri. The initiative aims to deliver data science training to staff from over 20 institutions that are part of Rwanda's National Statistical System (NSS).
The training program is structured around the following core modules:
- Python Foundations for Data Science: Covers core and advanced Python programming, object-oriented design, development tools, Git/GitHub workflows, Python packaging, and best practices for maintainable code.
- Data Analysis with Python: Focuses on cleaning, transforming, and analyzing structured data using pandas and polars, with applications to real-world datasets like census and surveys.
- Working with Spatial Data in Python: Introduces geospatial data handling using GeoPandas, shapely, and rasterio; includes spatial joins, projections, and mapping access to services.
- Working with Time Series Data in Python: Covers handling temporal data, resampling, rolling windows, time series decomposition, and forecasting using statsmodels and Prophet.
- Databases and APIs: Introduces SQL and relational databases, extracting and analyzing data from APIs, and building REST APIs with FastAPI.
- Introduction to Machine Learning with Scikit-learn: Provides foundations in supervised and unsupervised learning, including models like logistic regression, decision trees, and PCA, along with model evaluation.
- Natural Language Processing (NLP) and Large Language Models (LLMs): Covers foundational NLP, LLMs, embeddings, vector search, and building AI-powered applications using frameworks like Hugging Face and LangChain.
- Advanced Topics in Data Science: Explores interactive dashboards, data integration from unstructured sources, advanced database techniques, and cloud storage for analytics.
- Capstone Project: Teams develop and present real-world data projects scoped from their institutional needs, applying techniques learned across modules.
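To give a flavor of the hands-on material, here is a minimal sketch of two techniques named in the time series module, resampling and rolling windows, using pandas. The series name `visits` and its values are synthetic and purely hypothetical; they are not taken from the course labs, and the actual lab code may look different.

```python
import pandas as pd

# Hypothetical daily counts for illustration only (synthetic, not course data).
dates = pd.date_range("2024-01-01", periods=60, freq="D")
daily = pd.Series(range(1, 61), index=dates, name="visits")

# Resampling: aggregate the daily values into month-start ("MS") bins.
monthly_total = daily.resample("MS").sum()

# Rolling window: 7-day moving average; the first six days have no
# complete window, so they are NaN.
rolling_7d = daily.rolling(window=7).mean()

print(monthly_total)
print(rolling_7d.head(8))
```

The same pattern applies to real survey or administrative series once they are indexed by a `DatetimeIndex`.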
The course is divided into self-contained modules, each designed to provide useful skills and knowledge. The modules are organized sequentially to build on skills learned in previous modules. To make the course engaging and informative, each module includes the following components:
- Lecture: Each lecture covers key conceptual knowledge for the topic at hand.
- Practical Labs: Programming activities provide learners with practical skills to implement solutions discussed in lectures. These labs include adaptable recipes for various use cases.
- Case Studies: Case studies showcase elaborate projects that demonstrate real-world applications.
- Assessment: Each module assessment combines theoretical (quizzes) and programming questions to evaluate learners' understanding of the concepts and skills covered in the module.
This repository serves as the primary resource for accessing course content, including slides, Python programming labs, example applications using LLMs, and additional materials to support learning about Generative AI and building applications with LLMs. For easy navigation, use the link and contents outlined below.
Please visit the documentation website for a complete table of contents.
The template is licensed under the Mozilla Public License.