June 2025
This repository contains hands-on exercises and learning materials from DigiTalent’s Fundamental Data Science training. The focus topics include:
- 🌐 Data Scraping: Learn how to acquire data from various web sources using automated tools. Subtopics:
  - What is Data?
  - Data Collection Methods
  - Data Scraping Tools
  - Data Integrity & Ethics
  - Hands-on Practice through the included self-practice exercises
- 📈 Data Exploration: Analyze and understand the structure and patterns in your data. Subtopics:
  - Data Understanding
  - Reviewing Dataset Structure
  - Data Validation Techniques
  - Hands-on Practice through the included self-practice exercises
- 🧹 Data Cleansing: Clean and refine your dataset to ensure quality and reliability. Subtopics:
  - Data Cleaning Concepts
  - Handling Missing & Duplicate Values
  - Data Reduction Strategies
  - Hands-on Practice through the included self-practice exercises
- 🏷️ Data Annotation: Prepare labeled datasets for use in supervised machine learning tasks. Subtopics:
  - Defining Labels & Categories
  - Data Annotation Techniques
  - Manual & Assisted Labeling Tools
  - Hands-on Practice through the included self-practice exercises
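
As a small taste of the "Handling Missing & Duplicate Values" subtopic listed above, here is a minimal sketch using only the Python standard library. The sample data and the column names (`customer_id`, `age`, `city`) are hypothetical and are not taken from the datasets in this repository. The choice of filling missing ages with the median is also just one possible strategy, and the notebooks may well use a different approach or library.

```python
import csv
import io
from statistics import median

# Hypothetical sample data; the columns are illustrative only and are not
# taken from Data_Nasabah.csv or train_prices.csv.
RAW_CSV = """customer_id,age,city
1,34,Jakarta
2,,Bandung
2,,Bandung
3,41,
4,29,Surabaya
"""


def clean_rows(rows):
    """Drop exact duplicate rows, then fill in missing values."""
    # 1. Remove exact duplicates, keeping the first occurrence of each row.
    seen = set()
    unique = []
    for row in rows:
        key = tuple(row.items())
        if key not in seen:
            seen.add(key)
            unique.append(dict(row))

    # 2. Fill missing ages with the median of the known ages (an assumption;
    #    other strategies such as dropping the row are equally valid).
    known_ages = [int(r["age"]) for r in unique if r["age"]]
    fill_age = median(known_ages)
    for r in unique:
        r["age"] = int(r["age"]) if r["age"] else fill_age
        # 3. Mark missing cities explicitly instead of leaving them empty.
        r["city"] = r["city"] or "Unknown"
    return unique


rows = list(csv.DictReader(io.StringIO(RAW_CSV)))
cleaned = clean_rows(rows)
for r in cleaned:
    print(r)
```

The sketch removes duplicates before computing the median so that repeated rows do not skew the fill value.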
DigiTalentPractice-FundamentalDataScience/
├── data/                        # Raw/external datasets
│   ├── Data_Nasabah.csv         # Local dataset
│   └── train_prices.csv         # Kaggle dataset (not included in repo)
│
├── notebooks/                   # Jupyter notebooks
│   ├── self_practice-1.ipynb
│   ├── self_practice-2.ipynb
│   ├── self_practice-3.ipynb
│   └── self_practice-4.ipynb
│
├── requirements.txt             # Python dependencies
├── README.md                    # Project overview and setup instructions
└── .gitignore                   # Files/folders to exclude from version control
- 📥 Clone this repository to your local machine:
    git clone https://github.com/RyanGA09/DigiTalentPractice-FundamentalDataScience.git
- 📦 Install the dependencies (using a venv or conda environment is recommended):
    pip install -r requirements.txt
- 📘 Open the notebook in notebooks/ that corresponds to the topic you want to learn, and run the code cells sequentially.
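
Because the repository layout above marks `train_prices.csv` as not included in the repo, it can help to confirm that the data files are in place before running a notebook. The following is a minimal sketch of such a check. The helper `missing_datasets` is hypothetical (it is not part of this repository), and it assumes the script is run from the repository root so that `data/` resolves correctly.

```python
from pathlib import Path

# File names come from the repository layout above; the helper below is a
# hypothetical convenience and is not part of the repository itself.
EXPECTED_FILES = ["Data_Nasabah.csv", "train_prices.csv"]


def missing_datasets(data_dir, expected=EXPECTED_FILES):
    """Return the expected file names that are not present in data_dir."""
    data_dir = Path(data_dir)
    return [name for name in expected if not (data_dir / name).is_file()]


if __name__ == "__main__":
    # Assumes the current working directory is the repository root.
    missing = missing_datasets("data")
    if missing:
        print("Missing dataset(s) in data/:", ", ".join(missing))
    else:
        print("All expected datasets are present.")
```

If `train_prices.csv` is reported as missing, it needs to be obtained from Kaggle separately and placed in `data/`.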
Ryan Gading Abdullah