A repository of hands-on self-practice projects from DigiTalent’s Fundamental Data Science training, covering web scraping, data exploration, data cleaning, and data annotation. Includes Jupyter notebooks and example code for practical learning.

RyanGA09/DigiTalent_FundamentalDataScience-SelfPractice

📊 DigiTalent Fundamental Data Science - Self Practice

📅 Created On

June 2025

📜 Description

This repository contains hands-on exercises and learning materials from DigiTalent’s Fundamental Data Science training. The focus topics include:

  • 🌐 Data Scraping: Learn how to acquire data from various web sources using automated tools. Subtopics:

    • What is Data?
    • Data Collection Methods
    • Data Scraping Tools
    • Data Integrity & Ethics
    • Hands-on Practice through the included self-practice exercises
  • 📈 Data Exploration: Analyze and understand the structure and patterns in your data. Subtopics:

    • Data Understanding
    • Reviewing Dataset Structure
    • Data Validation Techniques
    • Hands-on Practice through the included self-practice exercises
  • 🧹 Data Cleansing: Clean and refine your dataset to ensure quality and reliability. Subtopics:

    • Data Cleaning Concepts
    • Handling Missing & Duplicate Values
    • Data Reduction Strategies
    • Hands-on Practice through the included self-practice exercises
  • 🏷️ Data Annotation: Prepare labeled datasets for use in supervised machine learning tasks. Subtopics:

    • Defining Labels & Categories
    • Data Annotation Techniques
    • Manual & Assisted Labeling Tools
    • Hands-on Practice through the included self-practice exercises
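
As a rough illustration of the exploration and cleansing topics listed above (counting missing values, detecting duplicates, then handling both), here is a minimal pandas sketch. The DataFrame, its column names, and the fill strategy (median for numbers, an "Unknown" placeholder for text) are assumptions made up for this example; they are not taken from the repository's notebooks or datasets.

```python
import pandas as pd

# Toy data invented for illustration; not one of the repository's datasets.
raw = pd.DataFrame(
    {
        "customer_id": [1, 2, 2, 3, 4],
        "age": [34, None, None, 45, 29],
        "city": ["Jakarta", "Bandung", "Bandung", None, "Surabaya"],
    }
)

# Exploration: count missing values per column and fully duplicated rows.
missing_per_column = raw.isna().sum()
duplicate_rows = int(raw.duplicated().sum())

# Cleansing: drop exact duplicate rows first, then fill the remaining gaps.
cleaned = raw.drop_duplicates().reset_index(drop=True)
cleaned["age"] = cleaned["age"].fillna(cleaned["age"].median())
cleaned["city"] = cleaned["city"].fillna("Unknown")

print(missing_per_column)
print("duplicate rows:", duplicate_rows)
print(cleaned)
```

Dropping duplicates before filling means the median is computed on distinct rows only. Whether to fill or drop missing values depends on the dataset, so treat the choices here as one option among several.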

🗂️ Repository Structure

DigiTalentPractice-FundamentalDataScience/
├── data/                          # Contains raw/external datasets
│   ├── Data_Nasabah.csv           # Local dataset
│   └── train_prices.csv           # Kaggle dataset (not included in repo)
│
├── notebooks/                     # Jupyter notebooks
│   ├── self_practice-1.ipynb
│   ├── self_practice-2.ipynb
│   ├── self_practice-3.ipynb
│   └── self_practice-4.ipynb
│
├── requirements.txt               # Python dependencies
├── README.md                      # Project overview and setup instructions
└── .gitignore                     # Files/folders to exclude from version control

⚠️ Note: data/train_prices.csv comes from Kaggle and is not included in this repository. Download it yourself (for example, with the Kaggle API) and place it in data/ before running the notebooks that use it.
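
Because the Kaggle file is not shipped with the repository, a notebook can fail with an unhelpful error if the file is missing. One possible guard is sketched below. The helper name `require_dataset` is hypothetical and is not part of the repository, and the relative path assumes the code runs from the repository root.

```python
from pathlib import Path

# Path from the repository structure; assumes the working directory is the
# repository root (from inside notebooks/ it would be ../data instead).
TRAIN_PRICES = Path("data") / "train_prices.csv"


def require_dataset(path: Path) -> Path:
    """Return the path if the file exists, otherwise raise a helpful error.

    Hypothetical helper for illustration; not part of the repository.
    """
    if not path.is_file():
        raise FileNotFoundError(
            f"{path} not found. Download it from Kaggle and place it in "
            f"{path.parent}/ before running the notebooks that need it."
        )
    return path
```

A notebook cell could then load the data with `pd.read_csv(require_dataset(TRAIN_PRICES))`, so a missing file produces a clear message instead of a bare traceback.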

🚀 How to Use

  1. 📥 Clone this repository to your local machine:

    git clone https://github.com/RyanGA09/DigiTalentPractice-FundamentalDataScience.git
  2. 📦 Install the dependencies (using a venv or conda environment is recommended):

    pip install -r requirements.txt
  3. 📘 Open the notebook corresponding to the topic you want to learn and run the code cells sequentially.

👨‍💻 Author

Ryan Gading Abdullah
