
kaopanboonyuen/2110446_DataScience_2021s2


2110446 Data Science Course at Chulalongkorn University (2022)


Welcome to Data Science

Welcome to the Data Science Course (2110446) at Chulalongkorn University! This repository contains the code, exercises, and resources to guide you through the fascinating world of data science, deep learning, and more. Each week, you'll dive into new concepts, reinforced with hands-on labs, to help you build a strong foundation in data science.

📚 Weekly Labs and Exercises

Week 1: Intro to Numpy, Pandas

  1. Numpy: Introduction to numerical computing with Numpy. Open In Colab

  2. Pandas: Basic data manipulation with Pandas. Open In Colab

  3. Pandas with YouTube stat data: Analyzing YouTube video statistics with Pandas. Open In Colab

  4. (Advanced) Pandas with YouTube stat data: Advanced analysis of YouTube video statistics. Open In Colab

  5. Assignment (Pandas with YouTube stat data): Hands-on assignment analyzing YouTube video statistics. Open In Colab

Week 2: Data Preparation

  1. EDA: Exploratory Data Analysis on loan data. Open In Colab

  2. Impute Missing Value: Handling missing data in loan datasets. Open In Colab

  3. Split Train/Test: Splitting datasets into training and testing sets. Open In Colab

  4. Outliers with Log: Identifying and handling outliers using logarithmic transformation. Open In Colab

  5. Outliers with Log (Titanic Dataset): Advanced outlier analysis using the Titanic dataset. Open In Colab

  6. Assignment: Titanic dataset analysis assignment. Open In Colab
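To give a feel for the preparation steps covered this week (imputation, log transformation, and train/test splitting), here is a minimal standard-library sketch. The income values, the helper names (`impute_median`, `log_transform`, `train_test_split`), and the 80/20 ratio are assumptions made for illustration; they are not taken from the course notebooks, which may well use Pandas and scikit-learn instead.

```python
import math
import random
import statistics


def impute_median(values):
    """Replace missing entries (None) with the median of the observed values."""
    observed = [v for v in values if v is not None]
    median = statistics.median(observed)
    return [median if v is None else v for v in values]


def log_transform(values):
    """Apply log(1 + x) to compress large values; assumes x >= 0."""
    return [math.log1p(v) for v in values]


def train_test_split(rows, test_ratio=0.2, seed=42):
    """Shuffle a copy of the rows and split it into (train, test)."""
    shuffled = list(rows)
    random.Random(seed).shuffle(shuffled)
    n_test = int(round(len(shuffled) * test_ratio))
    return shuffled[n_test:], shuffled[:n_test]


# Hypothetical incomes with one missing value and one extreme outlier.
incomes = [3200, 4100, None, 3900, 250000, 3600, 4400, 3000, 3800, 4200]

filled = impute_median(incomes)
logged = log_transform(filled)
train, test = train_test_split(logged)

print("max/min ratio before log:", max(filled) / min(filled))
print("max/min ratio after log:", max(logged) / min(logged))
print("train size:", len(train), "test size:", len(test))
```

The log transform does not remove the outlier; it only shrinks its distance from the rest of the data, which is often enough for models that are sensitive to scale.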

Week 3: Statistical Analysis

  1. Basic Stat: Introduction to basic statistical analysis. Open In Colab

  2. Intermediate Stat: Intermediate statistical techniques and their applications. Open In Colab

  3. Assignment (Stat): Statistical analysis homework assignment. Open In Colab
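As one small example of the kind of analysis these labs cover, the sketch below computes Welch's t-statistic for two independent samples using only the standard library. The two samples and the function name `welch_t` are hypothetical, and the course notebooks may use a different test or a library such as SciPy.

```python
import math
import statistics


def welch_t(sample_a, sample_b):
    """Welch's t-statistic for two independent samples with unequal variances."""
    mean_a = statistics.mean(sample_a)
    mean_b = statistics.mean(sample_b)
    # statistics.variance returns the sample variance (n - 1 in the denominator).
    var_a = statistics.variance(sample_a)
    var_b = statistics.variance(sample_b)
    standard_error = math.sqrt(var_a / len(sample_a) + var_b / len(sample_b))
    return (mean_a - mean_b) / standard_error


# Hypothetical measurements for two groups.
group_a = [12.1, 11.8, 12.4, 12.0, 11.9]
group_b = [11.2, 11.5, 11.1, 11.4, 11.3]

print("mean A:", statistics.mean(group_a))
print("mean B:", statistics.mean(group_b))
print("Welch t-statistic:", welch_t(group_a, group_b))
```

A large absolute t-statistic suggests the group means differ by more than sampling noise would explain; turning it into a p-value additionally requires the t-distribution, which this sketch leaves out.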

Stat Data

Week 4: Regression

  1. Assignment: Regression analysis assignment on bank marketing data. Open In GitHub

Week 5: Traditional ML

  1. Decision Trees: Implementing decision trees and random forests. Open In Colab

  2. Linear Regression: Applying linear regression to predict outcomes. Open In Colab

  3. Logistic Regression: Building and evaluating logistic regression models. Open In Colab

  4. Neural Network: Introduction to basic neural networks for classification. Open In Colab
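For intuition about what the linear regression lab fits, here is a minimal sketch of ordinary least squares for a single feature, written without any libraries. The function name `fit_line` and the toy data are assumptions for illustration; the notebooks themselves most likely rely on scikit-learn.

```python
def fit_line(xs, ys):
    """Ordinary least squares for y = slope * x + intercept (one feature)."""
    n = len(xs)
    mean_x = sum(xs) / n
    mean_y = sum(ys) / n
    # Sum of squared deviations of x, and co-deviations of x and y.
    sxx = sum((x - mean_x) ** 2 for x in xs)
    sxy = sum((x - mean_x) * (y - mean_y) for x, y in zip(xs, ys))
    slope = sxy / sxx
    intercept = mean_y - slope * mean_x
    return slope, intercept


# Toy data generated from y = 2x + 1, so the fit should recover those values.
xs = [1, 2, 3, 4, 5]
ys = [3, 5, 7, 9, 11]

slope, intercept = fit_line(xs, ys)
print("slope:", slope, "intercept:", intercept)
print("prediction for x = 6:", slope * 6 + intercept)
```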

Traditional ML

Week 6: Introduction to Deep Learning

  1. Image Classification (Basic): Classify flowers using basic image classification techniques.
    Open In Colab

  2. Image Classification (Advanced): Explore advanced flower classification with EfficientNet and pretrained weights.
    Open In Colab

  3. Semantic Segmentation (UNET): Segment images of pets using the Oxford-IIIT pet dataset and UNET.
    Open In Colab

  4. LSTM for Stock Price Prediction: Predict stock prices with Long Short-Term Memory (LSTM) networks.
    Open In Colab

  5. SARIMAX for PM2.5 Forecasting: Forecast PM2.5 levels using the SARIMAX model.
    Open In Colab

    Assignment: Fashion MNIST Classification
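Sequence models such as the LSTM in lab 4 are usually trained on fixed-length windows of past values paired with the value that follows. The sketch below shows that windowing step only, with no deep learning framework involved. The function name `make_windows`, the window size, and the prices are assumptions for illustration and are not taken from the course notebook.

```python
def make_windows(series, window_size):
    """Turn a sequence into (inputs, targets) pairs for next-step prediction.

    Each input is `window_size` consecutive values; its target is the value
    that immediately follows the window.
    """
    inputs, targets = [], []
    for start in range(len(series) - window_size):
        inputs.append(series[start:start + window_size])
        targets.append(series[start + window_size])
    return inputs, targets


# Hypothetical daily closing prices.
prices = [10.0, 10.5, 11.0, 10.8, 11.2, 11.6]

X, y = make_windows(prices, window_size=3)
for window, target in zip(X, y):
    print(window, "->", target)
```

A series of length n yields n minus window_size training pairs, so short series with long windows leave very little data to train on.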


Week 7: Data Extraction

  1. Basic Web Scraping: Learn the basics of web scraping to extract data from web pages.
    Open In Colab

  2. Wikipedia Data Extraction: Extract structured data from Wikipedia pages.
    Open In Colab

  3. REST API Data Extraction: Interact with APIs to gather data programmatically.
    Open In Colab

  4. Twitter Data Extraction: Collect data from Twitter using its API.
    Open In Colab

  5. Web Automation with Selenium: Automate web browsing and data collection using Selenium.
    Open In Colab

    Assignment: Web Scraping Assignment
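The core idea behind basic web scraping is to parse HTML and pull out the elements you care about. The sketch below does this for links using only Python's built-in `html.parser`, and it parses a hard-coded HTML string rather than fetching a live page. The class name `LinkExtractor` and the sample HTML are assumptions for illustration; the course labs may use other tools such as Requests or BeautifulSoup.

```python
from html.parser import HTMLParser


class LinkExtractor(HTMLParser):
    """Collect (href, text) pairs for every <a href="..."> in a document."""

    def __init__(self):
        super().__init__()
        self.links = []
        self._current_href = None
        self._text_parts = []

    def handle_starttag(self, tag, attrs):
        if tag == "a":
            self._current_href = dict(attrs).get("href")
            self._text_parts = []

    def handle_data(self, data):
        # Only keep text that appears inside a link with an href.
        if self._current_href is not None:
            self._text_parts.append(data)

    def handle_endtag(self, tag):
        if tag == "a" and self._current_href is not None:
            text = "".join(self._text_parts).strip()
            self.links.append((self._current_href, text))
            self._current_href = None


# Hypothetical page content; a real scraper would download this first.
SAMPLE_HTML = """
<html><body>
  <h1>Course pages</h1>
  <ul>
    <li><a href="/week1">Week 1: <b>NumPy</b></a></li>
    <li><a href="/week2">Week 2: Data Preparation</a></li>
  </ul>
</body></html>
"""

parser = LinkExtractor()
parser.feed(SAMPLE_HTML)
for href, text in parser.links:
    print(href, "->", text)
```

Before scraping a real site, check its terms of use and robots.txt, and prefer an official API when one exists.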

Week 8: Data Ingestion

All relevant code and scripts are available here.

  1. Kafka Sample Producer: Learn how to produce data streams with Kafka.
    Open In GitHub

  2. Kafka Sample Consumer: Consume data streams from Kafka topics.
    Open In GitHub

  3. Kafka with AVRO: Handle data streams with AVRO serialization.

  4. Sensor Data Ingestion: Work with sensor data streams.

Download all source code here (week8_dataingestion.zip).

Week 9: Spark

  1. Basic Spark: Open In Colab
  2. Spark SQL: Open In Colab
  3. Spark ML: Open In Colab

Data Set:

  1. Bank: Open In GitHub
  2. Star Wars: Open In GitHub

Week 10: Spark Streaming

  1. Basic Spark Streaming: Open In Colab
  2. Spark Streaming Window Operations: Open In Colab
  3. Basic Structured Streaming: Open In Colab
  4. Structured Streaming Window Operations: Open In Colab
  5. Structured Streaming and Kafka: Open In Colab

Data Set:

Star Wars: Open In GitHub

Week 11: Airflow

All code is here: this link

You can also download all the Week 11 source code here (week11_airflow_and_fastapi.zip).

** Updated Python code and notebooks will be posted here shortly before each lecture.

🛠 Environment Setup

The code in this repository is designed to run in Google Colab or a local Python environment. To get started locally, ensure you have Python 3.8+ installed and use the following steps to set up your environment:

git clone https://github.com/kaopanboonyuen/2110446_DataScience_2021s2.git
cd 2110446_DataScience_2021s2
pip install -r requirements.txt
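If you are unsure whether your local interpreter meets the Python 3.8+ requirement, a quick check like the following can be run before installing dependencies. This is only a convenience sketch: the function name `meets_minimum` is hypothetical and the script is not part of the repository.

```python
import sys

# Minimum version stated in the setup instructions above.
MINIMUM_VERSION = (3, 8)


def meets_minimum(version_info=None, minimum=MINIMUM_VERSION):
    """Return True if the (major, minor) version is at least `minimum`."""
    if version_info is None:
        version_info = sys.version_info
    return tuple(version_info[:2]) >= minimum


current = sys.version.split()[0]
if meets_minimum():
    print("Python", current, "meets the 3.8+ requirement")
else:
    print("Python", current, "is older than 3.8; please upgrade")
```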

📚 References

  1. https://www.kaggle.com/code
  2. https://www.tensorflow.org/tutorials
  3. https://github.com/topics/machine-learning
  4. https://archive.ics.uci.edu/ml/datasets.php
  5. https://colab.research.google.com/notebooks/

🎓 License

This project is licensed under the MIT License. See the LICENSE file for more information.

🛑 Disclaimer

This repository is for educational purposes only. All code and resources are provided as-is, without any guarantees or warranties.

For any questions or feedback, please contact Kao Panboonyuen.

About

Data Science Course at Dept. of Computer Engineering, Chula 2022
