🍄 2110446 Data Science Course at Chulalongkorn University (2022)

Welcome to the Data Science Course (2110446) at Chulalongkorn University! This repository contains the code, exercises, and resources to guide you through the fascinating world of data science, deep learning, and more. Each week, you'll dive into new concepts, reinforced with hands-on labs, to help you build a strong foundation in data science.

📚 Weekly Labs and Exercises

Week 1: Intro to Numpy, Pandas

Numpy: Introduction to numerical computing with Numpy.
Pandas: Basic data manipulation with Pandas.
Pandas with YouTube stat data: Analyzing YouTube video statistics with Pandas.
(Advanced) Pandas with YouTube stat data: Advanced analysis of YouTube video statistics.
Assignment (Pandas with YouTube stat data): Hands-on assignment analyzing YouTube video statistics.

Week 2: Data Preparation

EDA: Exploratory Data Analysis on loan data.
Impute Missing Value: Handling missing data in loan datasets.
Split Train/Test: Splitting datasets into training and testing sets.
Outliers with Log: Identifying and handling outliers using a logarithmic transformation.
Outliers with Log (Titanic Dataset): Advanced outlier analysis using the Titanic dataset.
Assignment: Titanic dataset analysis assignment.
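
As a minimal sketch of the "Outliers with Log" idea above, the snippet below flags outliers with the 1.5 x IQR rule before and after a log transform. The income values and the `iqr_outliers` helper are hypothetical, and only the standard library is used; the course notebooks may take a different approach.

```python
import math
import statistics


def iqr_outliers(values):
    """Return the values outside [Q1 - 1.5*IQR, Q3 + 1.5*IQR]."""
    q1, _, q3 = statistics.quantiles(values, n=4)
    iqr = q3 - q1
    low, high = q1 - 1.5 * iqr, q3 + 1.5 * iqr
    return [v for v in values if v < low or v > high]


# Hypothetical right-skewed data (not from the course datasets).
incomes = [1000, 2000, 3000, 4000, 5000, 6000, 8000, 30000]

# log1p(x) = log(1 + x) compresses large values and is safe at x = 0.
log_incomes = [math.log1p(v) for v in incomes]

raw_outliers = iqr_outliers(incomes)      # the largest value is flagged
log_outliers = iqr_outliers(log_incomes)  # nothing is flagged on the log scale

print("Outliers on raw scale:", raw_outliers)
print("Outliers on log scale:", log_outliers)
```

The transform does not delete any rows; it only compresses the scale so that a long right tail no longer dominates the spread.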

Week 3: Statistical Analysis

Basic Stat: Introduction to basic statistical analysis.
Intermediate Stat: Intermediate statistical techniques and their applications.
Assignment (Stat): Statistical analysis homework assignment.
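
To make the "Basic Stat" topic concrete, here is a small sketch of descriptive statistics using Python's standard `statistics` module. The sample scores are made up for illustration and are not taken from the course materials.

```python
import statistics

# Hypothetical sample of eight scores.
scores = [2, 4, 4, 4, 5, 5, 7, 9]

mean = statistics.mean(scores)          # arithmetic average
median = statistics.median(scores)      # middle value (average of the two middle items)
mode = statistics.mode(scores)          # most frequent value
pop_std = statistics.pstdev(scores)     # population standard deviation (divide by n)
sample_std = statistics.stdev(scores)   # sample standard deviation (divide by n - 1)

print(mean, median, mode, pop_std, round(sample_std, 3))
```

The sample standard deviation is slightly larger than the population one because it divides by n - 1 rather than n.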

Week 4: Regression

Assignment: Regression analysis assignment on bank marketing data.

Week 5: Traditional ML

Decision Trees: Implementing decision trees and random forests.
Linear Regression: Applying linear regression to predict outcomes.
Logistic Regression: Building and evaluating logistic regression models.
Neural Network: Introduction to basic neural networks for classification.
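
For the linear regression lab listed above, the following sketch fits a straight line with the closed-form ordinary least squares solution. It is a plain-Python illustration under the assumption of a single feature; the `fit_line` helper and the toy data are hypothetical, and the labs themselves may rely on a machine learning library instead.

```python
def fit_line(xs, ys):
    """Fit y = slope * x + intercept by ordinary least squares."""
    n = len(xs)
    mean_x = sum(xs) / n
    mean_y = sum(ys) / n
    # Covariance-like and variance-like sums around the means.
    sxy = sum((x - mean_x) * (y - mean_y) for x, y in zip(xs, ys))
    sxx = sum((x - mean_x) ** 2 for x in xs)
    slope = sxy / sxx
    intercept = mean_y - slope * mean_x
    return slope, intercept


# Toy data that lies exactly on y = 2x + 1.
xs = [1, 2, 3, 4, 5]
ys = [3, 5, 7, 9, 11]

slope, intercept = fit_line(xs, ys)
prediction = slope * 6 + intercept  # predict y for an unseen x = 6

print(f"slope={slope}, intercept={intercept}, prediction={prediction}")
```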

Week 6: Introduction to Deep Learning

Image Classification (Basic): Classify flowers using basic image classification techniques.
Image Classification (Advanced): Explore advanced flower classification with EfficientNet and pretrained weights.
Semantic Segmentation (UNET): Segment images of pets using the Oxford-IIIT pet dataset and UNET.
LSTM for Stock Price Prediction: Predict stock prices with Long Short-Term Memory (LSTM) networks.
SARIMAX for PM2.5 Forecasting: Forecast PM2.5 levels using the SARIMAX model.

Assignment: Fashion MNIST Classification

Week 7: Data Extraction

Basic Web Scraping: Learn the basics of web scraping to extract data from web pages.
Wikipedia Data Extraction: Extract structured data from Wikipedia pages.
REST API Data Extraction: Interact with APIs to gather data programmatically.
Twitter Data Extraction: Collect data from Twitter using its API.
Web Automation with Selenium: Automate web browsing and data collection using Selenium.
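
The basic web scraping step above can be sketched with Python's built-in `html.parser`. This example parses an inline HTML string rather than a live page, so the markup, the `/week1` style paths, and the `LinkExtractor` class are all hypothetical; a real scraper would first download the page and would often use a dedicated parsing library.

```python
from html.parser import HTMLParser


class LinkExtractor(HTMLParser):
    """Collect (href, text) pairs from <a> tags that have an href."""

    def __init__(self):
        super().__init__()
        self.links = []
        self._href = None   # href of the <a> tag currently open, if any
        self._text = []     # text fragments seen inside that tag

    def handle_starttag(self, tag, attrs):
        if tag == "a":
            self._href = dict(attrs).get("href")
            self._text = []

    def handle_data(self, data):
        if self._href is not None:
            self._text.append(data)

    def handle_endtag(self, tag):
        if tag == "a" and self._href is not None:
            self.links.append((self._href, "".join(self._text).strip()))
            self._href = None


# Hypothetical page content standing in for a downloaded web page.
page = """
<ul>
  <li><a href="/week1">Week 1: Intro</a></li>
  <li><a href="/week2">Week 2: <b>Data</b> Preparation</a></li>
</ul>
"""

parser = LinkExtractor()
parser.feed(page)
links = parser.links

for href, text in links:
    print(href, "->", text)
```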

Assignment: Web Scraping Assignment

Week 8: Data Ingestion

All relevant code and scripts are available here.

Kafka Sample Producer: Learn how to produce data streams with Kafka.
Kafka Sample Consumer: Consume data streams from Kafka topics.
Kafka with AVRO: Handle data streams with AVRO serialization.
- Producer: Open In GitHub
- Consumer: Open In GitHub
Sensor Data Ingestion: Work with sensor data streams.
- FileWriter Consumer: Open In GitHub
- Counter Consumer: Open In GitHub

Download all source code here (week8_dataingestion.zip).

Week 9: Spark

Basic Spark:
Spark SQL:
Spark ML:

Data Set:

Bank:
Star Wars:

Week 10: Spark Streaming

Basic Spark Streaming:
Spark Streaming Window Operations:
Basic Structured Streaming:
Structured Streaming Window Operations:
Structured Streaming and Kafka:

Data Set:

Star Wars:

Week 11: Airflow

All code is here: this link

You can also download all source code for week11_airflow_and_fastapi through this link (week11_airflow_and_fastapi.zip).

** Updated Python code and notebooks will be posted here shortly before each lecture.

🛠 Environment Setup

The code in this repository is designed to run in Google Colab or a local Python environment. To get started locally, ensure you have Python 3.8+ installed and use the following steps to set up your environment:

git clone https://github.com/kaopanboonyuen/2110446_DataScience_2021s2.git
cd 2110446_DataScience_2021s2
pip install -r requirements.txt
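
Before installing the requirements, it can help to confirm that the interpreter meets the Python 3.8+ requirement stated above. The snippet below is an optional sketch and is not part of the repository; the `meets_minimum` helper name is an assumption.

```python
import sys

MIN_VERSION = (3, 8)  # minimum version stated in this README


def meets_minimum(version_info=sys.version_info, minimum=MIN_VERSION):
    """Return True if the (major, minor) version is at least `minimum`."""
    return tuple(version_info[:2]) >= minimum


if meets_minimum():
    print("Python version OK:", sys.version.split()[0])
else:
    print("Python 3.8 or newer is required; found", sys.version.split()[0])
```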

📚 References

🎓 License

This project is licensed under the MIT License. See the LICENSE file for more information.

🛡️ Disclaimer

This repository is for educational purposes only. All code and resources are provided as-is, without any guarantees or warranties.

For any questions or feedback, please contact Kao Panboonyuen.
