Welcome to the Data Science Course (2110446) at Chulalongkorn University! This repository contains the code, exercises, and resources to guide you through the fascinating world of data science, deep learning, and more. Each week, you'll dive into new concepts, reinforced with hands-on labs, to help you build a strong foundation in data science.
- Pandas with Youtube stat data: Analyzing YouTube video statistics with Pandas.
- (Advanced) Pandas with Youtube stat data: Advanced analysis of YouTube video statistics.
- Assignment (Pandas with Youtube stat data): Hands-on assignment analyzing YouTube video statistics.
- Impute Missing Value: Handling missing data in loan datasets.
- Split Train/Test: Splitting datasets into training and testing sets.
- Outliers with Log: Identifying and handling outliers using logarithmic transformation.
- Outliers with Log (Titanic DataSet): Advanced outlier analysis using the Titanic dataset.
- Intermediate Stat: Intermediate statistical techniques and their applications.
- Assignment (Stat): Statistical analysis homework assignment.
- Decision Trees: Implementing decision trees and random forests.
- Linear Regression: Applying linear regression to predict outcomes.
- Logistic Regression: Building and evaluating logistic regression models.
- Neural Network: Introduction to basic neural networks for classification.
- Image Classification (Basic): Classify flowers using basic image classification techniques.
- Image Classification (Advanced): Explore advanced flower classification with EfficientNet and pretrained weights.
- Semantic Segmentation (UNET): Segment images of pets using the Oxford-IIIT pet dataset and UNET.
- LSTM for Stock Price Prediction: Predict stock prices with Long Short-Term Memory (LSTM) networks.
- SARIMAX for PM2.5 Forecasting: Forecast PM2.5 levels using the SARIMAX model.
Assignment: Fashion MNIST Classification
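Several of the labs above (Split Train/Test, Logistic Regression) follow the standard scikit-learn workflow. A minimal sketch of that workflow on synthetic data — the dataset and variable names here are illustrative, not taken from the course notebooks:

```python
from sklearn.datasets import make_classification
from sklearn.model_selection import train_test_split
from sklearn.linear_model import LogisticRegression
from sklearn.metrics import accuracy_score

# Synthetic stand-in for a course dataset
X, y = make_classification(n_samples=500, n_features=10, random_state=42)

# Hold out 20% of the rows for evaluation
X_train, X_test, y_train, y_test = train_test_split(
    X, y, test_size=0.2, random_state=42
)

model = LogisticRegression(max_iter=1000)
model.fit(X_train, y_train)

acc = accuracy_score(y_test, model.predict(X_test))
print(f"test accuracy: {acc:.2f}")
```

The same split-fit-evaluate pattern carries over to the decision tree and neural network labs; only the estimator changes.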
- Basic Web Scraping: Learn the basics of web scraping to extract data from web pages.
- Wikipedia Data Extraction: Extract structured data from Wikipedia pages.
- REST API Data Extraction: Interact with APIs to gather data programmatically.
- Twitter Data Extraction: Collect data from Twitter using its API.
- Web Automation with Selenium: Automate web browsing and data collection using Selenium.
Assignment: Web Scraping Assignment
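The scraping labs build on the common `requests` + `BeautifulSoup` pattern. A self-contained sketch parsing an inline HTML snippet — the markup and class names are made up for illustration; in a real scrape the HTML would come from `requests.get(url).text`:

```python
from bs4 import BeautifulSoup

# Inline HTML standing in for a fetched page
html = """
<html><body>
  <h1>Course List</h1>
  <ul>
    <li class="course">Data Science</li>
    <li class="course">Deep Learning</li>
  </ul>
</body></html>
"""

soup = BeautifulSoup(html, "html.parser")
# Select every <li> tagged with the (hypothetical) "course" class
courses = [li.get_text(strip=True) for li in soup.find_all("li", class_="course")]
print(courses)  # → ['Data Science', 'Deep Learning']
```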
All relevant code and scripts are available here.
- Kafka Sample Producer: Learn how to produce data streams with Kafka.
- Kafka Sample Consumer: Consume data streams from Kafka topics.
- Kafka with AVRO: Handle data streams with AVRO serialization.
  - Producer: Open In GitHub
  - Consumer: Open In GitHub
- Sensor Data Ingestion: Work with sensor data streams.
  - FileWriter Consumer: Open In GitHub
  - Counter Consumer: Open In GitHub
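A producer in these labs serializes each record and publishes it to a topic. A sketch of that shape assuming the `kafka-python` client; the topic name, broker address, and record fields are illustrative, and the broker-dependent part is left as a comment since it needs a running cluster:

```python
import json


def serialize(record: dict) -> bytes:
    """value_serializer for KafkaProducer: encode a record as UTF-8 JSON."""
    return json.dumps(record).encode("utf-8")


def send_reading(producer, topic: str, reading: dict) -> None:
    """Publish one sensor reading via a kafka-python KafkaProducer."""
    producer.send(topic, reading)


# Wiring it up (requires a running broker, so commented out here):
# from kafka import KafkaProducer  # pip install kafka-python
# producer = KafkaProducer(bootstrap_servers="localhost:9092",
#                          value_serializer=serialize)
# send_reading(producer, "sensor-topic", {"sensor_id": 1, "temp": 25.3})
# producer.flush()
```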
Download all source code here (week8_dataingestion.zip).
Data Set:
- Basic Spark Streaming:
- Spark Streaming Window Operations:
- Basic Structured Streaming:
- Structured Streaming Window Operations:
- Structured Streaming and Kafka:
Data Set:
All code is here: this link
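The window-operation notebooks group a stream into fixed time buckets. Spark's own API (`groupBy(window(...))`) needs a running session, but the tumbling-window idea can be sketched offline with pandas resampling — the timestamps and values below are made up:

```python
import pandas as pd

# Toy event stream: one value every 30 seconds
events = pd.DataFrame(
    {"value": [1, 2, 3, 4, 5, 6]},
    index=pd.date_range("2024-01-01 00:00:00", periods=6, freq="30s"),
)

# Tumbling 1-minute window: each event falls into exactly one bucket
tumbling = events.resample("1min").sum()
print(tumbling)  # buckets of 3, 7, 11
```

Sliding windows differ only in that consecutive buckets overlap, so one event can contribute to several windows.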
You can download all source code for week11_airflow_and_fastapi through this link (week11_airflow_and_fastapi.zip).
**Note:** Updated Python code and notebooks will be posted here shortly before each lecture.
The code in this repository is designed to run in Google Colab or a local Python environment. To get started locally, ensure you have Python 3.8+ installed and use the following steps to set up your environment:
```bash
git clone https://github.com/kaopanboonyuen/2110446_DataScience_2021s2.git
cd 2110446_DataScience_2021s2
pip install -r requirements.txt
```
- https://www.kaggle.com/code
- https://www.tensorflow.org/tutorials
- https://github.com/topics/machine-learning
- https://archive.ics.uci.edu/ml/datasets.php
- https://colab.research.google.com/notebooks/
This project is licensed under the MIT License. See the LICENSE file for more information.
This repository is for educational purposes only. All code and resources are provided as-is, without any guarantees or warranties.
For any questions or feedback, please contact me, Kao Panboonyuen.