Welcome to my GitHub profile!
I'm Cao Gia Thinh, a final-year Computer Science student with a deep focus on Data Engineering. I am passionate about designing and building scalable, high-performance data systems that transform raw data into valuable insights to support business decision-making.
These are my flagship projects that showcase my skills and experience.
Built a complete data platform on Google Cloud to collect, process, and analyze retail data from various sources.
- Orchestration: Used Kestra (deployed on Cloud Composer) to schedule and orchestrate data ingestion pipelines from Parquet files.
- Data Lake & Warehouse: Stored raw data in Google Cloud Storage (GCS), then cleaned, transformed, and loaded it into Google BigQuery using Apache Spark.
- Data Modeling: Implemented a Star Schema within BigQuery to optimize for analytical queries.
- Deployment: Containerized the entire application and its dependencies using Docker to ensure consistency across environments.
Technologies: GCP (BigQuery, GCS, Composer), Kestra, Apache Spark, Docker, Python, SQL, dbt, Google Data Studio.
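To illustrate the star-schema idea from this project, here is a minimal sketch. It is not the project's actual code: it uses SQLite from the Python standard library as a stand-in for BigQuery, and the table names (`dim_product`, `dim_date`, `fact_sales`), columns, and sample rows are all hypothetical.

```python
import sqlite3


def build_star_schema(conn):
    """Create one fact table surrounded by two dimension tables."""
    conn.executescript(
        """
        CREATE TABLE dim_product (
            product_key  INTEGER PRIMARY KEY,
            product_name TEXT,
            category     TEXT
        );
        CREATE TABLE dim_date (
            date_key  INTEGER PRIMARY KEY,
            full_date TEXT,
            month     TEXT
        );
        CREATE TABLE fact_sales (
            sale_id     INTEGER PRIMARY KEY,
            product_key INTEGER REFERENCES dim_product(product_key),
            date_key    INTEGER REFERENCES dim_date(date_key),
            quantity    INTEGER,
            amount      REAL
        );
        """
    )


def load_sample(conn):
    """Insert a few made-up rows so the query below has data to aggregate."""
    conn.executemany(
        "INSERT INTO dim_product VALUES (?, ?, ?)",
        [(1, "Espresso beans", "Coffee"), (2, "Green tea", "Tea")],
    )
    conn.executemany(
        "INSERT INTO dim_date VALUES (?, ?, ?)",
        [(20240101, "2024-01-01", "2024-01"), (20240201, "2024-02-01", "2024-02")],
    )
    conn.executemany(
        "INSERT INTO fact_sales (product_key, date_key, quantity, amount) "
        "VALUES (?, ?, ?, ?)",
        [(1, 20240101, 2, 30.0), (1, 20240201, 1, 15.0), (2, 20240101, 3, 12.0)],
    )


def revenue_by_category(conn):
    """Typical analytical query: join the fact table to a dimension and aggregate."""
    return conn.execute(
        """
        SELECT p.category, SUM(f.amount)
        FROM fact_sales AS f
        JOIN dim_product AS p ON p.product_key = f.product_key
        GROUP BY p.category
        ORDER BY p.category
        """
    ).fetchall()


conn = sqlite3.connect(":memory:")
build_star_schema(conn)
load_sample(conn)
print(revenue_by_category(conn))
```

Keeping measures in a single narrow fact table and descriptive attributes in small dimension tables means most analytical queries reduce to one join per dimension followed by a `GROUP BY`.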
Designed and implemented a modern data warehouse to empower Sales and Marketing teams with advanced analytics.
- ETL & Transformation: Used SQL to extract, transform, and load data from source systems into the destination data warehouse.
- Data Warehouse Design: Architected a DWH schema on Microsoft SQL Server.
Technologies: T-SQL, MS SQL Server.
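The SQL-driven ETL step in this project can be sketched roughly as follows. This is an illustration under assumptions, not the project's T-SQL: SQLite stands in for Microsoft SQL Server, and the staging table `stg_orders`, the warehouse table `dw_orders`, and the cleaning rules are hypothetical.

```python
import sqlite3


def create_tables(conn):
    """Staging holds raw text as extracted; the warehouse table is typed."""
    conn.executescript(
        """
        CREATE TABLE stg_orders (order_id TEXT, customer TEXT, amount TEXT);
        CREATE TABLE dw_orders (
            order_id INTEGER PRIMARY KEY,
            customer TEXT NOT NULL,
            amount   REAL NOT NULL
        );
        """
    )


def extract(conn, raw_rows):
    """Extract: land the raw source rows in staging without changing them."""
    conn.executemany("INSERT INTO stg_orders VALUES (?, ?, ?)", raw_rows)


def transform_and_load(conn):
    """Transform and load in a single INSERT ... SELECT statement:
    cast types, normalize customer names, drop rows with no customer,
    and collapse exact duplicates."""
    conn.execute(
        """
        INSERT INTO dw_orders (order_id, customer, amount)
        SELECT DISTINCT
            CAST(order_id AS INTEGER),
            UPPER(TRIM(customer)),
            CAST(amount AS REAL)
        FROM stg_orders
        WHERE customer IS NOT NULL
        """
    )


conn = sqlite3.connect(":memory:")
create_tables(conn)
extract(
    conn,
    [
        ("1", "  alice ", "10.50"),
        ("2", "BOB", "7"),
        ("2", "BOB", "7"),     # exact duplicate of the previous row
        ("3", None, "4.00"),   # missing customer, filtered out
    ],
)
transform_and_load(conn)
rows = conn.execute(
    "SELECT order_id, customer, amount FROM dw_orders ORDER BY order_id"
).fetchall()
print(rows)
```

Doing the transformation inside the database keeps the logic in one set-based statement, which is the usual shape of a SQL-only ETL job.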
I'm always open to discussing new opportunities, interesting projects, or anything related to data and technology. Feel free to reach out!