This repository contains lab materials and assignments for the DAMG7245 Spring 2025 course. Each lab focuses on different aspects of big data systems, cloud computing, and modern development practices.

| Lab | Topics Covered | Key Concepts |
|---|---|---|
| Lab 1: Git & VS Code | - Git Setup with VS Code<br>- Team Collaboration<br>- GitHub Project Management | - Git credentials & VS Code extensions<br>- Remote repositories & merge requests<br>- Conflict resolution<br>- GitHub Projects & Issues<br>- Fork workflow & pull requests |
| Lab 2: FastAPI & Streamlit | - FastAPI Implementation<br>- Streamlit Development<br>- System Architecture | - REST API development with FastAPI<br>- Interactive dashboards with Streamlit<br>- API documentation with Swagger<br>- System architecture diagrams<br>- Multi-page Streamlit applications |
| Lab 3: Cloud Deployment | - AWS S3 Integration<br>- Pydantic Models<br>- Google Cloud Run<br>- Basic Docker Deployment for Cloud Run | - Cloud service integration<br>- Data validation with Pydantic<br>- Container deployment<br>- Cloud infrastructure setup<br>- Environment configuration<br>- CI/CD practices |
| Lab 4: DBT & Airflow | - Data Build Tool (DBT)<br>- Airflow Integration<br>- Data Pipeline Development | - DBT project structure<br>- Data transformation<br>- Pipeline orchestration<br>- Data modeling<br>- Testing frameworks<br>- Workflow automation |
| Lab 6: CI/CD with Snowflake | - Data Masking UDF Development<br>- Snowpark Python Integration<br>- GitHub Actions CI/CD Pipeline | - Snowflake UDF development<br>- Data masking and PII handling<br>- CI/CD pipeline setup<br>- Environment configuration<br>- Automated testing<br>- Production deployment |
| Lab 7: RAG Systems | - Embedding Similarity<br>- Chunking Strategies<br>- RAG Implementation<br>- LLM Integration | - Text embeddings & similarity metrics<br>- Document chunking strategies<br>- Vector storage with ChromaDB<br>- Custom RAG implementation<br>- LLM integration<br>- System optimization |