seankim0/README.md

Sean Kim

Lead AI Scientist | Machine Learning Modeling | Pricing & Risk & Marketing & Operation & Servicing Strategy | AI/ML Engineering

GitHub
LinkedIn
TechBlog


🔹 About Me

I am a Lead AI Scientist with 19+ years of experience in AI-powered recommendation engines, LLM/RAG, pricing optimization, credit scoring, loss forecasting, fraud detection, and building large-scale ML pipelines. My expertise lies in predictive modeling, model validation, and AI-driven business analytics for the fintech, banking, payments, e-commerce, real estate, investment, and platform industries. I also bring 12+ years of leadership experience: leading cross-functional teams in implementing enterprise-wide ML solutions, managing and mentoring data scientists, and spearheading strategic data initiatives that resulted in significant operational improvements.

Disclaimer

This is my personal portfolio site. The sample projects here are prototypes, MVPs, and toy projects created to demonstrate my skill set during my master's program and in my spare time. They are not work-related and contain only my own proprietary work. Data sources are primarily from Kaggle, arXiv, and various open-source APIs.

If you have any questions or are interested in collaboration, feel free to reach out.

🔹 Generative AI with MLOps, LLMOps, Data Pipeline

  • LLM RAG based AI ChatBot: A powerful AI ChatBot App that lets you chat with multiple documents using LLM and RAG.
  • MLOps using Docker & Kubernetes: End-to-end ML pipeline, MLflow, CI/CD implementation using Docker containers and Kubernetes orchestration for scalable model deployment.
  • House Price Prediction using Deep Learning PyTorch and ML Pipeline: A deep learning model built with PyTorch to predict house prices based on real estate features.
  • ML Pipeline for On-premise and AWS cloud: A complete on-premise/AWS cloud machine learning pipeline setup for training, validation, and deployment with local servers, AWS EC2/S3.
  • Data Pipeline using Airflow: End-to-End Data Pipeline using Spark, S3, Databricks, and Airflow: Word Count, User Behavior Analysis, UDF-based Segmentation, and Daily Workflow Automation.
  • Real-time Crypto Price Analytics Platform: A real-time BTC/ETH price tracking system built with Kafka, Spark Streaming, Redshift, Airflow, and Streamlit, featuring live dashboards, Slack alerts, and volatility trend analysis.
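
To make the "chat with multiple documents using LLM and RAG" idea concrete, here is a minimal sketch of the retrieval step only. It is an illustration, not the code in the linked project: it assumes plain term-frequency vectors and cosine similarity in place of a real embedding model, and the names `retrieve`, `build_prompt`, and `CHUNKS` are hypothetical.

```python
import math
import re
from collections import Counter


def tokenize(text):
    """Lowercase a string and split it into alphanumeric word tokens."""
    return re.findall(r"[a-z0-9]+", text.lower())


def cosine_similarity(a, b):
    """Cosine similarity between two term-frequency Counters."""
    dot = sum(a[t] * b[t] for t in a.keys() & b.keys())
    norm_a = math.sqrt(sum(v * v for v in a.values()))
    norm_b = math.sqrt(sum(v * v for v in b.values()))
    if norm_a == 0 or norm_b == 0:
        return 0.0
    return dot / (norm_a * norm_b)


def retrieve(query, chunks, top_k=2):
    """Return the top_k chunks most similar to the query, best first."""
    query_vec = Counter(tokenize(query))
    scored = [(cosine_similarity(query_vec, Counter(tokenize(c))), c) for c in chunks]
    scored.sort(key=lambda pair: pair[0], reverse=True)
    return [chunk for _, chunk in scored[:top_k]]


def build_prompt(query, chunks, top_k=2):
    """Assemble the text an LLM would receive: retrieved context plus the question."""
    context = "\n".join(retrieve(query, chunks, top_k))
    return f"Answer using only this context:\n{context}\n\nQuestion: {query}"


# Hypothetical document chunks, made up for illustration.
CHUNKS = [
    "Airflow schedules the daily batch ETL jobs.",
    "The credit scorecard uses logistic regression.",
    "Kafka streams BTC and ETH prices in real time.",
]

if __name__ == "__main__":
    print(build_prompt("Which model does the credit scorecard use?", CHUNKS, top_k=1))
```

In a real RAG setup the term-frequency vectors would typically be replaced by embeddings stored in a vector index, and the assembled prompt would be sent to an LLM.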

🔹 Finance & Operation Modeling

  • Credit Risk Scoring: Creating risk-based underwriting scorecards leveraging logistic regression, XGBoost, and deep learning models.
  • Loss Forecasting: Developing robust models to predict credit losses (PD/LGD/EAD) using macroeconomic factors, delinquency trends, and time-series analysis.
  • Fraud Detection: Implementing anomaly detection models using unsupervised learning (Isolation Forest, Autoencoders) and supervised learning (CatBoost, XGBoost).
  • Residual Value Modeling: Forecasting vehicle residual values using econometric models like PROC MIXED and machine learning approaches.
  • Pricing Optimization: Optimizing pricing strategies using reinforcement learning, JD Power competitive rate analysis, and market elasticity modeling.
  • Interest Rate Forecasting: Predicting interest rate trends through time-series forecasting (ARIMA, LSTM) and economic indicators.
  • Behavioral Score (Collection Model): Developing early-stage delinquency prediction models for efficient collection strategies.
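
The PD/LGD/EAD components named under Loss Forecasting are commonly combined as expected loss = PD × LGD × EAD. The sketch below shows only that arithmetic, under the assumption that the three inputs have already been estimated; the function names and loan figures are hypothetical and are not taken from any of the projects above.

```python
def expected_loss(pd_, lgd, ead):
    """Expected loss for one exposure: PD * LGD * EAD.

    pd_ : probability of default, in [0, 1]
    lgd : loss given default as a fraction of exposure, in [0, 1]
    ead : exposure at default, a non-negative currency amount
    """
    for name, value in (("pd_", pd_), ("lgd", lgd)):
        if not 0.0 <= value <= 1.0:
            raise ValueError(f"{name} must be between 0 and 1, got {value}")
    if ead < 0:
        raise ValueError(f"ead must be non-negative, got {ead}")
    return pd_ * lgd * ead


def portfolio_expected_loss(loans):
    """Sum expected loss over an iterable of (pd_, lgd, ead) tuples."""
    return sum(expected_loss(pd_, lgd, ead) for pd_, lgd, ead in loans)


# Hypothetical loans, made up for illustration: (PD, LGD, EAD).
LOANS = [
    (0.02, 0.45, 10_000.0),
    (0.10, 0.60, 5_000.0),
]

if __name__ == "__main__":
    print(f"Portfolio expected loss: {portfolio_expected_loss(LOANS):.2f}")
```

Summing per-loan expected losses gives a point estimate only; it says nothing about loss volatility or correlation between defaults.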

🔹 Marketing & CRM Analytics

  • Churn Prediction (Retention): Building ML models to identify customers at risk of attrition, utilizing survival analysis and XGBoost.
  • Propensity (Return to Market) Model: Developing models to predict customer likelihood of returning to market based on behavioral signals and transaction data.
  • Customer Satisfaction (Sentiment Score) Model: Using NLP techniques (BERT, LLMs) to extract sentiment insights from customer reviews, survey responses, and service interactions.
  • Marketing Mix Modeling (MMM): Analyzing the effectiveness of various marketing channels through econometric regression and Bayesian inference.
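
As one example of the survival analysis mentioned under Churn Prediction, the sketch below computes a Kaplan-Meier retention curve in plain Python. It is a generic textbook estimator, not the method used in the linked repository, and the customer tenures are made-up illustration data.

```python
from collections import Counter


def kaplan_meier(durations, churned):
    """Kaplan-Meier estimate of the probability a customer is still retained.

    durations : observed tenure per customer (e.g. months)
    churned   : True if the customer churned at that tenure,
                False if the observation was censored (still active)
    Returns a list of (time, survival_probability) at each churn time.
    """
    if len(durations) != len(churned):
        raise ValueError("durations and churned must have the same length")

    churn_counts = Counter(t for t, e in zip(durations, churned) if e)
    exit_counts = Counter(durations)  # churned or censored at each time

    at_risk = len(durations)
    survival = 1.0
    curve = []
    for t in sorted(exit_counts):
        d = churn_counts.get(t, 0)
        if d:
            # Customers censored at time t still count as at risk at t.
            survival *= 1.0 - d / at_risk
            curve.append((t, survival))
        at_risk -= exit_counts[t]
    return curve


# Hypothetical tenures in months; False marks customers who are still active.
DURATIONS = [2, 3, 3, 5, 8]
CHURNED = [True, True, False, True, False]

if __name__ == "__main__":
    for month, prob in kaplan_meier(DURATIONS, CHURNED):
        print(f"month {month}: retained with probability {prob:.2f}")
```

Censored customers lower the at-risk count without lowering the curve, which is what separates this estimate from a simple churn rate.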

🔹 Tech Stack & Tools

  • Languages: Python (Pandas, NumPy, Scikit-learn, PyTorch, TensorFlow), SQL (BigQuery, Snowflake, Redshift, MySQL, PostgreSQL), R, SAS, Java, Scala, NoSQL (Mongo, Couchbase), React, Node/Next.js
  • Big Data & Cloud: Databricks, DBT, Apache Spark (PySpark, Spark ML, MLlib), Apache Beam/Flink, AWS (S3, Lambda, EC2), Azure, GCP (BigQuery, Vertex AI), OCI, Airflow DAG orchestration
  • Machine Learning & Framework: XGBoost, LightGBM, CatBoost, Logistic Regression, Random Forest, Neural Networks (Transformer, LSTM), H2O, AutoML, TensorFlow, PyTorch, Scikit-learn, Generative AI-LLM (LangChain, GANs, VAEs, Langroid), NLP (NLTK, VADER), Unsupervised learning (Isolation Forest, Autoencoders)
  • Visualization & Reporting: Tableau, Power BI, Looker, D3.js, Matplotlib, Seaborn, Databricks Quick insights, AWS QuickSight
  • MLOps & Deployment: Docker, Kubernetes, MLflow, Spark + Airflow, GitHub Actions, API Deployment (FastAPI, Flask), Git, Bitbucket, AWS S3, scalable DAG setup
  • Data Warehousing & Data Lake: Oracle Object Storage, Apache Hadoop, Apache Hive, Amazon Redshift, Snowflake
  • Pipeline & Automation: Apache Airflow (SparkSubmitOperator), daily batch ETL scheduling, parameterized DAGs, S3 integration
  • Extensible Architecture: DBT / Snowflake integration, containerization (Docker), and CI/CD (GitHub Actions) ready with minimal refactor

🔹 Featured Prototypes

🔹 Notable Achievements

  • Developed a fraud anomaly detection model that reduced fraudulent transactions by 25%.
  • Designed a credit scorecard model that improved underwriting efficiency, leading to a $15M ROA increase.
  • Led the residual value modeling project, accurately forecasting used car values and optimizing lease pricing.
  • Built an AI-powered customer sentiment model, reducing customer complaints by 18%.
  • Delivered an interest rate optimization framework, enhancing portfolio returns with predictive modeling.

🔹 Accomplishments

🔹 Current Focus

  • Researching LLMs and GenAI applications in risk & marketing modeling.
  • Exploring causal inference techniques for better decision-making in financial products.
  • Implementing automated MLOps pipelines for scalable credit risk modeling.

🔹 Contact


Let's connect! If you're interested in innovative AI applications and ML modeling for measurable benefits, feel free to reach out!

Pinned

  1. credit_risk_scoring (Public)

    A prototype credit risk scoring model benchmarking various algorithms for risk assessment and evaluation.

    Jupyter Notebook

  2. churn_prediction (Public)

    Machine learning model for predicting customer churn using classification algorithms, feature engineering, and model interpretability techniques.

    Jupyter Notebook

  3. fraud_detection (Public)

    Jupyter Notebook

  4. house_price_prediction (Public)

    Machine learning-based house price prediction model using real estate data for accurate valuation and trend analysis.

  5. langchain_llm (Public)

    Jupyter Notebook

  6. recommender_algorithm (Public)

    A versatile recommender system implementing various recommendation algorithms. 🚀