- Master of Science, Statistics, Columbia University, class of 2024
- Bachelor of Science, Statistics, Michigan State University, class of 2022
- Large Language Models (LLMs): Pretraining, distributed GPU optimization, and alignment techniques such as SFT and RLHF.
- AI for Business Applications: Leveraging AI in real estate, energy markets, and advertising systems to optimize decision-making and improve efficiency.
- Machine Learning: Applied data analysis, financial modeling, and algorithmic trading for market research and forecasting.
- Natural Language Processing (NLP): Multi-modal models, sentiment analysis, and user behavior modeling for e-commerce and social media platforms.
- AI System Optimization: High-performance training, model compression, and efficient inference techniques for scalable AI applications.
Welcome to my GitHub! I’m a Master’s student in Statistics at Columbia University with a passion for using data to solve real-world problems across diverse industries. My experience spans data analysis, statistical modeling, and product testing.
- Programming Languages: Python, R, SQL
- Statistical & Machine Learning Techniques: Regression, KNN, CNN, Time Series Analysis
- Data Visualization & Analysis: Matplotlib, Pandas, Tableau
- House Price Prediction
  Improved on the baseline linear regression model in Kaggle's house price competition, reducing the mean squared error and ranking in the top 15%.
- [Image Classification with CNN](https://github.com/YonghaoXu/cnn-image-classification)
  Developed a robust CNN model to classify images with label noise, achieving 92.5% accuracy.
- [Natural Disaster Relief Fund Analysis](https://github.com/YonghaoXu/fema-disaster-analysis)
  Visualized FEMA data to give policymakers insight into disaster relief fund allocation.
Feel free to explore my repositories and connect with me on LinkedIn!