A comprehensive collection of machine learning, deep learning, natural language processing, and data analytics projects developed during the MSc in Business Analytics and Data Science program at IE School of Science and Technology (February 2024 - July 2025).
This repository showcases data science projects that range from classical statistical methods and machine learning to deep learning, NLP, time series forecasting, and business analytics. Each project demonstrates a practical, hands-on implementation of key concepts.
- Programming Languages: Python
- Deep Learning: TensorFlow, Keras
- Machine Learning: Scikit-learn, NumPy, Pandas
- Data Visualization: Matplotlib, Seaborn
- NLP: NLTK, Hugging Face Transformers, ELMo
- Time Series: Statsmodels (ARIMA, SARIMAX)
- Development Environment: Jupyter Notebooks, Google Colab
Advanced neural network implementations and computer vision projects
- 📊 MNIST MLP Classification - Multi-layer perceptron for handwritten digit recognition
- 🔍 CNN CIFAR-10 Baseline - Convolutional neural networks for image classification
- 🐱 CNN Cats & Dogs Classifier - Binary image classification with transfer learning
- 🌸 CNN Flower Classification - Multi-class flower recognition using EfficientNet
- 🔄 Transfer Learning Applications - Leveraging pre-trained models for custom tasks
Key Achievements:
- MNIST accuracy: 89.09% with an optimized MLP architecture
- Implementation of both baseline and advanced CNN models
- Successful application of transfer learning techniques
Classical ML algorithms and statistical modeling approaches
- Startup classification using decision tree algorithms
- Feature importance analysis and tree visualization
- Performance evaluation and model interpretation
- Implementation of KNN for classification tasks
- Distance metric optimization
- Cross-validation and hyperparameter tuning
- Probabilistic classification models
- Text classification applications
- Performance comparison with other algorithms
- Ensemble learning implementation
- Boston housing price prediction
- Feature selection and importance ranking
- Market basket analysis using Apriori algorithm
- Rule mining and confidence metrics
- Retail dataset analysis for customer insights
- PCA and t-SNE implementations
- Visualization of high-dimensional data
- Component analysis and variance explanation
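The market basket bullets above mention the Apriori algorithm and confidence metrics. As a minimal sketch of the idea, using only the standard library, the code below finds frequent itemsets level by level and computes support, confidence, and lift for one rule. The transactions, the `min_support` value, and all function names are hypothetical and are not taken from the repository's notebooks or retail dataset.

```python
from itertools import combinations

# Hypothetical toy transactions; not the repository's retail dataset.
transactions = [
    {"bread", "milk"},
    {"bread", "diapers", "beer", "eggs"},
    {"milk", "diapers", "beer", "cola"},
    {"bread", "milk", "diapers", "beer"},
    {"bread", "milk", "diapers", "cola"},
]

def support(itemset, transactions):
    """Fraction of transactions that contain every item in `itemset`."""
    itemset = set(itemset)
    return sum(itemset <= t for t in transactions) / len(transactions)

def frequent_itemsets(transactions, min_support):
    """Level-wise (Apriori-style) search: grow candidates only from frequent sets."""
    items = sorted({item for t in transactions for item in t})
    current = [frozenset([i]) for i in items
               if support([i], transactions) >= min_support]
    result = {}
    size = 1
    while current:
        for s in current:
            result[s] = support(s, transactions)
        size += 1
        # Join step: unions of two frequent sets that have exactly `size` items.
        candidates = {a | b for a, b in combinations(current, 2)
                      if len(a | b) == size}
        # Prune step: every (size - 1)-subset must itself be frequent.
        candidates = {c for c in candidates
                      if all(frozenset(sub) in result
                             for sub in combinations(c, size - 1))}
        current = [c for c in candidates
                   if support(c, transactions) >= min_support]
    return result

def rule_metrics(antecedent, consequent, transactions):
    """Return (support, confidence, lift) for the rule antecedent -> consequent."""
    both = support(set(antecedent) | set(consequent), transactions)
    confidence = both / support(antecedent, transactions)
    lift = confidence / support(consequent, transactions)
    return both, confidence, lift

frequent = frequent_itemsets(transactions, min_support=0.6)
sup, conf, lift = rule_metrics({"diapers"}, {"beer"}, transactions)
```

A lift above 1 suggests the two items appear together more often than independence would predict. A dedicated library implementation would likely be preferable for datasets of realistic size.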
Text analysis and language understanding projects
- Sentiment analysis on various datasets
- Feature extraction using TF-IDF and bag-of-words
- Advanced transformer models (BERT, etc.)
- Multi-class text categorization
- Part-of-speech tagging implementation
- Named entity recognition systems
- Sequence labeling with neural networks
- Document search and ranking systems
- TF-IDF based retrieval models
- Query processing and relevance scoring
- BERT-based QA systems
- Reading comprehension tasks
- Context-aware answer extraction
- Contextual word embeddings
- Language modeling applications
- Deep bidirectional representations
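The retrieval bullets above describe TF-IDF based ranking with query processing and relevance scoring. The following is a rough, standard-library-only sketch of that idea, assuming whitespace tokenization, a smoothed IDF, and cosine similarity as the relevance score. The corpus, the query, and every function name are hypothetical; the notebooks may well rely on a library vectorizer instead.

```python
import math
from collections import Counter

# Hypothetical mini-corpus for illustration only.
documents = [
    "the cat sat on the mat",
    "dogs and cats are popular pets",
    "stock markets fell sharply on monday",
]

def tokenize(text):
    """Lowercase and split on whitespace; a real pipeline would do more."""
    return text.lower().split()

def build_index(docs):
    """Return per-document TF-IDF vectors (as dicts) and the IDF table."""
    tokenized = [tokenize(d) for d in docs]
    n_docs = len(tokenized)
    doc_freq = Counter(term for tokens in tokenized for term in set(tokens))
    # Smoothed IDF so that no term receives a zero or negative weight.
    idf = {t: math.log((1 + n_docs) / (1 + df)) + 1 for t, df in doc_freq.items()}
    vectors = []
    for tokens in tokenized:
        counts = Counter(tokens)
        vectors.append({t: (c / len(tokens)) * idf[t] for t, c in counts.items()})
    return vectors, idf

def cosine(a, b):
    """Cosine similarity between two sparse vectors stored as dicts."""
    dot = sum(w * b.get(t, 0.0) for t, w in a.items())
    norm_a = math.sqrt(sum(w * w for w in a.values()))
    norm_b = math.sqrt(sum(w * w for w in b.values()))
    return dot / (norm_a * norm_b) if norm_a and norm_b else 0.0

def search(query, vectors, idf):
    """Return (score, document index) pairs, best match first."""
    # Query terms that never occur in the corpus are ignored.
    counts = Counter(t for t in tokenize(query) if t in idf)
    total = sum(counts.values())
    q_vec = {t: (c / total) * idf[t] for t, c in counts.items()} if total else {}
    scores = [(cosine(q_vec, v), i) for i, v in enumerate(vectors)]
    return sorted(scores, reverse=True)

vectors, idf = build_index(documents)
ranking = search("cat on a mat", vectors, idf)
```

Because there is no stemming, "cat" does not match "cats" here, which is one reason a real pipeline would add normalization before indexing.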
Temporal data modeling and forecasting
- 📊 ARIMA Modeling - Autoregressive integrated moving average models
- 🔄 SARIMAX Implementation - Seasonal ARIMA with external variables
- 📉 Trend Analysis - Decomposition and pattern identification
- 🎯 Forecasting - Multi-step ahead predictions
- 📋 Statistical Testing - Stationarity tests and diagnostics
Datasets Analyzed:
- Economic indicators and financial data
- Seasonal business metrics
- Real-world time series with multiple variables
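To make the ARIMA and multi-step forecasting items in this section more concrete, here is a hand-rolled sketch of an ARIMA(1,1,0) model with a constant: difference once, fit an AR(1) on the differences by least squares, then forecast recursively and integrate back to the original level. This is an illustration under simplifying assumptions, not the approach used in the notebooks, which would more plausibly rely on Statsmodels and maximum likelihood estimation. The series and function names are hypothetical.

```python
def difference(series):
    """First difference: the 'I' (d = 1) step of ARIMA."""
    return [b - a for a, b in zip(series, series[1:])]

def fit_ar1(values):
    """Least-squares estimate of (c, phi) in x_t = c + phi * x_{t-1} + e_t."""
    x, y = values[:-1], values[1:]
    mean_x = sum(x) / len(x)
    mean_y = sum(y) / len(y)
    cov = sum((a - mean_x) * (b - mean_y) for a, b in zip(x, y))
    var = sum((a - mean_x) ** 2 for a in x)
    if var == 0:
        raise ValueError("differences are constant; AR(1) slope is undefined")
    phi = cov / var
    c = mean_y - phi * mean_x
    return c, phi

def forecast(series, steps):
    """Multi-step forecast: predict differences recursively, then cumulate."""
    diffs = difference(series)
    c, phi = fit_ar1(diffs)
    level, last_diff = series[-1], diffs[-1]
    predictions = []
    for _ in range(steps):
        last_diff = c + phi * last_diff   # next predicted difference
        level += last_diff                # undo the differencing
        predictions.append(level)
    return predictions

# Hypothetical series whose differences follow d_t = 1 + 0.5 * d_{t-1} exactly.
series = [10.0, 18.0, 23.0, 26.5, 29.25, 31.625]
predictions = forecast(series, steps=2)
```

On real data the differences would be noisy, so the fitted coefficients would only approximate the underlying process, and a stationarity test on the differenced series would normally come first.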
Customer behavior analysis and business intelligence
- 🎯 Customer Segmentation - RFM analysis and clustering
- 📊 Campaign Performance - Multi-channel advertising effectiveness
- 💰 ROI Analysis - Return on investment calculations
- 📱 Social Media Analytics - Engagement and conversion metrics
- 🛒 Purchase Behavior - Customer journey analysis
Key Insights:
- Conversion rate optimization across platforms
- Customer lifetime value modeling
- A/B testing and statistical significance
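The A/B testing item above refers to statistical significance. One common way to check whether two conversion rates differ is a two-proportion z-test, sketched below with the standard library only. The visitor and conversion counts are hypothetical, and the source does not say which test the notebooks actually use.

```python
import math

def two_proportion_z_test(conv_a, n_a, conv_b, n_b):
    """Two-sided z-test for a difference between two conversion rates.

    conv_a, conv_b: number of conversions in variants A and B
    n_a, n_b: number of visitors in variants A and B
    Returns (z statistic, two-sided p-value).
    """
    p_a, p_b = conv_a / n_a, conv_b / n_b
    # Pooled rate under the null hypothesis that both variants convert equally.
    pooled = (conv_a + conv_b) / (n_a + n_b)
    se = math.sqrt(pooled * (1 - pooled) * (1 / n_a + 1 / n_b))
    z = (p_b - p_a) / se
    # Two-sided p-value from the standard normal: 2 * (1 - Phi(|z|)).
    p_value = math.erfc(abs(z) / math.sqrt(2))
    return z, p_value

# Hypothetical campaign: 4.0% conversion for A versus 5.2% for B.
z, p_value = two_proportion_z_test(conv_a=200, n_a=5000, conv_b=260, n_b=5000)
significant = p_value < 0.05
```

The normal approximation is reasonable only when each variant has a fair number of conversions and non-conversions; for small samples an exact test would be a safer choice.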
Big data processing and system design
- ✈️ Flight Delay Prediction - Predictive modeling for the aviation industry
- 🌤️ Weather Impact Analysis - Multi-source data integration
- 🤖 Genetic Algorithm Optimization - Evolutionary computing applications
- 📊 Data Pipeline Design - ETL processes and data warehousing
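The genetic algorithm item above is not described in detail, so the following is only a generic sketch of the technique on a toy problem (OneMax, maximizing the number of 1 bits). It assumes tournament selection, single-point crossover, bit-flip mutation, and elitism. The problem, parameter values, and function names are all hypothetical and say nothing about what the actual project optimizes.

```python
import random

def fitness(bits):
    """OneMax: the more 1 bits, the better."""
    return sum(bits)

def evolve(n_bits=20, pop_size=30, generations=40, mutation_rate=0.02, seed=0):
    """Run a simple genetic algorithm; return the best individual and history."""
    rng = random.Random(seed)  # local generator for reproducible runs
    population = [[rng.randint(0, 1) for _ in range(n_bits)]
                  for _ in range(pop_size)]
    history = []
    for _ in range(generations):
        population.sort(key=fitness, reverse=True)
        history.append(fitness(population[0]))
        # Elitism: carry the current best individual over unchanged.
        next_gen = [population[0][:]]
        while len(next_gen) < pop_size:
            # Tournament selection: best of three random individuals.
            parent_a = max(rng.sample(population, 3), key=fitness)
            parent_b = max(rng.sample(population, 3), key=fitness)
            # Single-point crossover.
            cut = rng.randrange(1, n_bits)
            child = parent_a[:cut] + parent_b[cut:]
            # Bit-flip mutation, applied independently to each gene.
            child = [1 - g if rng.random() < mutation_rate else g for g in child]
            next_gen.append(child)
        population = next_gen
    best = max(population, key=fitness)
    history.append(fitness(best))
    return best, history

best, history = evolve()
```

Because of elitism, the best fitness recorded in `history` never decreases from one generation to the next, which makes convergence easy to plot and inspect.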
- Clone the repository:
git clone https://github.com/your-username/IE-data-science-portfolio.git
cd IE-data-science-portfolio
- Create virtual environment:
python -m venv venv
source venv/bin/activate # On Windows: venv\Scripts\activate
- Install dependencies:
pip install -r requirements.txt
- Launch Jupyter:
jupyter notebook
Project Category | Best Model | Accuracy/Performance | Dataset Size |
---|---|---|---|
MNIST Classification | MLP | 89.09% | 70,000 images |
CIFAR-10 | CNN | ~85% | 60,000 images |
Text Classification | BERT | ~92% | Variable |
Time Series Forecasting | SARIMAX | MAPE < 10% | 2000+ points |
Customer Segmentation | K-Means | Silhouette > 0.6 | 10,000+ records |
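The table reports forecasting quality as MAPE (mean absolute percentage error). As a small sketch of how that metric is commonly computed, assuming no actual value is zero, consider the function below. The numbers are hypothetical and unrelated to the results in the table.

```python
def mape(actual, predicted):
    """Mean absolute percentage error, in percent."""
    if len(actual) != len(predicted) or not actual:
        raise ValueError("inputs must be non-empty and of equal length")
    if any(a == 0 for a in actual):
        raise ValueError("MAPE is undefined when an actual value is zero")
    errors = [abs((a - p) / a) for a, p in zip(actual, predicted)]
    return 100 * sum(errors) / len(errors)

# Hypothetical values: relative errors of 10%, 10%, and 0%.
score = mape([100.0, 200.0, 400.0], [110.0, 180.0, 400.0])
```

MAPE weights errors relative to the size of each actual value, so it can look misleadingly large on series that pass close to zero.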
Each project follows a structured approach:
- 📋 Data Exploration - Comprehensive EDA with statistical analysis
- 🧹 Data Preprocessing - Cleaning, normalization, and feature engineering
- 🏗️ Model Development - Implementation and experimentation
- ⚡ Optimization - Hyperparameter tuning and validation
- 📈 Evaluation - Performance metrics and model interpretation
- 📝 Documentation - Clear explanations and visualizations
- Deep Learning Mastery: From basic MLPs to advanced CNNs and transfer learning
- Classical ML Expertise: Comprehensive understanding of traditional algorithms
- NLP Proficiency: Modern text processing and language understanding
- Time Series Analysis: Forecasting and temporal pattern recognition
- Business Intelligence: Data-driven decision making and analytics
- Software Engineering: Clean, reproducible, and well-documented code
- Python 3.8+
- Basic understanding of statistics and linear algebra
- Familiarity with machine learning concepts
- Experience with Jupyter notebooks
This repository represents academic coursework and personal learning. For suggestions or improvements:
- Fork the repository
- Create a feature branch
- Submit a pull request with a detailed description
This project is for educational purposes. Please respect applicable academic integrity policies when using this code.
- IE School of Science and Technology - MSc in Business Analytics and Data Science program (February 2024 - July 2025)
- Teaching Faculty - Expert instruction and mentorship throughout the program
- Fellow Students & Teammates - Collaborative learning and group project contributions
- Open Source Community - Tools and libraries that made this possible
For questions or collaboration opportunities:
- 📧 Email: ulisesgordillot@gmail.com
- 💼 LinkedIn: ulisesgordillo
- 🐙 GitHub: uligt
⭐ If you found this repository helpful, please consider giving it a star!
Last Updated: July 2025