This repo does not focus on learning and practicing with real-world data sets. The data sets used here are very basic and are intended for learning different machine learning models with scikit-learn.
The aim of this repo is to create an ML portfolio of projects for learning and understanding different concepts in data science and machine learning.
- Created a model that predicts daily revenue (in dollars) from the outside air temperature (°C) using simple linear regression.
- Developed a model to predict the impact of increasing vehicle horsepower (HP) on fuel economy (miles per gallon, MPG) using simple linear regression.
- Created a model that predicts the chance of admission based on GRE Score, TOEFL Score, University Rating, SOP, LOR, CGPA, Research, etc.
- Developed a model to predict the price of an S&P 500 index fund based on interest rates, employment, etc.
- Developed a polynomial regression model to predict manufacturing costs based on the number of units produced.
- Developed a polynomial regression model to predict the salary based on the years of experience.
- Used housing data for King County, USA.
- Used different models (decision tree, linear regression, ridge regression, artificial neural networks) to predict housing prices based on number of bedrooms, number of bathrooms, sqft_living, sqft_lot, number of floors, sqft_above, sqft_basement, etc.
- Used different models (decision tree, linear regression, ridge regression, artificial neural networks) to predict the sale price of a car.
- Used logistic regression to predict the chance of clicking using features like Country, Time Spent on Site, Salary, etc.
- Used logistic regression to predict the chance of survival using features like Pclass, Sex, Age, siblings count on the ship, parents count on the ship, Ticket, Fare, Cabin, etc.
- Used Naive Bayes (Binomial Naive Bayes) and NLP to detect whether an email is spam.
- Used the feature engineering techniques of undersampling and oversampling (using SMOTE) to handle an imbalanced dataset.
- Modelled and compared performance across different models such as Naive Bayes (Gaussian NB), logistic regression, neural networks, and random forests.
- Applied IQR-based anomaly detection to improve the accuracy of the model.
- Tuned the different models using GridSearchCV.
- Used t-SNE for dimensionality reduction and for plotting the difference between fraud and non-fraud cases.
- Applied K-means clustering with different values of K and evaluated the results using the elbow method, silhouette score, and visualization.
- Modelled and compared performance across different models such as Naive Bayes (Gaussian NB), logistic regression, neural networks, and random forests.
- Tuned the different models using GridSearchCV.
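
As an illustration of the IQR-based anomaly detection mentioned in the list above, here is a minimal sketch using only the Python standard library. It assumes the common 1.5 × IQR rule; the function names `iqr_bounds` and `drop_outliers` and the sample values are hypothetical, and the notebooks in this repo may compute quartiles differently (for example with pandas or NumPy, whose default interpolation can give slightly different quartile values than `statistics.quantiles`).

```python
from statistics import quantiles


def iqr_bounds(values, k=1.5):
    """Return the (lower, upper) fences Q1 - k*IQR and Q3 + k*IQR."""
    # quantiles(..., n=4) returns the three quartile cut points: Q1, median, Q3.
    q1, _, q3 = quantiles(values, n=4)
    iqr = q3 - q1
    return q1 - k * iqr, q3 + k * iqr


def drop_outliers(values, k=1.5):
    """Keep only the values that fall inside the IQR fences."""
    lower, upper = iqr_bounds(values, k)
    return [v for v in values if lower <= v <= upper]


# Hypothetical sample: one value is far larger than the rest.
amounts = [10, 12, 11, 13, 12, 11, 10, 300]
cleaned = drop_outliers(amounts)
print(cleaned)
```

With this sample, the value 300 falls above the upper fence and is dropped, while the remaining values are kept. Whether removing such points actually helps a model depends on the data, so it is worth comparing metrics before and after filtering.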