Welcome to my GitHub repository dedicated to developing strong supervised machine learning models for classification and regression tasks. My primary focus is on achieving excellent scores in Kaggle.com competitions, where I strive to push the boundaries of model performance.
Join me on this exciting journey of developing high-performing classification and regression models. Let's unlock the full potential of supervised learning and make our mark in Kaggle.com competitions. Happy coding, and may our models achieve excellence!
Titanic - A classification problem. I have implemented some best practices, developing three models of increasing complexity.
Skills: Data wrangling and cleaning | Feature engineering | Model training and evaluation | Hyperparameter tuning | Ensemble learning.
Accomplishments: Developed custom functions to create new features based on domain knowledge. | Utilized IterativeImputer and KNNImputer to handle missing values. | Tuned hyperparameters using GridSearchCV. | Enhanced my functions into transformers using FunctionTransformer. | Employed SimpleImputer with different strategies for numeric and categorical features. | Improved my custom transformers, incorporating the Cabin feature.
- Basic - I wrote my own functions to create new features and encode them using domain knowledge. With all the features encoded, I used `IterativeImputer` and `KNNImputer`, trying different parameters for `imputation_order` and `n_neighbors`. I created a main `Pipeline` and a `GridSearchCV` to tune hyperparameters.
- 02 - I improved my functions and turned them into transformers using `FunctionTransformer`, which I then used in a `FeatureUnion`. This time I tried `SimpleImputer` with the `mean`, `median`, `constant`, and `most_frequent` strategies for numeric features and `most_frequent` for categorical features.
- 03 - I improved my own transformers. This time I included and encoded the `Cabin` feature, which I hadn't used in the previous two procedures. I again used `IterativeImputer` and `KNNImputer`, trying different parameters for `imputation_order` and `n_neighbors`.
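The third procedure can be sketched roughly like this. The column names, the deck mapping, and the final estimator are illustrative assumptions, not the exact code from the notebooks:

```python
import pandas as pd
from sklearn.experimental import enable_iterative_imputer  # noqa: F401
from sklearn.impute import IterativeImputer, KNNImputer
from sklearn.linear_model import LogisticRegression
from sklearn.model_selection import GridSearchCV
from sklearn.pipeline import Pipeline
from sklearn.preprocessing import FunctionTransformer

def add_deck(df):
    # Hypothetical Cabin encoding: first letter of the cabin -> deck number.
    out = df.copy()
    out["Deck"] = out["Cabin"].str[0].map(
        {"A": 1, "B": 2, "C": 3, "D": 4, "E": 5, "F": 6, "G": 7})
    return out.drop(columns=["Cabin"])

pipe = Pipeline([
    ("features", FunctionTransformer(add_deck)),
    ("impute", IterativeImputer()),
    ("model", LogisticRegression(max_iter=1000)),
])

# Swap the whole imputer inside the grid so both are compared, each with
# its own parameters (imputation_order vs. n_neighbors).
param_grid = [
    {"impute": [IterativeImputer()],
     "impute__imputation_order": ["ascending", "random"]},
    {"impute": [KNNImputer()],
     "impute__n_neighbors": [3, 5, 7]},
]
search = GridSearchCV(pipe, param_grid, cv=3)
```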
House Prices - A regression problem.
General Info
- Every step in the process is a function (e.g. imputation, reducing cardinality).
- I created custom transformers so I can impute `NaN` values inside a `Pipeline`.
- I preserved column names when encoding ordinal and nominal variables by using the `category_encoders` library.
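A minimal sketch of such a custom transformer, assuming a simple per-column median fill (the actual imputation logic in the repo differs); the point is that returning a DataFrame preserves column names:

```python
import pandas as pd
from sklearn.base import BaseEstimator, TransformerMixin

class NamedMedianImputer(BaseEstimator, TransformerMixin):
    """Impute NaNs with per-column medians, returning a DataFrame."""

    def fit(self, X, y=None):
        # Learn one median per numeric column; NaNs are skipped by default.
        self.medians_ = X.median(numeric_only=True)
        return self

    def transform(self, X):
        # fillna with a Series aligns on column names, so names survive.
        return X.fillna(self.medians_)

df = pd.DataFrame({"LotArea": [8450.0, None, 11250.0]})
imputed = NamedMedianImputer().fit_transform(df)
```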
Outliers:
- I remove them by domain knowledge,
- by applying `IsolationForest`, or
- by analyzing residuals (e.g. greater than 3 standard deviations).
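A sketch of the last two checks on synthetic data: `IsolationForest` flags anomalies, and residuals of a simple linear fit beyond 3 standard deviations mark outliers.

```python
import numpy as np
from sklearn.ensemble import IsolationForest
from sklearn.linear_model import LinearRegression

rng = np.random.default_rng(0)
X = rng.normal(size=(200, 1))
y = 3 * X.ravel() + rng.normal(scale=0.5, size=200)
y[0] += 20.0  # plant one obvious outlier

# Check 1: IsolationForest marks anomalous (feature, target) pairs with -1.
iso_mask = IsolationForest(random_state=0).fit_predict(np.c_[X, y]) == 1

# Check 2: keep points whose residual is within 3 standard deviations.
resid = y - LinearRegression().fit(X, y).predict(X)
resid_mask = np.abs(resid - resid.mean()) <= 3 * resid.std()

clean_X, clean_y = X[iso_mask & resid_mask], y[iso_mask & resid_mask]
```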
Categorical features:
- To reduce the number of dimensions that `OneHotEncoder` will produce, I either:
  - collapse the less frequent categories into a single category, 'Others', or
  - create clusters using the `KMeans` algorithm from scikit-learn.
- To encode an ordinal variable, I created a dictionary that maps each category to its corresponding order and passed it to the `OrdinalEncoder` instance.
- To encode nominal variables, I used `OneHotEncoder` from `category_encoders`.
- For ordinal cyclical variables, I calculated their sine and cosine components.
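A small sketch of the ordinal and cyclical encodings. It uses scikit-learn's `OrdinalEncoder` with an explicit category order in place of the `category_encoders` mapping dictionary, and the quality scale is an assumed example:

```python
import numpy as np
import pandas as pd
from sklearn.preprocessing import OrdinalEncoder

# Assumed worst-to-best quality scale for an Ames-style ordinal column.
quality_order = ["Po", "Fa", "TA", "Gd", "Ex"]
enc = OrdinalEncoder(categories=[quality_order])

df = pd.DataFrame({"KitchenQual": ["TA", "Ex", "Fa"], "MoSold": [1, 6, 12]})
df[["KitchenQual"]] = enc.fit_transform(df[["KitchenQual"]])

# Cyclical encoding: December (12) ends up adjacent to January (1).
df["MoSold_sin"] = np.sin(2 * np.pi * df["MoSold"] / 12)
df["MoSold_cos"] = np.cos(2 * np.pi * df["MoSold"] / 12)
```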
Numerical Features:
- I treated skewness (> 0.7) of numerical variables with the Yeo-Johnson technique in `PowerTransformer` to make them more Gaussian-like.
- I scaled variables using `RobustScaler` because of outliers.
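A sketch of that numeric treatment on synthetic lognormal data:

```python
import numpy as np
import pandas as pd
from sklearn.preprocessing import PowerTransformer, RobustScaler

rng = np.random.default_rng(0)
df = pd.DataFrame({"LotArea": rng.lognormal(mean=9.0, sigma=0.5, size=500)})

# Columns whose absolute skew exceeds the 0.7 threshold get Yeo-Johnson.
skew = df.skew().abs()
skewed_cols = skew[skew > 0.7].index
df[skewed_cols] = PowerTransformer(method="yeo-johnson").fit_transform(df[skewed_cols])

# RobustScaler centers on the median and scales by the IQR, so any
# remaining outliers influence the scaling less than with StandardScaler.
scaled = RobustScaler().fit_transform(df)
```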
Feature Selection:
- I scored features with `mutual_info_regression`.
- I dropped features with mutual information = 0.
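A sketch of that filter on synthetic data:

```python
import numpy as np
import pandas as pd
from sklearn.feature_selection import mutual_info_regression

rng = np.random.default_rng(0)
X = pd.DataFrame({
    "informative": rng.normal(size=300),
    "noise": rng.normal(size=300),
})
y = 2 * X["informative"] + rng.normal(scale=0.1, size=300)

# Score each feature against the target; keep only positive scores.
mi = pd.Series(mutual_info_regression(X, y, random_state=0), index=X.columns)
kept = X[mi[mi > 0].index]
```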
Feature Engineering:
- By domain knowledge (e.g. combining different measurements of the land).
- I created polynomial interaction features from the 20 features with the highest mutual information.
Feature Importance:
- I used the feature importances from XGBoost in `SelectFromModel` to keep only the most important features.
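A sketch of the importance filter; scikit-learn's `GradientBoostingRegressor` stands in for `XGBRegressor` here to keep the example dependency-free:

```python
from sklearn.datasets import make_regression
from sklearn.ensemble import GradientBoostingRegressor
from sklearn.feature_selection import SelectFromModel

X, y = make_regression(n_samples=200, n_features=10, n_informative=3,
                       random_state=0)

# By default SelectFromModel keeps features whose importance is at least
# the mean importance across all features.
selector = SelectFromModel(GradientBoostingRegressor(random_state=0))
X_reduced = selector.fit_transform(X, y)
```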
Modeling:
I tuned the hyperparameters manually or with `GridSearchCV` and `RandomizedSearchCV` for the following algorithms:
- `TheilSenRegressor`, `HuberRegressor`, `RANSACRegressor`, `Ridge`, `ElasticNet`, `ElasticNetCV`, `LinearRegression`, `XGBRegressor`, `catboost.CatBoostRegressor`
- I then combined them in a `StackingRegressor` with a `LinearRegression` as the meta-model.
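A sketch of the stacking step with scikit-learn base learners only (the repo also stacks `XGBRegressor` and `CatBoostRegressor`):

```python
from sklearn.datasets import make_regression
from sklearn.ensemble import StackingRegressor
from sklearn.linear_model import (ElasticNet, HuberRegressor, LinearRegression,
                                  Ridge)

X, y = make_regression(n_samples=200, n_features=5, random_state=0)

# Base learners are fit on cross-validated folds; the meta-model then
# learns how to combine their out-of-fold predictions.
stack = StackingRegressor(
    estimators=[("ridge", Ridge()),
                ("enet", ElasticNet()),
                ("huber", HuberRegressor(max_iter=500))],
    final_estimator=LinearRegression(),
)
stack.fit(X, y)
score = stack.score(X, y)
```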