mllib.py

Vishal Tyagi edited this page Jul 9, 2023 · 1 revision

Explanation - mllib.py

The mllib.py file contains machine learning functionality used by the Flask application. It imports modules and classes from various machine learning libraries to provide a set of machine learning models and data preprocessing methods.

Importing Required Modules and Libraries

The file begins with the following imports:

  • preprocessing module from sklearn library for data preprocessing tasks.
  • train_test_split function from sklearn.model_selection module for splitting data into training and testing sets.
  • LinearRegression, DecisionTreeRegressor, and RandomForestRegressor classes for regression models, from the sklearn.linear_model, sklearn.tree, and sklearn.ensemble modules respectively.
  • Sequential class from keras.models module for building sequential neural networks.
  • xgb module (the usual alias for the xgboost package) for XGBoost models.
  • pandas and numpy for data manipulation.

Available Models and Model Dictionary

The file defines a list named models that holds one instance of each available machine learning model: linear regression, decision tree regression, random forest regression, an XGBoost regressor, an XGBoost classifier, and a sequential neural network. These instances are used for training and making predictions.

A dictionary model_dict is also defined, which maps the names of the models to their respective instances. This dictionary is used to retrieve the selected model based on its name.
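A minimal sketch of how such a list and dictionary could be laid out. The display names used as keys are hypothetical, and only the scikit-learn regressors are shown; the XGBoost and Keras entries described above would be registered the same way.

```python
from sklearn.ensemble import RandomForestRegressor
from sklearn.linear_model import LinearRegression
from sklearn.tree import DecisionTreeRegressor

# Instances of the available models (scikit-learn regressors only in this sketch).
models = [LinearRegression(), DecisionTreeRegressor(), RandomForestRegressor()]

# Hypothetical display names; the actual keys in mllib.py may differ.
model_names = ["Linear Regression", "Decision Tree", "Random Forest"]

# Map each name to its model instance.
model_dict = dict(zip(model_names, models))
```

Building the dictionary from two parallel lists keeps the names and instances in the same order, which is convenient if the names are also shown in a drop-down in the web interface.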

ML_lib Class

The file defines a class named ML_lib that encapsulates the machine learning functionality. This class is responsible for preprocessing data, splitting it into training and testing sets, and providing access to machine learning models.

Initialization

The ML_lib class constructor takes a dataframe and a target variable as parameters. It stores the target variable and the remaining columns in separate variables.
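A constructor along these lines would match that description. The attribute names (df, target, features) are assumptions made for illustration, as is the choice to store the remaining columns as a list of column names rather than as data.

```python
import pandas as pd


class ML_lib:
    def __init__(self, df: pd.DataFrame, target: str):
        # Keep a reference to the full dataframe (attribute name is an assumption).
        self.df = df
        # Name of the column to predict.
        self.target = target
        # Every other column is treated as an input feature.
        self.features = [col for col in df.columns if col != target]
```

For a dataframe with columns a, b, and y and target "y", features would hold ["a", "b"].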

Data Preprocessing

The pre_processing method of the ML_lib class encodes the data for modelling. Columns with more than 10 unique values are label encoded, while columns with fewer unique values are expanded into dummy variables. The method returns the preprocessed data.
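The rule can be sketched as a standalone function. This is an illustration, not the method from mllib.py: the function name, the max_dummy_levels parameter, and the decision to leave numeric columns untouched are all assumptions.

```python
import pandas as pd
from sklearn.preprocessing import LabelEncoder


def pre_process(df: pd.DataFrame, max_dummy_levels: int = 10) -> pd.DataFrame:
    out = df.copy()
    dummy_cols = []
    for col in out.columns:
        # Assumption: numeric (and boolean) columns are left as they are.
        if pd.api.types.is_numeric_dtype(out[col]):
            continue
        if out[col].nunique() > max_dummy_levels:
            # Many distinct values: replace each value with an integer code.
            out[col] = LabelEncoder().fit_transform(out[col].astype(str))
        else:
            # Few distinct values: expand into dummy columns afterwards.
            dummy_cols.append(col)
    if dummy_cols:
        out = pd.get_dummies(out, columns=dummy_cols)
    return out
```

Label encoding keeps high-cardinality columns as a single column, whereas dummy variables add one column per distinct value, which is why the threshold matters for the width of the resulting dataframe.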

Train-Test Split

The train_test_split method splits the data into training and testing sets. It takes an is_random parameter that determines whether the split should be random or not. If is_random is False, the data is split sequentially based on a specified ratio. If is_random is True, the train_test_split function from sklearn is used to split the data randomly.
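The two branches might look like the sketch below. The function name split, the test_ratio parameter, and its default of 0.2 are assumptions; the source only states that a ratio is specified.

```python
import pandas as pd
from sklearn.model_selection import train_test_split


def split(X: pd.DataFrame, y: pd.Series, is_random: bool = True, test_ratio: float = 0.2):
    if is_random:
        # Random split: rows are shuffled before being divided.
        return train_test_split(X, y, test_size=test_ratio)
    # Sequential split: the last rows become the test set, preserving order.
    n_test = int(round(len(X) * test_ratio))
    cut = len(X) - n_test
    return X.iloc[:cut], X.iloc[cut:], y.iloc[:cut], y.iloc[cut:]
```

Both branches return the same four pieces in the same order (training features, testing features, training target, testing target). A sequential split is the usual choice for time-ordered data, where shuffling would leak future rows into the training set.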

Getting the Model

The get_model function looks up the provided model name in the model_dict dictionary and returns the corresponding machine learning model instance.
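A lookup of this kind can be sketched as follows. The dictionary keys are hypothetical, and the explicit error for unknown names is an addition for illustration; the source does not say how the original handles a missing name.

```python
from sklearn.linear_model import LinearRegression
from sklearn.tree import DecisionTreeRegressor

# Hypothetical, shortened stand-in for the model_dict defined in mllib.py.
model_dict = {
    "Linear Regression": LinearRegression(),
    "Decision Tree": DecisionTreeRegressor(),
}


def get_model(name: str):
    try:
        return model_dict[name]
    except KeyError:
        # Assumption: fail loudly with the list of valid names.
        raise ValueError(
            f"Unknown model {name!r}; choose from {sorted(model_dict)}"
        ) from None
```

Because the dictionary stores instances, every lookup of the same name returns the same object, so fitting it again replaces the previously learned parameters. If independent copies are needed, sklearn.base.clone can create a fresh unfitted estimator from a scikit-learn model.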

Overall, the mllib.py file provides a set of machine learning models and functionality to preprocess data and split it into training and testing sets. These models can be used by the Flask application for training and making predictions.
