mllib.py
The `mllib.py` file contains the machine learning functionality used by the Flask application. It imports modules and classes from several machine learning libraries to provide a set of models and data preprocessing methods.
The file starts by importing the modules and libraries it needs:
- `preprocessing` module from `sklearn`, for data preprocessing tasks.
- `train_test_split` function from `sklearn.model_selection`, for splitting data into training and testing sets.
- `LinearRegression`, `DecisionTreeRegressor`, and `RandomForestRegressor` classes from `sklearn.linear_model`, `sklearn.tree`, and `sklearn.ensemble` respectively, for regression models.
- `Sequential` class from `keras.models`, for building sequential neural networks.
- `xgb` module, for XGBoost models.
- `pandas` and `numpy`, for data manipulation.
The file defines a list, `models`, that contains instances of the available machine learning models: linear regression, decision tree regression, random forest regression, an XGBoost regressor, an XGBoost classifier, and a sequential neural network. These models are used for training and making predictions.
A dictionary, `model_dict`, is also defined. It maps the name of each model to its instance and is used to retrieve the selected model by name.
The file defines a class named `ML_lib` that encapsulates the machine learning functionality. This class is responsible for preprocessing data, splitting it into training and testing sets, and providing access to machine learning models.
The `ML_lib` class constructor takes a dataframe and a target variable as parameters. It stores the target variable and the remaining columns in separate variables.
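A minimal sketch of such a constructor follows. The class name `MLLibSketch` and the attribute names `X` and `y` are hypothetical, and the target is assumed to be passed as a column name; the real `ML_lib` class may differ on all three points.

```python
import pandas as pd


class MLLibSketch:
    """Illustrative stand-in for ML_lib; names here are assumptions."""

    def __init__(self, df, target):
        # Keep the target column on its own ...
        self.y = df[target]
        # ... and every other column as the feature set.
        self.X = df.drop(columns=[target])


df = pd.DataFrame({"rooms": [2, 3], "area": [50, 80], "price": [100, 160]})
lib = MLLibSketch(df, "price")
print(list(lib.X.columns))
```

Separating features from the target once, at construction time, lets the later preprocessing and splitting steps work on the two parts without repeating the column selection.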
The `pre_processing` method in the `ML_lib` class performs data preprocessing. It label-encodes columns with more than 10 unique values and creates dummy variables for columns with fewer unique values, then returns the preprocessed data.
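The rule above might look roughly like the sketch below. It is not the method from `mllib.py`: the function name `pre_process` and the parameter `max_dummy_levels` are hypothetical, the rule is assumed to apply only to non-numeric columns, and a column with exactly 10 unique values is assumed to get dummy variables.

```python
import pandas as pd
from sklearn.preprocessing import LabelEncoder


def pre_process(df, max_dummy_levels=10):
    """Label-encode high-cardinality columns, one-hot encode the rest."""
    out = df.copy()
    dummy_cols = []
    for col in out.columns:
        # Assumption: numeric columns are left untouched.
        if pd.api.types.is_numeric_dtype(out[col]):
            continue
        if out[col].nunique() > max_dummy_levels:
            # Many distinct values: replace each with an integer code.
            out[col] = LabelEncoder().fit_transform(out[col])
        else:
            # Few distinct values: expand into dummy columns later.
            dummy_cols.append(col)
    if dummy_cols:
        out = pd.get_dummies(out, columns=dummy_cols)
    return out
```

The threshold keeps the column count under control: dummy variables add one column per distinct value, so they are reserved for low-cardinality columns, while label encoding keeps a high-cardinality column as a single integer column.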
The `train_test_split` method splits the data into training and testing sets. Its `is_random` parameter determines whether the split is random. If `is_random` is `False`, the data is split sequentially according to a specified ratio. If `is_random` is `True`, the `train_test_split` function from `sklearn` is used to split the data randomly.
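The two branches can be sketched as follows. The function name `split_data` and the parameters `test_ratio` and `seed` are hypothetical, as is the default ratio of 0.2; only the `is_random` switch and the use of scikit-learn's `train_test_split` come from the description above.

```python
import pandas as pd
from sklearn.model_selection import train_test_split


def split_data(X, y, is_random=False, test_ratio=0.2, seed=0):
    """Return X_train, X_test, y_train, y_test."""
    if is_random:
        # Random split: rows are shuffled before being divided.
        return train_test_split(X, y, test_size=test_ratio, random_state=seed)
    # Sequential split: the first rows train, the last rows test.
    n_test = round(len(X) * test_ratio)
    cut = len(X) - n_test
    return X.iloc[:cut], X.iloc[cut:], y.iloc[:cut], y.iloc[cut:]


X = pd.DataFrame({"a": range(10)})
y = pd.Series(range(10))
X_train, X_test, y_train, y_test = split_data(X, y, is_random=False)
print(len(X_train), len(X_test))
```

A sequential split preserves row order, which matters for time-ordered data where testing on earlier rows than the training set would leak future information; a random split is the usual choice when rows are independent.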
The `get_model` function returns the machine learning model instance for the provided model name. It retrieves the model from the `model_dict` dictionary.
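As an illustration of this lookup, the sketch below builds a small registry and resolves a model by name. It covers only the three scikit-learn regressors; the XGBoost and Keras entries are left out to keep it short. The dictionary keys (class names) and the `ValueError` raised for unknown names are assumptions, since the actual keys and error handling in `mllib.py` are not described here.

```python
from sklearn.ensemble import RandomForestRegressor
from sklearn.linear_model import LinearRegression
from sklearn.tree import DecisionTreeRegressor

# Subset of the model list, for illustration only.
models = [LinearRegression(), DecisionTreeRegressor(), RandomForestRegressor()]

# Assumption: each model is keyed by its class name.
model_dict = {type(m).__name__: m for m in models}


def get_model(name):
    """Return the registered model instance for `name`."""
    try:
        return model_dict[name]
    except KeyError:
        # Assumption: unknown names are reported with the valid choices.
        choices = ", ".join(sorted(model_dict))
        raise ValueError(f"Unknown model {name!r}; choose from: {choices}") from None


print(type(get_model("LinearRegression")).__name__)
```

Note that a registry of instances hands back the same object on every lookup, so fitting a model retrieved this way changes the shared instance.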
Overall, the `mllib.py` file provides a set of machine learning models, along with functionality to preprocess data and split it into training and testing sets. The Flask application can use these models for training and making predictions.