
Feature Engineering & Data Pre-Processing - Outliers - Encoding (Label Encoding, One-Hot Encoding, Rare Encoding) - Writing the Rare Encoder - Feature Engineering (Variable Engineering)


Barbar41/Python-Projects-Feature-Engineering


FEATURE ENGINEERING & DATA PRE-PROCESSING

#######################

1-Outliers

1-Capturing Outliers

2-Detecting Outliers with Graphical Techniques

3-How to Catch Outliers?

4-Are There Any Outliers?

5-Turning the Operations into Functions

6-grab_col_names

7-Accessing the Outliers Themselves

8-Solving the Outlier Problem

9-Deleting Outliers

10-Suppression Method (re-assignment with thresholds)

11-Recap

12-Multivariate Outlier Analysis: Local Outlier Factor

13-Missing Values

14-Capturing Missing Values

15-Solving the Missing Value Problem
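The outlier steps above (thresholds, checking, deletion, suppression) can be sketched with the usual IQR rule: anything below Q1 - 1.5 * IQR or above Q3 + 1.5 * IQR counts as an outlier. This is a minimal pandas sketch, not the repository's code; the helper names (outlier_thresholds, check_outlier, remove_outlier, replace_with_thresholds) and the toy "Fare" data are hypothetical, and suppression is implemented here with Series.clip.

```python
import pandas as pd


def outlier_thresholds(df, col, q1=0.25, q3=0.75):
    """Return (low, up) limits: Q1 - 1.5 * IQR and Q3 + 1.5 * IQR."""
    quartile1 = df[col].quantile(q1)
    quartile3 = df[col].quantile(q3)
    iqr = quartile3 - quartile1
    return float(quartile1 - 1.5 * iqr), float(quartile3 + 1.5 * iqr)


def check_outlier(df, col):
    """True if at least one value of col lies outside the IQR limits."""
    low, up = outlier_thresholds(df, col)
    return bool(((df[col] < low) | (df[col] > up)).any())


def remove_outlier(df, col):
    """Deletion: keep only the rows whose value lies inside the limits."""
    low, up = outlier_thresholds(df, col)
    return df[~((df[col] < low) | (df[col] > up))]


def replace_with_thresholds(df, col):
    """Suppression: re-assign outliers to the nearest limit instead of dropping rows."""
    low, up = outlier_thresholds(df, col)
    df = df.copy()  # leave the caller's frame untouched
    df[col] = df[col].clip(lower=low, upper=up)
    return df


# Hypothetical toy data with one obviously extreme fare
df = pd.DataFrame({"Fare": [7.0, 8.0, 9.0, 10.0, 11.0, 12.0, 500.0]})
print(outlier_thresholds(df, "Fare"))  # (4.0, 16.0)
print(check_outlier(df, "Fare"))  # True
print(len(remove_outlier(df, "Fare")))  # 6
print(replace_with_thresholds(df, "Fare")["Fare"].max())  # 16.0
```

Deletion loses whole rows, so with many columns it can shrink the data set quickly; suppression keeps every row and only pulls the extreme values back to the limits.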

####################

Solution 1: Quick deletion
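Capturing the missing values and the quick-deletion solution can be sketched as below. The helper name missing_values_table and the Titanic-like toy frame are hypothetical; dropna and isnull are standard pandas calls.

```python
import numpy as np
import pandas as pd


def missing_values_table(df):
    """Number and percentage of missing values for each column that has any."""
    na_cols = [col for col in df.columns if df[col].isnull().sum() > 0]
    n_miss = df[na_cols].isnull().sum().sort_values(ascending=False)
    ratio = (n_miss / len(df) * 100).round(2)
    return pd.concat([n_miss, ratio], axis=1, keys=["n_miss", "ratio"])


# Hypothetical Titanic-like toy data
df = pd.DataFrame({
    "Age": [22.0, np.nan, 26.0, np.nan],
    "Cabin": [np.nan, "C85", np.nan, np.nan],
    "Fare": [7.25, 71.28, 7.92, 8.05],
})

na_table = missing_values_table(df)
print(na_table)

# Quick deletion: drop every row that contains at least one missing value
df_dropped = df.dropna()
print(df_dropped.shape)  # (0, 3)
```

Here every row has at least one gap, so quick deletion empties the whole frame; this is why it is only safe when missing values are rare.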

####################

Solution 2: Filling with Simple Assignment Methods

1-Assigning Values Broken Down by a Categorical Variable
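Simple assignment, including assignment broken down by a categorical variable, might look like this in pandas. The toy data is hypothetical, and mean/mode are just the common choices; the median works the same way for skewed numeric variables.

```python
import numpy as np
import pandas as pd

# Hypothetical toy data
df = pd.DataFrame({
    "Sex": ["male", "female", "male", "female", "male", "female"],
    "Age": [20.0, 30.0, np.nan, 40.0, 30.0, np.nan],
    "Embarked": ["S", "C", "S", np.nan, "S", "C"],
})

# Numeric variable: fill with the overall mean
age_mean_filled = df["Age"].fillna(df["Age"].mean())

# Categorical variable: fill with the mode (most frequent class)
df["Embarked"] = df["Embarked"].fillna(df["Embarked"].mode()[0])

# Broken down by a categorical variable: each missing Age gets the mean Age
# of its own Sex group instead of the overall mean
df["Age"] = df["Age"].fillna(df.groupby("Sex")["Age"].transform("mean"))
print(df)
```

The overall mean fills both gaps with 30.0, while the group-wise fill gives the male row 25.0 and the female row 35.0, which keeps the difference between the groups.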

#######################

Solution 3: Filling with Predictive Assignment

1-Recap

2-Advanced Analytics

3-Examining Missing Data Structure

4-Examining the Relationship of Missing Values with the Dependent Variable

5-Recap
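Examining the relationship of missing values with the dependent variable, and filling them by prediction, can be sketched as below. The README does not say which predictive method is used; this sketch assumes scikit-learn's KNNImputer on MinMax-scaled features, and the toy data and the Age_NA_FLAG / Age_imputed column names are hypothetical.

```python
import numpy as np
import pandas as pd
from sklearn.impute import KNNImputer
from sklearn.preprocessing import MinMaxScaler

# Hypothetical toy data
df = pd.DataFrame({
    "Age": [22.0, 38.0, np.nan, 35.0, np.nan, 54.0],
    "Fare": [7.25, 71.28, 7.92, 53.10, 8.05, 51.86],
    "Survived": [0, 1, 1, 1, 0, 0],
})

# 1) Missingness vs. target: flag the rows where Age is missing and compare
#    the target mean and row count of flagged and non-flagged rows
df["Age_NA_FLAG"] = np.where(df["Age"].isnull(), 1, 0)
na_summary = df.groupby("Age_NA_FLAG")["Survived"].agg(["mean", "count"])
print(na_summary)

# 2) Predictive assignment: scale so distances are comparable (MinMaxScaler
#    ignores NaN when fitting and keeps it when transforming), let KNN fill
#    each gap from the 2 nearest rows, then undo the scaling
features = ["Age", "Fare"]
scaler = MinMaxScaler()
scaled = scaler.fit_transform(df[features])
imputed = KNNImputer(n_neighbors=2).fit_transform(scaled)
restored = scaler.inverse_transform(imputed)
df["Age_imputed"] = restored[:, 0]
print(df)
```

Keeping the imputed values in a separate column makes it easy to compare them with the originals before overwriting anything.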

#######################

2-Encoding (Label Encoding, One-Hot Encoding, Rare Encoding)

1-Label Encoding & Binary Encoding

2-One-Hot Encoding

3-Rare Encoding
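Label (binary) encoding and one-hot encoding can be sketched in plain pandas. This is an illustration with hypothetical toy data; scikit-learn's LabelEncoder is a common alternative for the first step.

```python
import pandas as pd

# Hypothetical toy data
df = pd.DataFrame({
    "Sex": ["male", "female", "female", "male"],
    "Embarked": ["S", "C", "Q", "S"],
})

# Label / binary encoding: a two-class variable becomes 0/1.
# astype("category") sorts the classes alphabetically, so female -> 0, male -> 1.
df["Sex"] = df["Sex"].astype("category").cat.codes

# One-hot encoding: one dummy column per class; drop_first=True drops the first
# class (C) to avoid the dummy variable trap
df = pd.get_dummies(df, columns=["Embarked"], drop_first=True, dtype=int)
print(df)
```

Label encoding suits binary or ordinal variables; for nominal variables with more than two classes, one-hot encoding avoids implying an order that does not exist.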

#######################

Analysis notes

1-Analyzing the frequency of the classes of categorical variables.

2-Analyzing the relationship between rare categories and the dependent variable.

3-Writing the rare encoder.

4-When a data set has many categorical variables, we should at least run a rare analysis on it.

5-For each categorical variable we need to know which classes it has, how frequent each class is (count and ratio), and how each class relates to the target variable.

####################

Analyzing the frequency of categorical variables.

####################

Analyzing the relationship between rare categories and the dependent variable.
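A rare analysis can be sketched as one summary table per categorical variable: class count, class ratio and target mean per class. The function name rare_analyser and the toy "Deck" data are hypothetical.

```python
import pandas as pd


def rare_analyser(df, target, cat_cols):
    """Return {column: table of COUNT, RATIO and TARGET_MEAN per class}."""
    summaries = {}
    for col in cat_cols:
        counts = df[col].value_counts()
        summaries[col] = pd.DataFrame({
            "COUNT": counts,
            "RATIO": counts / len(df),
            "TARGET_MEAN": df.groupby(col)[target].mean(),
        })
    return summaries


# Hypothetical toy data: decks G and T appear only once each
df = pd.DataFrame({
    "Deck": ["C", "C", "C", "E", "E", "E", "G", "T", "C", "E"],
    "Survived": [1, 0, 1, 1, 1, 0, 0, 0, 1, 1],
})

summaries = rare_analyser(df, "Survived", ["Deck"])
for col, summary in summaries.items():
    print(col, ":", len(summary), "classes")
    print(summary)
```

Classes with a tiny RATIO (G and T here, 0.1 each) are the candidates for merging; their TARGET_MEAN shows whether merging them would blur a real signal.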

#######################

3-Writing the rare encoder.
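A rare encoder can then merge every class whose ratio falls below a chosen threshold into a single label. This is a minimal sketch: the name rare_encoder, the explicit cat_cols argument, the "Rare" label and the 0.2 threshold are all assumptions for illustration.

```python
import pandas as pd


def rare_encoder(df, cat_cols, rare_perc, rare_label="Rare"):
    """Replace classes with ratio < rare_perc by rare_label in the given columns."""
    temp_df = df.copy()  # do not modify the original frame
    for col in cat_cols:
        ratios = temp_df[col].value_counts() / len(temp_df)
        rare_labels = ratios[ratios < rare_perc].index
        # keep values that are not rare, replace the rest with rare_label
        temp_df[col] = temp_df[col].where(~temp_df[col].isin(rare_labels), rare_label)
    return temp_df


# Hypothetical toy data
df = pd.DataFrame({
    "Deck": ["C", "C", "C", "E", "E", "E", "G", "T", "C", "E"],
    "Survived": [1, 0, 1, 1, 1, 0, 0, 0, 1, 1],
})

new_df = rare_encoder(df, ["Deck"], rare_perc=0.2)
print(new_df["Deck"].value_counts())
```

With a 0.2 threshold, G and T (0.1 each) are merged into "Rare", leaving three classes; running the rare analysis again on new_df shows the effect.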

1-Feature Scaling

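2-StandardScaler: Classic standardization. Subtract the mean and divide by the standard deviation: z = (x - u) / s, where u is the mean and s is the standard deviation.

3-RobustScaler: Subtract the median and divide by the interquartile range (IQR), which makes it less sensitive to outliers.

4-MinMaxScaler: Rescales a variable into a given range (by default between 0 and 1).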

5-Numeric to Categorical: Converting Numeric Variables to Categorical Variables

6-Binning

7-Feature Extraction

8-Binary Features: Flag, Bool, True-False

9-Deriving Features from Texts

10-Letter Count

11-Word Count

12-Capturing Special Structures

13-Deriving Variables with Regex

14-Generating Date Variables

15-Feature Interactions

16-Titanic End-to-End Feature Engineering & Data Preprocessing
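Several of the steps above (the three scalers, binning, a flag, letter and word counts, a regex-derived title and a feature interaction) can be sketched in pandas on a few Titanic-style rows. The rows are hypothetical and the NEW_* column names are illustrative only; the scaling formulas mirror what StandardScaler (population standard deviation), RobustScaler and MinMaxScaler compute by default.

```python
import pandas as pd

# Hypothetical Titanic-style rows
df = pd.DataFrame({
    "Age": [22.0, 38.0, 26.0, 35.0, 54.0, 2.0],
    "Cabin": ["C85", None, "E46", None, "C123", None],
    "Name": [
        "Braund, Mr. Owen Harris",
        "Cumings, Mrs. John Bradley",
        "Heikkinen, Miss. Laina",
        "Futrelle, Mrs. Jacques Heath",
        "McCarthy, Mr. Timothy J",
        "Palsson, Master. Gosta Leonard",
    ],
    "SibSp": [1, 1, 0, 1, 0, 3],
    "Parch": [0, 0, 0, 0, 0, 1],
})
age = df["Age"]

# Scaling
df["Age_standard"] = (age - age.mean()) / age.std(ddof=0)  # z = (x - u) / s
df["Age_robust"] = (age - age.median()) / (age.quantile(0.75) - age.quantile(0.25))
df["Age_minmax"] = (age - age.min()) / (age.max() - age.min())  # range 0..1

# Numeric to categorical: binning into three equal-frequency groups
df["Age_qcut"] = pd.qcut(age, q=3, labels=["low", "mid", "high"])

# Binary feature (flag): does the passenger have a cabin record?
df["NEW_CABIN_BOOL"] = df["Cabin"].notnull().astype(int)

# Text features: letter count and word count of the name
df["NEW_NAME_COUNT"] = df["Name"].str.len()
df["NEW_NAME_WORD_COUNT"] = df["Name"].str.split().str.len()

# Regex: the title is the word followed by a dot ("Mr.", "Miss.", ...)
df["NEW_TITLE"] = df["Name"].str.extract(r" ([A-Za-z]+)\.", expand=False)

# Feature interaction: family size = siblings/spouses + parents/children + self
df["NEW_FAMILY_SIZE"] = df["SibSp"] + df["Parch"] + 1

print(df.drop(columns=["Name"]))
```

Date variables follow the same pattern: after pd.to_datetime, the .dt accessor exposes parts such as year and month as new columns.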

#######################

4-Feature Engineering (Variable Engineering)

1-Outliers

2-Missing Values

3-Label Encoding

4-Rare Encoding

5-One-Hot Encoding

6-Standard Scaler

7-Model

#######################

--The dependent variable is Survived; the independent variables are all variables other than PassengerId and Survived.

--We split the data set into two parts: train and test.

--We build the model on the train set and evaluate it on the test set.

--We create the model object, using a tree-based method.

--We fit the model with X_train (the independent variables) and y_train (the dependent target variable).

--We predict the dependent variable for the test set.

--We compare these predictions with the test set's dependent variable y to compute the score.

--What score do we get without any feature engineering?

--How well do the newly generated variables perform?
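The modelling steps above can be sketched with scikit-learn. The README only says "a tree-based method"; this sketch assumes RandomForestClassifier, and the data is a small synthetic stand-in for the processed Titanic frame (the target is generated from Sex and Pclass), so the score it prints says nothing about the real data set.

```python
import numpy as np
import pandas as pd
from sklearn.ensemble import RandomForestClassifier
from sklearn.metrics import accuracy_score
from sklearn.model_selection import train_test_split

# Synthetic stand-in for the processed Titanic frame (hypothetical data)
rng = np.random.default_rng(17)
n = 200
df = pd.DataFrame({
    "PassengerId": np.arange(1, n + 1),
    "Pclass": rng.integers(1, 4, size=n),
    "Sex": rng.integers(0, 2, size=n),
    "Age": rng.uniform(1, 80, size=n),
})
df["Survived"] = ((df["Sex"] == 1) & (df["Pclass"] < 3)).astype(int)

# Dependent variable vs. independent variables (PassengerId is only an ID)
y = df["Survived"]
X = df.drop(["PassengerId", "Survived"], axis=1)

# Train / test split
X_train, X_test, y_train, y_test = train_test_split(X, y, test_size=0.30, random_state=17)

# Tree-based model, fitted on the train set and scored on the test set
rf_model = RandomForestClassifier(random_state=46).fit(X_train, y_train)
y_pred = rf_model.predict(X_test)
acc = accuracy_score(y_test, y_pred)
print(acc)

# Feature importances show how much each (new) variable contributes
importances = pd.Series(rf_model.feature_importances_, index=X.columns)
print(importances.sort_values(ascending=False))
```

Running the same steps once on the raw variables and once after adding engineered ones gives the "before vs. after" comparison described above.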
