This project preprocesses missing data in the Wiki4HE dataset to improve data quality and enable effective analysis. It handles missing values with several techniques so that downstream analyses produce accurate and reliable results.
Handling missing data is a crucial step in data preprocessing. This repository focuses on:
- Identifying missing data in the Wiki4HE dataset.
- Applying various imputation techniques to handle missing values effectively.
- Improving data quality for downstream tasks like analysis and model training.
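The first step above, identifying missing data, can be sketched with pandas. The UCI wiki4HE file is semicolon-separated and encodes unanswered survey items as `?`; the snippet below uses a small inline sample in that format (the column names shown are illustrative, not the full wiki4HE schema):

```python
import io
import pandas as pd

# Inline sample mimicking the wiki4HE format: semicolon-separated,
# with "?" marking a missing answer (column names are illustrative).
raw = io.StringIO(
    "AGE;GENDER;PU1;PU2\n"
    "40;0;4;5\n"
    "42;1;?;4\n"
    "37;0;3;?\n"
)
df = pd.read_csv(raw, sep=";", na_values="?")

# Count missing values per column and rows with any missing entry.
missing_counts = df.isna().sum()
print(missing_counts)
print(f"Rows with at least one missing value: {df.isna().any(axis=1).sum()}")
```

For the real dataset, the same `sep=";"` and `na_values="?"` arguments apply when reading the CSV file from disk.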
- Support for multiple imputation techniques, including:
  - Mean/Median/Mode Imputation
  - K-Nearest Neighbors (KNN) Imputation
  - Advanced techniques like Multiple Imputation by Chained Equations (MICE)
- Visualization of missing data patterns.
- Evaluation metrics for assessing the quality of imputations.
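As a minimal sketch of the three imputer families listed above, scikit-learn provides `SimpleImputer` (mean/median/most-frequent), `KNNImputer`, and the experimental `IterativeImputer` (a MICE-style approach). The toy DataFrame below stands in for numeric wiki4HE survey columns; the repository's notebooks may configure these differently:

```python
import numpy as np
import pandas as pd
from sklearn.impute import SimpleImputer, KNNImputer
# IterativeImputer is scikit-learn's MICE-style imputer; it is still
# flagged experimental and must be enabled explicitly before import.
from sklearn.experimental import enable_iterative_imputer  # noqa: F401
from sklearn.impute import IterativeImputer

# Toy numeric frame standing in for wiki4HE survey columns (illustrative only).
df = pd.DataFrame({
    "PU1": [4.0, 5.0, np.nan, 3.0, 4.0],
    "PU2": [5.0, np.nan, 4.0, 3.0, 5.0],
    "PU3": [4.0, 4.0, 5.0, np.nan, 4.0],
})

# Mean imputation: replace each NaN with the column mean.
mean_imputed = pd.DataFrame(
    SimpleImputer(strategy="mean").fit_transform(df), columns=df.columns)

# KNN imputation: fill each NaN from the k most similar rows.
knn_imputed = pd.DataFrame(
    KNNImputer(n_neighbors=2).fit_transform(df), columns=df.columns)

# MICE-style imputation: model each column from the others iteratively.
mice_imputed = pd.DataFrame(
    IterativeImputer(random_state=0).fit_transform(df), columns=df.columns)
```

`strategy="median"` or `strategy="most_frequent"` swaps in median/mode imputation with the same `SimpleImputer` interface.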
Ensure that you have the following installed:
- Python (>= 3.7)
- Jupyter Notebook
1. Clone the repository:

   ```bash
   git clone https://github.com/eyabesbes/Traitement-donnees-manquantes.git
   cd Traitement-donnees-manquantes
   ```

2. Install the required dependencies:

   ```bash
   pip install -r requirements.txt
   ```

3. Open Jupyter Notebook:

   ```bash
   jupyter notebook
   ```

4. Navigate to the notebook files provided in the repository and execute the cells step by step to:

   - Load the Wiki4HE dataset.
   - Visualize missing data patterns.
   - Apply various imputation techniques.
   - Evaluate the quality of the imputed data.
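One common way to evaluate imputation quality, sketched below, is to mask a fraction of known values, impute them, and measure the error on exactly those cells. This is an illustrative approach on synthetic Likert-style data, not necessarily the metric used in the repository's notebooks:

```python
import numpy as np
import pandas as pd
from sklearn.impute import KNNImputer

rng = np.random.default_rng(0)

# Complete synthetic data standing in for fully observed survey columns
# (1-5 Likert-style answers; column names are illustrative).
full = pd.DataFrame(
    rng.integers(1, 6, size=(100, 4)).astype(float),
    columns=["PU1", "PU2", "PU3", "PEU1"])

# Mask 10% of the entries at random to create artificial missingness.
mask = rng.random(full.shape) < 0.10
observed = full.mask(mask)

# Impute, then score only the artificially removed cells against the truth.
imputed = pd.DataFrame(
    KNNImputer(n_neighbors=5).fit_transform(observed), columns=full.columns)
rmse = np.sqrt(((imputed.values[mask] - full.values[mask]) ** 2).mean())
print(f"RMSE on masked entries: {rmse:.3f}")
```

Repeating this for each imputation technique gives a comparable RMSE per method, which is one way to decide which imputer suits the dataset best.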