Preprocessing of missing data within the Wiki4HE dataset to enhance data quality and enable effective analysis. This includes handling missing values through various techniques.

eyabesbes/Traitement-donnees-manquantes

Traitement des Données Manquantes (Missing Data Processing)

This project preprocesses missing data in the Wiki4HE dataset to improve data quality before analysis. It applies several techniques for handling missing values so that results of later analysis are more reliable.


Overview

Handling missing data is a crucial step in data preprocessing. This repository focuses on:

  • Identifying missing data in the Wiki4HE dataset.
  • Applying various imputation techniques to handle missing values effectively.
  • Improving data quality for downstream tasks like analysis and model training.
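As a minimal sketch of the identification step, the snippet below loads a CSV and reports how many values are missing per column. It is not taken from the repository's notebooks. The helper names `load_with_missing` and `missing_summary` are hypothetical, and the `;` separator and `?` missing-value marker are assumptions about how the Wiki4HE file is encoded; check them against your copy of the data. The inline sample values are made up for illustration.

```python
import io

import pandas as pd


def load_with_missing(source, sep=";", missing_marker="?"):
    """Read a CSV and convert a placeholder marker into NaN.

    The default separator and marker are assumptions, not confirmed
    properties of the repository's data file.
    """
    return pd.read_csv(source, sep=sep, na_values=[missing_marker])


def missing_summary(df):
    """Return per-column missing counts and percentages, worst first."""
    counts = df.isna().sum()
    summary = pd.DataFrame({
        "missing": counts,
        "percent": (counts / len(df) * 100).round(1),
    })
    return summary.sort_values("missing", ascending=False)


# Tiny in-memory stand-in for the real file (illustrative values only).
sample = io.StringIO("AGE;GENDER;PU1\n40;0;4\n42;?;?\n37;1;?\n")
df = load_with_missing(sample)
summary = missing_summary(df)
print(summary)
```

To run this on the real dataset, pass the path of the CSV file instead of the `io.StringIO` buffer.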

Features

  • Support for multiple imputation techniques, including:
    • Mean/Median/Mode Imputation
    • K-Nearest Neighbors (KNN) Imputation
    • Advanced techniques like Multiple Imputation by Chained Equations (MICE)
  • Visualization of missing data patterns.
  • Evaluation metrics for assessing the quality of imputations.
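The imputation techniques listed above can be sketched with scikit-learn, assuming it is among the installed dependencies (the source does not list the contents of `requirements.txt`). This is an illustration on a made-up numeric array, not the repository's code. Note that scikit-learn's `IterativeImputer` is a chained-equations imputer inspired by MICE that returns a single imputed dataset by default, so it is labeled `mice_style` here rather than MICE proper.

```python
import numpy as np
# IterativeImputer is experimental and must be enabled explicitly first.
from sklearn.experimental import enable_iterative_imputer  # noqa: F401
from sklearn.impute import IterativeImputer, KNNImputer, SimpleImputer

# Made-up data with one missing value in each of three rows.
X = np.array([
    [1.0, 2.0, np.nan],
    [2.0, np.nan, 6.0],
    [3.0, 6.0, 9.0],
    [np.nan, 8.0, 12.0],
    [5.0, 10.0, 15.0],
])

imputers = {
    "mean": SimpleImputer(strategy="mean"),
    "median": SimpleImputer(strategy="median"),
    "mode": SimpleImputer(strategy="most_frequent"),
    "knn": KNNImputer(n_neighbors=2),
    "mice_style": IterativeImputer(max_iter=10, random_state=0),
}

# Fit each imputer on the incomplete data and fill the gaps.
results = {name: imp.fit_transform(X) for name, imp in imputers.items()}

for name, filled in results.items():
    # Show the value each method puts in the first row's missing cell.
    print(name, round(float(filled[0, 2]), 2))
```

Mean, median, and mode imputation treat each column independently, while the KNN and chained-equations imputers use the other columns to estimate each missing value.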

Setup Instructions

Prerequisites

Ensure that you have the following installed:

  • Python (>= 3.7)
  • Jupyter Notebook

Installation

  1. Clone the repository:

    git clone https://github.com/eyabesbes/Traitement-donnees-manquantes.git
    cd Traitement-donnees-manquantes
  2. Install the required dependencies:

    pip install -r requirements.txt

Usage

  1. Open the Jupyter Notebook:

    jupyter notebook
  2. Navigate to the notebook files provided in the repository and execute the cells step-by-step to:

    • Load the Wiki4HE dataset.
    • Visualize missing data patterns.
    • Apply various imputation techniques.
    • Evaluate the quality of the imputed data.
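One common way to evaluate imputation quality, sketched here under the assumption that scikit-learn is available, is to hide values that are actually known, impute them, and measure the error against the true values. The source does not state which metrics the notebooks use, so the RMSE-on-masked-cells approach and the helper name `masked_rmse` are illustrative choices. The data is synthetic.

```python
import numpy as np
from sklearn.impute import KNNImputer, SimpleImputer


def masked_rmse(X_complete, imputer, mask_fraction=0.2, seed=0):
    """Hide a random share of known cells, impute them, return the RMSE."""
    rng = np.random.default_rng(seed)
    mask = rng.random(X_complete.shape) < mask_fraction
    mask[0, :] = False  # keep at least one observed value in every column
    X_missing = X_complete.copy()
    X_missing[mask] = np.nan
    X_filled = imputer.fit_transform(X_missing)
    # Score only the cells that were hidden.
    errors = X_filled[mask] - X_complete[mask]
    return float(np.sqrt(np.mean(errors ** 2)))


# Synthetic, fully observed data with three correlated columns.
rng = np.random.default_rng(42)
base = rng.normal(size=(200, 1))
X = np.hstack([
    base,
    2 * base + rng.normal(scale=0.1, size=(200, 1)),
    -base + rng.normal(scale=0.1, size=(200, 1)),
])

scores = {
    "mean": masked_rmse(X, SimpleImputer(strategy="mean")),
    "knn": masked_rmse(X, KNNImputer(n_neighbors=5)),
}
for name, score in scores.items():
    print(f"{name}: RMSE on masked cells = {score:.3f}")
```

Because both imputers are scored with the same seed, they are compared on exactly the same hidden cells. On a real dataset, the masking would be applied only to rows or cells that are originally observed.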
