This project is a collection of utility functions that assist with exploratory data analysis (EDA) of datasets, and it is continuously being improved.
EDA Utilities is a reusable and scalable Python module designed to streamline Exploratory Data Analysis (EDA) across various domains. This toolkit simplifies data loading, cleaning, visualization, and statistical analysis, making the initial phases of any data science project more efficient and structured.
With built-in logging and custom exception handling, the module ensures reliable debugging and error tracking, enabling seamless data exploration for datasets in finance, healthcare, marketing, and more.
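As a rough illustration of how logging and a custom exception hierarchy can work together, the sketch below uses only Python's standard `logging` module. The names `EDAError`, `UnsupportedFileError` and `check_file_type` are hypothetical and are not taken from the module itself.

```python
import logging
from pathlib import Path

# Hypothetical names for illustration; the module's real logger and
# exception classes may differ.
logger = logging.getLogger("eda_utilis_sketch")


class EDAError(Exception):
    """Hypothetical base exception for EDA helper failures."""


class UnsupportedFileError(EDAError):
    """Raised when a file extension is not supported by the loader."""


SUPPORTED_EXTENSIONS = {".csv", ".xlsx"}


def check_file_type(path: str) -> str:
    """Return the lower-cased extension, or log the problem and raise."""
    suffix = Path(path).suffix.lower()
    if suffix not in SUPPORTED_EXTENSIONS:
        # Log first so the failure is recorded even if the caller swallows it.
        logger.error("Unsupported file type %r for %s", suffix, path)
        raise UnsupportedFileError(f"Unsupported file type: {suffix!r}")
    logger.info("Accepted %s file: %s", suffix, path)
    return suffix
```

Because every specific error derives from `EDAError`, calling code can catch all helper failures with a single `except EDAError:` clause while the log keeps the detailed reason.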
The project is built with Python and its core data science libraries (Pandas, NumPy, SciPy, Matplotlib, Seaborn and Plotly), along with Jupyter Notebook, Git and GitHub for version control, Code Runner (terminal), and Visual Studio Code as the development environment. It also draws on standard statistical methods.
This EDA framework was built to accelerate the data exploration phase while preserving data integrity and supporting insight generation.
- Automated Data Loading: Easily handle .csv and .xlsx files
- Data Cleaning & Transformation: Handle missing values, convert dates, and format numerical values
- Statistical Insights: Generate descriptive statistics, detect outliers, and compute variability metrics
- Powerful Visualizations: Create histograms, boxplots, and bar charts with minimal effort
- Error Handling & Logging: Ensure smooth debugging and structured exception reporting
With this package, data scientists can focus on insights rather than preprocessing.
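The following is a minimal pandas sketch of the loading, cleaning and outlier-detection steps listed above. The helpers `load_data`, `clean_data` and `iqr_outliers` are hypothetical stand-ins, not the module's actual functions, and the cleaning choices (median imputation for numeric gaps, coercing unparseable dates to `NaT`) are assumptions made for illustration.

```python
from pathlib import Path

import pandas as pd


def load_data(path: str) -> pd.DataFrame:
    """Load a .csv or .xlsx file into a DataFrame (hypothetical helper)."""
    suffix = Path(path).suffix.lower()
    if suffix == ".csv":
        return pd.read_csv(path)
    if suffix == ".xlsx":
        # Reading .xlsx files requires the openpyxl package.
        return pd.read_excel(path)
    raise ValueError(f"Unsupported file type: {suffix!r}")


def clean_data(df: pd.DataFrame, date_cols: list[str]) -> pd.DataFrame:
    """Fill numeric gaps with each column's median and parse date columns."""
    out = df.copy()  # never mutate the caller's DataFrame
    numeric_cols = out.select_dtypes(include="number").columns
    out[numeric_cols] = out[numeric_cols].fillna(out[numeric_cols].median())
    for col in date_cols:
        # Unparseable values become NaT instead of raising.
        out[col] = pd.to_datetime(out[col], errors="coerce")
    return out


def iqr_outliers(series: pd.Series, k: float = 1.5) -> pd.Series:
    """Boolean mask of values outside [Q1 - k*IQR, Q3 + k*IQR]."""
    q1, q3 = series.quantile(0.25), series.quantile(0.75)
    iqr = q3 - q1
    return (series < q1 - k * iqr) | (series > q3 + k * iqr)
```

Descriptive statistics need no custom helper: pandas' `DataFrame.describe()` already reports count, mean, standard deviation, min, quartiles and max for numeric columns.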
Click here to see an application of eda_utilis.
This project is continuously evolving. The following features are planned for future versions:
- Data Type Detection & Auto-Cleaning - Automate the identification and handling of categorical, numerical, and datetime features.
- Outlier Handling Options - Expand strategies beyond the IQR method to include Z-score and Isolation Forest.
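The Z-score rule mentioned in the roadmap flags a value when its distance from the mean, measured in standard deviations, exceeds a threshold. A minimal standard-library sketch follows; `zscore_outliers` is a hypothetical name, and the default threshold of 3 is a common convention rather than a project setting.

```python
import statistics


def zscore_outliers(values: list[float], threshold: float = 3.0) -> list[bool]:
    """Flag values whose absolute z-score exceeds `threshold`.

    Uses the population standard deviation. Returns all False when the
    data have zero spread, since z-scores are undefined in that case.
    """
    mean = statistics.fmean(values)
    std = statistics.pstdev(values)
    if std == 0:
        return [False] * len(values)
    return [abs(v - mean) / std > threshold for v in values]
```

One design consideration: with the population standard deviation, no value in a sample of n points can have |z| greater than sqrt(n - 1). In the five-value sample [10, 11, 12, 11, 300] no value can exceed |z| = 2, so a threshold of 3 flags nothing, while the 1.5 × IQR rule (with linearly interpolated quartiles) flags 300. This is one reason the two strategies complement each other rather than replace one another.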
Before installing, make sure the following tools are available:
- Python (3.13.0)
- pip (25.0.1)
- Git (version control tool)
Once you have these installed, open a terminal on your local machine and run the following commands:
- Clone the repository:
  git clone https://github.com/Haniel-G/EDA_Utilis.git
- Navigate to the cloned repository directory:
  cd EDA_Utilis
- Create a virtual environment:
  python -m venv .venv
- Activate the virtual environment:
  source .venv/Scripts/activate # Windows (Git Bash); on Linux/macOS, use 'source .venv/bin/activate'
- Install dependencies:
  pip install -r requirements.txt
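To confirm that the environment is ready, a small standard-library check like the one below can report which libraries cannot be imported. The module list is an assumption based on the libraries named earlier in this README; the authoritative list is `requirements.txt`, and `missing_modules` is a hypothetical helper name.

```python
import importlib.util

# Import names of the libraries named in this README (an assumption;
# requirements.txt is the authoritative list).
REQUIRED_MODULES = ["pandas", "numpy", "scipy", "matplotlib", "seaborn", "plotly"]


def missing_modules(names: list[str]) -> list[str]:
    """Return the modules from `names` that cannot be found in this environment."""
    return [name for name in names if importlib.util.find_spec(name) is None]


if __name__ == "__main__":
    missing = missing_modules(REQUIRED_MODULES)
    if missing:
        print("Missing:", ", ".join(missing))
    else:
        print("All listed dependencies are importable.")
```

Run it with the virtual environment activated; `find_spec` only locates modules and does not import them, so the check stays fast.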
To integrate EDA Utilis into your existing project and ensure all dependencies are installed correctly, follow these steps:
- Clone the repository into your project's root directory (if not already cloned):
  git clone https://github.com/Haniel-G/EDA_Utilis.git
- Navigate to the cloned repository directory:
  cd EDA_Utilis
- Merge dependencies with your existing project:
  If you already have a requirements.txt file in your project root, append the dependencies from EDA Utilis without removing your existing ones:
  cat requirements.txt >> ../requirements.txt
  Then remove exact duplicate lines. This does not reconcile different version pins of the same package, which you must resolve by hand:
  sort -u ../requirements.txt -o ../requirements.txt
- Install the updated dependencies:
  pip install -r ../requirements.txt
- Import and use EDA Utilis in your project:
  After installation, you can import its functions in your Python scripts or Jupyter notebooks:
  from src.eda_utilis import *
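Because `sort -u` compares whole lines, two pins of the same package (for example `pandas==2.2.3` and `pandas==2.1.0`) both survive it. A minimal standard-library sketch that merges by package name instead is shown below. `merge_requirements` is a hypothetical helper, it assumes simple `name<specifier>` lines (options such as `-r` and VCS URLs are skipped or not handled), and keeping the existing project's pin on a clash is a design choice, not a rule.

```python
import re

# Leading package name of a line such as "pandas==2.2.3" or "numpy>=1.26".
_NAME = re.compile(r"^\s*([A-Za-z0-9][A-Za-z0-9._-]*)")


def merge_requirements(
    existing: list[str], incoming: list[str]
) -> tuple[list[str], dict[str, tuple[str, str]]]:
    """Merge two requirements lists, keeping `existing` entries on name clashes.

    Returns the merged lines and a dict mapping each conflicting package name
    to (kept_line, rejected_line) so the clash can be resolved by hand.
    """
    merged: dict[str, str] = {}
    conflicts: dict[str, tuple[str, str]] = {}
    for source in (existing, incoming):
        for raw in source:
            line = raw.strip()
            if not line or line.startswith("#"):
                continue  # skip blank lines and comments
            match = _NAME.match(line)
            if match is None:
                continue  # options like "-r other.txt" are not handled here
            # PEP 503 normalization: case-insensitive, runs of -_. are equivalent.
            key = re.sub(r"[-_.]+", "-", match.group(1)).lower()
            if key not in merged:
                merged[key] = line
            elif merged[key] != line:
                conflicts[key] = (merged[key], line)
    return sorted(merged.values(), key=str.lower), conflicts
```

To apply it to files, read each one with `Path(...).read_text().splitlines()`, then write the merged list back with `"\n".join(merged)` after reviewing any reported conflicts.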