This is a Flask web application for data preprocessing and visualization. It lets users upload CSV or Excel datasets, apply preprocessing techniques, generate summary statistics, and create interactive charts that compare the original and cleaned data.
- File Upload Support: Upload CSV and Excel (.xlsx) files (max 10MB).
- Preprocessing Options:
- Remove missing values (default).
- Remove duplicates.
- Remove outliers using IQR method for numerical columns.
- Normalize numerical columns using MinMaxScaler.
- One-hot encode categorical columns.
- Summary Statistics: Generate and display descriptive statistics for original and cleaned datasets.
- Interactive Visualizations: Select from Histogram, Box Plot, Scatter Plot, Line Chart, Bar Chart, Pie Chart. Uses Plotly for interactive charts comparing original vs. cleaned data.
- Downloads: Download original and cleaned datasets as CSV.
- User-Friendly UI: Bootstrap-based interface with custom styling.
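The preprocessing options listed above (IQR outlier removal, min-max normalization, one-hot encoding) can be illustrated with a small dependency-free sketch. This is not the code in app.py, which presumably relies on pandas and scikit-learn's MinMaxScaler; the function names, the `k=1.5` fence multiplier, and the sample rows are assumptions made for illustration only.

```python
import statistics


def remove_outliers_iqr(rows, column, k=1.5):
    """Keep rows whose value in `column` lies within [Q1 - k*IQR, Q3 + k*IQR]."""
    values = [row[column] for row in rows]
    # method="inclusive" uses linear interpolation between data points
    q1, _, q3 = statistics.quantiles(values, n=4, method="inclusive")
    iqr = q3 - q1
    low, high = q1 - k * iqr, q3 + k * iqr
    return [row for row in rows if low <= row[column] <= high]


def min_max_normalize(rows, column):
    """Rescale `column` to the range [0, 1]; a constant column maps to 0.0."""
    values = [row[column] for row in rows]
    lo, hi = min(values), max(values)
    span = hi - lo
    return [
        {**row, column: (row[column] - lo) / span if span else 0.0}
        for row in rows
    ]


def one_hot_encode(rows, column):
    """Replace `column` with one 0/1 indicator column per category."""
    categories = sorted({row[column] for row in rows})
    encoded = []
    for row in rows:
        new_row = {key: value for key, value in row.items() if key != column}
        for category in categories:
            new_row[f"{column}_{category}"] = 1 if row[column] == category else 0
        encoded.append(new_row)
    return encoded


# Hypothetical sample data: the last row is an obvious outlier in "age".
rows = [
    {"age": 21, "city": "Paris"},
    {"age": 25, "city": "Oslo"},
    {"age": 23, "city": "Paris"},
    {"age": 24, "city": "Oslo"},
    {"age": 22, "city": "Paris"},
    {"age": 95, "city": "Oslo"},
]

without_outliers = remove_outliers_iqr(rows, "age")
normalized = min_max_normalize(without_outliers, "age")
cleaned = one_hot_encode(normalized, "city")
```

With pandas, the same steps would typically be written with `DataFrame.quantile`, `MinMaxScaler.fit_transform`, and `pandas.get_dummies`. The order used here (outliers first, then scaling) is a design choice: scaling before removing outliers would compress the remaining values toward zero.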
- Clone or Download the project.
- Install Dependencies: `pip install -r requirements.txt`
  Required libraries: Flask, pandas, matplotlib, seaborn, plotly, openpyxl, scikit-learn.
- Run the Application: `python app.py`
  The app will start on http://127.0.0.1:5000.
- Access the App: Open your browser and go to http://127.0.0.1:5000.
- Upload Dataset: Select a CSV or Excel file.
- Select Preprocessing Options: Choose from the checkboxes (Remove Missing Values is enabled by default).
- Select Charts: Check the visualizations you want to generate.
- Submit: Click "Upload and Generate Charts".
- View Results:
- Download original or cleaned datasets.
- Review summary statistics tables.
- Interact with generated charts (before and after preprocessing).
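The upload step accepts only CSV and Excel (.xlsx) files up to 10 MB. A minimal sketch of such a check is shown below. The helper name `validate_upload`, the constants, and the error messages are hypothetical and are not taken from app.py.

```python
from pathlib import Path

ALLOWED_EXTENSIONS = {".csv", ".xlsx"}  # formats listed in this README
MAX_UPLOAD_BYTES = 10 * 1024 * 1024     # 10 MB limit stated in this README


def validate_upload(filename, size_bytes):
    """Return an error message for a rejected upload, or None if it is acceptable."""
    if not filename:
        return "No file selected."
    if Path(filename).suffix.lower() not in ALLOWED_EXTENSIONS:
        return "Unsupported file type. Please upload a CSV or .xlsx file."
    if size_bytes == 0:
        return "The uploaded file is empty."
    if size_bytes > MAX_UPLOAD_BYTES:
        return "File exceeds the 10 MB limit."
    return None


# Example calls with made-up file names and sizes.
print(validate_upload("sales.csv", 2_000_000))
print(validate_upload("notes.txt", 500))
```

In a Flask app, the size limit is often enforced globally by setting `app.config["MAX_CONTENT_LENGTH"]`, which makes Flask reject oversized requests before the route runs. Whether app.py does this is not stated here.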
- app.py: Main Flask application with routes, preprocessing logic, and chart generation.
- templates/index.html: Upload form with preprocessing and chart options.
- templates/result.html: Results page with downloads, summaries, and charts.
- static/css/styles.css: Custom styles for the UI.
- static/charts/: Directory for chart files (auto-created).
- requirements.txt: Python dependencies.
- logo.jpg: App logo (optional).
- Charts use the first suitable columns in the dataset: numerical columns for most chart types, a categorical column for pie charts.
- Scatter plots require at least two numerical columns.
- The app handles invalid files, empty datasets, and processing errors.
- Interactive charts require an internet connection to load Plotly from its CDN.
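The column-selection rule in the notes above (first numerical columns for most charts, first categorical column for pie charts, two numerical columns for scatter plots) can be sketched as follows. This is an illustration under stated assumptions, not the logic in app.py: the function names, the chart identifiers, and the sample rows are all hypothetical.

```python
from numbers import Number


def split_columns(rows):
    """Return (numerical, categorical) column names in their original order."""
    if not rows:
        return [], []
    numerical, categorical = [], []
    for name in rows[0]:
        values = [row[name] for row in rows if row[name] is not None]
        # bool is a Number subclass in Python, so exclude it explicitly
        is_numeric = bool(values) and all(
            isinstance(v, Number) and not isinstance(v, bool) for v in values
        )
        (numerical if is_numeric else categorical).append(name)
    return numerical, categorical


def pick_chart_columns(rows, chart):
    """Return the columns to plot for `chart`, or None if the data is unsuitable."""
    numerical, categorical = split_columns(rows)
    if chart == "pie":
        return categorical[:1] or None
    if chart == "scatter":
        return numerical[:2] if len(numerical) >= 2 else None
    return numerical[:1] or None


# Hypothetical sample data.
rows = [
    {"city": "Paris", "age": 21, "income": 3000.0},
    {"city": "Oslo", "age": 25, "income": 4200.0},
]

print(pick_chart_columns(rows, "histogram"))
print(pick_chart_columns(rows, "scatter"))
print(pick_chart_columns(rows, "pie"))
```

With a pandas DataFrame, the numerical columns could instead be obtained with `df.select_dtypes(include="number").columns`.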
To deploy the application to a cloud platform, follow these steps for Heroku (Heroku discontinued its free tier in November 2022, so a paid plan is required):
- Install Heroku CLI: Download and install from https://devcenter.heroku.com/articles/heroku-cli.
- Prepare the App for Production:
  - Add gunicorn to requirements.txt (already included).
  - Create a Procfile in the root directory with the content: `web: gunicorn app:app`
  - Ensure app.py has the following at the end (already present): `if __name__ == '__main__': app.run()`
- Deploy to Heroku:
  - Login to Heroku: `heroku login`
  - Create a new app: `heroku create your-app-name`
  - Push the code: `git add . && git commit -m "Initial commit" && git push heroku main`
  - Open the app: `heroku open`
For other platforms like Render or Railway, create an account, connect your GitHub repo (push the code to GitHub first), and deploy as a web service.
- Support for more file formats (JSON, etc.).
- Advanced preprocessing (feature selection, imputation methods).
- Export charts as images/PDF.
- User authentication and session management.
© 2024 Data Cleaner. All rights reserved.