This project combines Exploratory Data Analysis (EDA) with Neural Network-based classification to extract insights and build predictive models from a dataset. It involves feature engineering, PCA-based dimensionality reduction, and training deep learning models using TensorFlow/Keras.
```
project.ipynb   # Main Jupyter Notebook with EDA and Deep Learning workflow
README.md       # Project overview and documentation
```
- Skewness Analysis:
  - |Skewness| < 1: Approximately symmetric
  - 1 < |Skewness| < 2: Moderately skewed
  - |Skewness| > 2: Highly skewed
- Statistical Summaries
- Outlier Detection
- Visualizations:
  - Histograms, Box Plots, Pair Plots
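The EDA checks above can be sketched with pandas as follows. This is an illustration, not the notebook's own code: the helper names are hypothetical, the bands treat |skew| < 1 as approximately symmetric, 1 to 2 as moderately skewed and above 2 as highly skewed, and the 1.5 * IQR outlier rule is an assumption, since the README does not say which outlier method the notebook uses.

```python
import pandas as pd


def skewness_label(skew: float) -> str:
    """Map a skewness value to a band: <1 symmetric, 1-2 moderate, >2 high."""
    magnitude = abs(skew)
    if magnitude < 1:
        return "approximately symmetric"
    if magnitude < 2:
        return "moderately skewed"
    return "highly skewed"


def skewness_report(df: pd.DataFrame) -> pd.DataFrame:
    """Per-column skewness of the numeric columns, with its band label."""
    skew = df.select_dtypes(include="number").skew()
    return pd.DataFrame({"skewness": skew, "label": skew.map(skewness_label)})


def iqr_outlier_counts(df: pd.DataFrame, k: float = 1.5) -> pd.Series:
    """Count values outside [Q1 - k*IQR, Q3 + k*IQR] in each numeric column."""
    numeric = df.select_dtypes(include="number")
    q1 = numeric.quantile(0.25)
    q3 = numeric.quantile(0.75)
    iqr = q3 - q1
    # Comparing a DataFrame with a Series aligns the Series on the columns.
    outside = (numeric < q1 - k * iqr) | (numeric > q3 + k * iqr)
    return outside.sum()


if __name__ == "__main__":
    df = pd.DataFrame({"a": [1, 2, 3, 4, 5], "b": [1, 1, 1, 1, 50]})
    print(df.describe())          # statistical summary
    print(skewness_report(df))
    print(iqr_outlier_counts(df))
```

In the notebook you would pass the loaded dataset instead of the toy frame; highly skewed columns are typical candidates for a log or power transform before scaling.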
- Feature Scaling using `StandardScaler`
- Dimensionality Reduction using PCA
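A minimal sketch of the scaling and PCA steps with scikit-learn, assuming a train/test split already exists. The 95% retained-variance target is an illustrative assumption (the README does not state how many components the notebook keeps), and `scale_and_reduce` is a hypothetical helper name.

```python
import numpy as np
from sklearn.decomposition import PCA
from sklearn.preprocessing import StandardScaler


def scale_and_reduce(X_train, X_test, variance_to_keep=0.95):
    """Standardize features, then project them onto the principal components
    that together explain `variance_to_keep` of the training variance."""
    scaler = StandardScaler()
    X_train_scaled = scaler.fit_transform(X_train)  # fit on training data only
    X_test_scaled = scaler.transform(X_test)        # reuse training mean/std

    # A float in (0, 1) tells PCA to keep the smallest number of components
    # whose cumulative explained variance ratio reaches that fraction.
    pca = PCA(n_components=variance_to_keep)
    X_train_pca = pca.fit_transform(X_train_scaled)
    X_test_pca = pca.transform(X_test_scaled)
    return X_train_pca, X_test_pca, pca


if __name__ == "__main__":
    rng = np.random.default_rng(0)
    X = rng.normal(size=(200, 10))
    X_tr, X_te, pca = scale_and_reduce(X[:150], X[150:])
    print(pca.n_components_, np.cumsum(pca.explained_variance_ratio_))
```

Fitting the scaler and PCA on the training split only, then reusing them on the test split, keeps test-set statistics from leaking into preprocessing.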
- Libraries: TensorFlow/Keras
- Model Architecture:
  - Feedforward Neural Network (Multilayer Perceptron)
  - Input: Scaled and PCA-reduced features
  - Output: One unit per class, trained against the one-hot encoded target variable
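The architecture described above can be sketched in Keras roughly as follows. The hidden-layer sizes (64 and 32) and the softmax output are assumptions: softmax is the usual pairing with categorical crossentropy, but the README does not list the notebook's exact layers. `build_mlp` is a hypothetical name.

```python
import tensorflow as tf
from tensorflow import keras


def build_mlp(n_features: int, n_classes: int) -> keras.Model:
    """Feedforward network: PCA-reduced inputs -> hidden Dense layers -> softmax."""
    return keras.Sequential([
        keras.Input(shape=(n_features,)),
        keras.layers.Dense(64, activation="relu"),  # hidden sizes are illustrative
        keras.layers.Dense(32, activation="relu"),
        # One output unit per class; softmax turns them into class probabilities.
        keras.layers.Dense(n_classes, activation="softmax"),
    ])


if __name__ == "__main__":
    model = build_mlp(n_features=8, n_classes=3)
    model.summary()
    print(model(tf.zeros((1, 8))))  # one row of class probabilities
```

Here `n_features` would be the number of PCA components kept and `n_classes` the width of the one-hot target.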
- Training Details:
  - Loss function: Categorical Crossentropy
  - Optimizer: Adam
  - Epochs: 100 (maximum; early stopping can end training sooner)
  - Callbacks: Early stopping and model checkpointing
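A sketch of the training setup listed above (categorical crossentropy, Adam, up to 100 epochs, early stopping and checkpointing). The monitored metric (`val_loss`), `patience=10`, `batch_size=32` and the checkpoint filename are illustrative assumptions, not values taken from the notebook; `train` is a hypothetical helper.

```python
import numpy as np
from tensorflow import keras


def train(model, X_train, y_train_onehot, X_val, y_val_onehot,
          checkpoint_path="best_model.keras"):
    """Compile and fit with categorical crossentropy, Adam, up to 100 epochs,
    early stopping and checkpointing. Returns the Keras History object."""
    model.compile(optimizer=keras.optimizers.Adam(),
                  loss="categorical_crossentropy",
                  metrics=["accuracy"])
    callbacks = [
        # Stop once validation loss has not improved for 10 epochs
        # and restore the best weights seen so far.
        keras.callbacks.EarlyStopping(monitor="val_loss", patience=10,
                                      restore_best_weights=True),
        # Write the best model (by validation loss) to disk.
        keras.callbacks.ModelCheckpoint(checkpoint_path, monitor="val_loss",
                                        save_best_only=True),
    ]
    return model.fit(X_train, y_train_onehot,
                     validation_data=(X_val, y_val_onehot),
                     epochs=100, batch_size=32,
                     callbacks=callbacks, verbose=0)


if __name__ == "__main__":
    rng = np.random.default_rng(0)
    X = rng.normal(size=(200, 4)).astype("float32")
    y = keras.utils.to_categorical((X[:, 0] > 0).astype(int), num_classes=2)
    model = keras.Sequential([keras.Input(shape=(4,)),
                              keras.layers.Dense(16, activation="relu"),
                              keras.layers.Dense(2, activation="softmax")])
    history = train(model, X[:160], y[:160], X[160:], y[160:])
    print(len(history.history["loss"]), "epochs run")
```

With `restore_best_weights=True` the in-memory model ends up at the same best epoch the checkpoint saved, so evaluation afterwards uses the best weights rather than the last ones.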
- Evaluation Metrics:
  - Accuracy Score
  - Confusion Matrix
  - Classification Report (Precision, Recall, F1)
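The three evaluation metrics can be computed with scikit-learn once the one-hot targets and the model's probability outputs are converted back to class indices. A minimal sketch; `evaluate_predictions` is a hypothetical helper, and `y_prob` would typically come from `model.predict(X_test)`.

```python
import numpy as np
from sklearn.metrics import accuracy_score, classification_report, confusion_matrix


def evaluate_predictions(y_true_onehot, y_prob):
    """Convert one-hot targets and per-class probabilities to class indices,
    then compute accuracy, the confusion matrix and the classification report."""
    y_true = np.argmax(y_true_onehot, axis=1)
    y_pred = np.argmax(y_prob, axis=1)  # most probable class per row
    return {
        "accuracy": accuracy_score(y_true, y_pred),
        "confusion_matrix": confusion_matrix(y_true, y_pred),
        # zero_division=0 avoids warnings for classes that are never predicted
        "report": classification_report(y_true, y_pred, zero_division=0),
    }


if __name__ == "__main__":
    y_true = np.array([[1, 0], [0, 1], [0, 1], [1, 0]])
    y_prob = np.array([[0.9, 0.1], [0.2, 0.8], [0.6, 0.4], [0.3, 0.7]])
    results = evaluate_predictions(y_true, y_prob)
    print(results["accuracy"])
    print(results["confusion_matrix"])
    print(results["report"])
```

Rows of the confusion matrix are true classes and columns are predicted classes, so off-diagonal cells show which classes the model confuses.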
- Training Accuracy: See the notebook for the exact value
- Validation Accuracy: Plotted over epochs using the `history` object returned by training
- Confusion Matrix: Reveals class-wise performance
- PCA Explained Variance: Shows how much of the variance each principal component contributes
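Both plots can be sketched with matplotlib as below. This assumes the model was compiled with `metrics=["accuracy"]`, so that `history.history` holds `accuracy` and `val_accuracy` lists, and that a fitted scikit-learn `PCA` supplies `explained_variance_ratio_`; `plot_results` is a hypothetical helper.

```python
import matplotlib.pyplot as plt
import numpy as np


def plot_results(history_dict, explained_variance_ratio):
    """Left: training vs. validation accuracy per epoch.
    Right: per-component and cumulative PCA explained variance."""
    fig, (ax_acc, ax_pca) = plt.subplots(1, 2, figsize=(11, 4))

    epochs = np.arange(1, len(history_dict["accuracy"]) + 1)
    ax_acc.plot(epochs, history_dict["accuracy"], label="training")
    ax_acc.plot(epochs, history_dict["val_accuracy"], label="validation")
    ax_acc.set_xlabel("epoch")
    ax_acc.set_ylabel("accuracy")
    ax_acc.legend()

    ratios = np.asarray(explained_variance_ratio)
    components = np.arange(1, len(ratios) + 1)
    ax_pca.bar(components, ratios, label="per component")
    ax_pca.step(components, np.cumsum(ratios), where="mid", label="cumulative")
    ax_pca.set_xlabel("principal component")
    ax_pca.set_ylabel("explained variance ratio")
    ax_pca.legend()

    fig.tight_layout()
    return fig


if __name__ == "__main__":
    demo_history = {"accuracy": [0.6, 0.75, 0.82], "val_accuracy": [0.58, 0.7, 0.76]}
    plot_results(demo_history, [0.5, 0.3, 0.2])
    plt.show()
```

In the notebook the call would be `plot_results(history.history, pca.explained_variance_ratio_)`; a widening gap between the two accuracy curves is the usual sign of overfitting.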
- Python 3.x
- Jupyter Notebook
- `pandas`, `numpy`, `matplotlib`, `seaborn`
- `scikit-learn`
- `tensorflow`/`keras`
- Clone the Repository
  ```bash
  git clone https://github.com/yourusername/eda-deeplearning-project.git
  cd eda-deeplearning-project
  ```
- Set Up a Virtual Environment
  ```bash
  python -m venv venv
  source venv/bin/activate  # Windows: venv\Scripts\activate
  pip install -r requirements.txt
  ```
- Run the Notebook
  ```bash
  jupyter notebook project.ipynb
  ```
- Insights on data distribution and skewness
- PCA-based dimensionality reduction
- Trained deep learning classifier (exact accuracy reported in the notebook)
- Clear evaluation via confusion matrix and classification report
- Make sure the dataset is in the location the notebook expects, or load it from within the notebook.
- To improve performance, you can tune the model architecture or learning rate, or add regularization.
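One way to add regularization and expose the learning rate for tuning is sketched below. L2 weight penalties and Dropout are common choices but are assumptions here, not the notebook's method, and the default values (`l2_strength=1e-4`, `dropout_rate=0.3`, `learning_rate=1e-3`) are illustrative starting points. `build_regularized_mlp` is a hypothetical helper.

```python
import numpy as np
from tensorflow import keras


def build_regularized_mlp(n_features, n_classes, learning_rate=1e-3,
                          l2_strength=1e-4, dropout_rate=0.3):
    """MLP with L2 weight penalties and Dropout, compiled with a tunable Adam
    learning rate and the same loss as the baseline model."""
    model = keras.Sequential([
        keras.Input(shape=(n_features,)),
        keras.layers.Dense(64, activation="relu",
                           kernel_regularizer=keras.regularizers.l2(l2_strength)),
        keras.layers.Dropout(dropout_rate),  # active only during training
        keras.layers.Dense(32, activation="relu",
                           kernel_regularizer=keras.regularizers.l2(l2_strength)),
        keras.layers.Dropout(dropout_rate),
        keras.layers.Dense(n_classes, activation="softmax"),
    ])
    model.compile(optimizer=keras.optimizers.Adam(learning_rate=learning_rate),
                  loss="categorical_crossentropy", metrics=["accuracy"])
    return model


if __name__ == "__main__":
    # Try a few learning rates; in practice, fit each and compare val_accuracy.
    for lr in (1e-2, 1e-3, 1e-4):
        model = build_regularized_mlp(n_features=8, n_classes=3, learning_rate=lr)
        print(lr, model.predict(np.zeros((1, 8)), verbose=0).shape)
```

Comparing validation accuracy across a small grid of learning rates and regularization strengths is a simple starting point before trying automated search.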