My name is Stefano Caramagno, and I'm pleased to present this repository containing a project on the creation of an integrated ML pipeline for vehicle pricing.
This project was completed as part of the Machine Learning course during my Master's Degree in Computer Science and Engineering at the University of Catania.
- Data Loading: Performs structured loading of the dataset, inspects variable types, and analyzes initial distributions to understand data composition and structure.
- Data Preprocessing: Cleans and prepares the dataset by handling missing values, encoding categorical variables, and normalizing features to ensure model compatibility.
- Supervised Learning Models: Implements a broad range of regression algorithms to predict car prices, covering both linear and non-linear supervised techniques.
- Hyperparameter Optimization: Applies hyperparameter tuning techniques to identify optimal configurations and improve the generalization of supervised models.
- Supervised Model Evaluation: Evaluates and compares regression models using performance metrics and validation techniques to measure prediction effectiveness.
- Unsupervised Learning Techniques: Applies clustering algorithms to group data points based on hidden patterns, aiming to discover natural structures in the dataset.
- Unsupervised Model Evaluation: Evaluates and compares clustering model results using general-purpose techniques and visualizations to assess the quality of the groupings.
- Semi-Supervised Learning Models: Implements models that leverage both labeled and unlabeled data to replicate real-world scenarios with incomplete supervision.
- Semi-Supervised Model Evaluation: Evaluates and compares semi-supervised models by analyzing learning effectiveness and performance across different label distributions.
- End-to-End ML Pipeline: Organizes the entire machine learning process from data loading to prediction and evaluation, ensuring workflow reproducibility and clarity.
- Result Documentation: Summarizes findings through structured markdown cells, explaining key insights and guiding interpretation of each analytical stage.
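The preprocessing, supervised modeling, hyperparameter tuning, and evaluation stages listed above can be chained together in scikit-learn. The following is a minimal sketch, not the notebook's actual code. It assumes a toy DataFrame with hypothetical columns year, mileage, and fuel_type and a price target. It uses a random forest with GridSearchCV as one example of a non-linear regressor and a tuning strategy.

```python
import numpy as np
import pandas as pd
from sklearn.compose import ColumnTransformer
from sklearn.ensemble import RandomForestRegressor
from sklearn.impute import SimpleImputer
from sklearn.metrics import mean_absolute_error, r2_score
from sklearn.model_selection import GridSearchCV, train_test_split
from sklearn.pipeline import Pipeline
from sklearn.preprocessing import OneHotEncoder, StandardScaler

# Hypothetical toy data standing in for the real vehicle dataset.
rng = np.random.default_rng(0)
n = 300
df = pd.DataFrame({
    "year": rng.integers(2005, 2023, n),
    "mileage": rng.uniform(5_000, 200_000, n),
    "fuel_type": rng.choice(["petrol", "diesel", "electric"], n),
})
df["price"] = (
    1_000 * (df["year"] - 2000)
    - 0.05 * df["mileage"]
    + df["fuel_type"].map({"petrol": 0, "diesel": 1_500, "electric": 5_000})
    + rng.normal(0, 1_000, n)
)
# Inject missing values so the imputation step has something to handle.
df.loc[rng.choice(n, 20, replace=False), "mileage"] = np.nan

numeric = ["year", "mileage"]
categorical = ["fuel_type"]

# Impute + scale numeric columns, one-hot encode categorical ones.
preprocess = ColumnTransformer([
    ("num", Pipeline([("impute", SimpleImputer(strategy="median")),
                      ("scale", StandardScaler())]), numeric),
    ("cat", OneHotEncoder(handle_unknown="ignore"), categorical),
])
model = Pipeline([("prep", preprocess),
                  ("reg", RandomForestRegressor(random_state=0))])

X = df.drop(columns="price")
y = df["price"]
X_train, X_test, y_train, y_test = train_test_split(
    X, y, test_size=0.2, random_state=0)

# Grid search with 5-fold cross-validation on the training split only.
search = GridSearchCV(
    model,
    param_grid={"reg__n_estimators": [50, 100], "reg__max_depth": [None, 10]},
    cv=5,
    scoring="neg_mean_absolute_error",
)
search.fit(X_train, y_train)

# Evaluate the refitted best model on the held-out test split.
pred = search.predict(X_test)
print("best params:", search.best_params_)
print("test MAE:", mean_absolute_error(y_test, pred))
print("test R^2:", r2_score(y_test, pred))
```

Keeping imputation, scaling, and encoding inside the pipeline means cross-validation refits them on each training fold. This prevents statistics from the validation folds from leaking into preprocessing.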
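For the clustering stages, one common general-purpose check is to sweep the number of clusters and score each labeling with internal metrics. Two such metrics are the silhouette coefficient (higher is better) and the Davies-Bouldin index (lower is better). This sketch assumes synthetic two-dimensional data from make_blobs as a stand-in for scaled vehicle features, and it compares k-means with agglomerative clustering. The algorithms and metrics actually used in the notebook may differ.

```python
from sklearn.cluster import AgglomerativeClustering, KMeans
from sklearn.datasets import make_blobs
from sklearn.metrics import davies_bouldin_score, silhouette_score
from sklearn.preprocessing import StandardScaler

# Synthetic stand-in for scaled numeric vehicle features.
X, _ = make_blobs(n_samples=400, centers=3, n_features=2,
                  cluster_std=1.0, random_state=0)
X = StandardScaler().fit_transform(X)

# Score every (algorithm, k) pair with two internal validity metrics.
scores = {}
for k in range(2, 7):
    for name, algo in [
        ("kmeans", KMeans(n_clusters=k, n_init=10, random_state=0)),
        ("agglomerative", AgglomerativeClustering(n_clusters=k)),
    ]:
        labels = algo.fit_predict(X)
        scores[(name, k)] = (silhouette_score(X, labels),
                             davies_bouldin_score(X, labels))

# Pick the configuration with the highest silhouette coefficient.
best = max(scores, key=lambda key: scores[key][0])
for (name, k), (sil, dbi) in sorted(scores.items()):
    print(f"{name:>13} k={k}: silhouette={sil:.3f}  Davies-Bouldin={dbi:.3f}")
print("highest silhouette:", best)
```

Internal metrics only measure cluster compactness and separation. They are usually paired with scatter plots of the labeled points before a grouping is judged meaningful.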
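scikit-learn's semi-supervised estimators (SelfTrainingClassifier, LabelSpreading) are classifiers. This sketch therefore assumes the price target has been discretized into hypothetical price bands, and it uses synthetic data from make_classification. It hides a varying fraction of the training labels, marking them -1, which scikit-learn treats as unlabeled. It then measures test accuracy at each fraction, which mirrors the idea of comparing performance across label distributions. It is an illustration under those assumptions, not the notebook's implementation.

```python
import numpy as np
from sklearn.datasets import make_classification
from sklearn.linear_model import LogisticRegression
from sklearn.metrics import accuracy_score
from sklearn.model_selection import train_test_split
from sklearn.semi_supervised import LabelSpreading, SelfTrainingClassifier

# Hypothetical stand-in: 3 classes playing the role of price bands.
X, y = make_classification(n_samples=600, n_features=6, n_informative=4,
                           n_classes=3, random_state=0)
X_train, X_test, y_train, y_test = train_test_split(
    X, y, test_size=0.25, random_state=0, stratify=y)

rng = np.random.default_rng(0)
results = {}
for frac in (0.05, 0.2, 0.5):
    # Keep roughly `frac` of the training labels; mark the rest as -1 (unlabeled).
    y_partial = y_train.copy()
    hidden = rng.random(len(y_partial)) >= frac
    y_partial[hidden] = -1
    for name, model in [
        ("self-training", SelfTrainingClassifier(LogisticRegression(max_iter=1000))),
        ("label-spreading", LabelSpreading(kernel="knn", n_neighbors=7)),
    ]:
        model.fit(X_train, y_partial)
        # The test set keeps its true labels for evaluation.
        results[(name, frac)] = accuracy_score(y_test, model.predict(X_test))

for (name, frac), acc in results.items():
    print(f"{name:>15} labeled={frac:.0%}: accuracy={acc:.3f}")
```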
- Programming Language: Python for implementing data preprocessing and machine learning models.
- Relevant Libraries:
  - NumPy for efficient numerical operations and data manipulation.
  - Pandas for data loading, preprocessing, and tabular data handling.
  - Matplotlib for static data visualizations and model output representation.
  - Seaborn for enhanced statistical plots and correlation analysis.
  - Scikit-learn for machine learning algorithms and evaluation techniques.
- Dependency Management: Pip for installing and managing project dependencies.
- IDE: Visual Studio Code for development and debugging.
- Relevant Extensions:
  - Jupyter for notebook-based development and interactive execution within VS Code.
- Version Control: Git for tracking changes and managing project versions.
- Repository Hosting: GitHub for storing, sharing, and maintaining the project repository.
Ensure you have the following tools installed on your system before proceeding:
- Python: Version 3.9 or later, required to run the notebook.
- Required Libraries: Install the following libraries using pip from the terminal:
  - NumPy: Required for efficient numerical operations and data manipulation.
  - Pandas: Required for data loading, preprocessing, and tabular data handling.
  - Matplotlib: Required for static data visualizations and model output representation.
  - Seaborn: Required for enhanced statistical plots and correlation analysis.
  - Scikit-learn: Required for machine learning algorithms and evaluation techniques.
- Pip: Used to install required dependencies.
- IDE: Recommended for reading and running the code efficiently (this project uses Visual Studio Code).
- Relevant Extensions:
  - Jupyter: Required for notebook-based interactive execution within VS Code.
- Git: Used to clone the repository.
- Clone the Repository
To download the repository and navigate to its directory:
git clone https://github.com/stefanocaramagno/Integrated_ML_Pipeline_for_Vehicle_Pricing.git
cd Integrated_ML_Pipeline_for_Vehicle_Pricing
- Install Dependencies
To install all required dependencies:
pip install numpy pandas matplotlib seaborn scikit-learn
- Open the Notebook
To open the notebook, launch Visual Studio Code and open:
integrated_ML_pipeline_for_vehicle_pricing.ipynb
- Run the Notebook
To execute the entire workflow, click "Run All" in the Jupyter notebook interface.
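Before choosing Run All, a short check can confirm that the interpreter meets the Python 3.9 requirement and that the listed libraries import correctly. This is a convenience sketch and is not part of the repository. Note that the scikit-learn package is imported under the module name sklearn.

```python
import importlib
import sys

# The prerequisites call for Python 3.9 or later.
if sys.version_info < (3, 9):
    raise SystemExit(f"Python 3.9+ required, found {sys.version.split()[0]}")

# Map pip package names to the module names used in `import`.
packages = {"numpy": "numpy", "pandas": "pandas", "matplotlib": "matplotlib",
            "seaborn": "seaborn", "scikit-learn": "sklearn"}

missing = []
for pip_name, module_name in packages.items():
    try:
        module = importlib.import_module(module_name)
        print(f"{pip_name:<13} {getattr(module, '__version__', 'unknown')}")
    except ImportError:
        missing.append(pip_name)

if missing:
    print("missing, install with pip:", " ".join(missing))
```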
Feel free to explore my professional journey, check out my projects, or get in touch through the following platforms:
© Stefano Caramagno
Personal and Educational Use Only
All content in this repository is provided for personal and educational purposes only.
The following actions are prohibited without explicit permission from the author, including but not limited to:
- Commercial Use: Using any part of the content for commercial purposes.
- Distribution: Sharing or distributing the content to third parties.
- Modification: Altering, transforming, or building upon the content.
- Resale: Selling or licensing the content or any derivatives.
For permissions beyond the scope of this license, please contact the author.
Disclaimer
The content is provided "as is" without warranty of any kind, express or implied.
The author shall not be liable for any claims, damages, or other liabilities arising from its use.