
This project was completed as part of a Machine Learning course during my Master's Degree in Computer Science and Engineering at the University of Catania.


stefanocaramagno/Integrated_ML_Pipeline_for_Vehicle_Pricing


Integrated ML Pipeline for Vehicle Pricing

📚 Introduction

My name is Stefano Caramagno, and I'm pleased to present this repository containing a project on the creation of an integrated ML pipeline for vehicle pricing.
This project was completed as part of the Machine Learning course during my Master's Degree in Computer Science and Engineering at the University of Catania.

✨ Features

  • Data Loading: Performs structured loading of the dataset, inspects variable types, and analyzes initial distributions to understand data composition and structure.
  • Data Preprocessing: Cleans and prepares the dataset by handling missing values, encoding categorical variables, and normalizing features to ensure model compatibility.
  • Supervised Learning Models: Implements a broad range of regression algorithms to predict car prices, covering both linear and non-linear supervised techniques.
  • Hyperparameter Optimization: Applies hyperparameter tuning techniques to identify optimal configurations and improve the generalization of supervised models.
  • Supervised Model Evaluation: Evaluates and compares regression models using performance metrics and validation techniques to measure prediction effectiveness.
  • Unsupervised Learning Techniques: Applies clustering algorithms to group data points based on hidden patterns, aiming to discover natural structures in the dataset.
  • Unsupervised Model Evaluation: Evaluates and compares clustering results using general-purpose techniques and visualizations to assess the quality of the groupings.
  • Semi-Supervised Learning Models: Implements models that leverage both labeled and unlabeled data to replicate real-world scenarios with incomplete supervision.
  • Semi-Supervised Model Evaluation: Evaluates and compares semi-supervised models by analyzing learning effectiveness and performance across different label distributions.
  • End-to-End ML Pipeline: Organizes the entire machine learning process from data loading to prediction and evaluation, ensuring workflow reproducibility and clarity.
  • Result Documentation: Summarizes findings through structured markdown cells, explaining key insights and guiding interpretation of each analytical stage.
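
As a rough illustration of how the preprocessing, supervised learning, hyperparameter optimization, and clustering stages listed above can fit together in scikit-learn, here is a minimal sketch. It is not the notebook's actual code: the data is synthetic, the column names (`mileage_km`, `age_years`, `fuel`, `price`) are hypothetical, and the choice of `Ridge`, `GridSearchCV`, and `KMeans` is an assumption made only for the example.

```python
import numpy as np
import pandas as pd
from sklearn.cluster import KMeans
from sklearn.compose import ColumnTransformer
from sklearn.impute import SimpleImputer
from sklearn.linear_model import Ridge
from sklearn.metrics import mean_absolute_error, silhouette_score
from sklearn.model_selection import GridSearchCV, train_test_split
from sklearn.pipeline import Pipeline
from sklearn.preprocessing import OneHotEncoder, StandardScaler

# Synthetic stand-in data; the column names are hypothetical, not the project's.
rng = np.random.default_rng(0)
n = 300
df = pd.DataFrame({
    "mileage_km": rng.uniform(5_000, 200_000, n),
    "age_years": rng.integers(0, 20, n).astype(float),
    "fuel": rng.choice(["petrol", "diesel", "electric"], n),
})
df["price"] = (30_000 - 0.08 * df["mileage_km"] - 700 * df["age_years"]
               + rng.normal(0, 1_000, n))
# Simulate missing values in one numeric column.
df.loc[rng.choice(n, 15, replace=False), "mileage_km"] = np.nan

numeric = ["mileage_km", "age_years"]
categorical = ["fuel"]

# Preprocessing: impute and scale numeric columns, one-hot encode categoricals.
preprocess = ColumnTransformer([
    ("num", Pipeline([("impute", SimpleImputer(strategy="median")),
                      ("scale", StandardScaler())]), numeric),
    ("cat", OneHotEncoder(handle_unknown="ignore"), categorical),
])

# Supervised step: preprocessing and regressor chained in one pipeline.
model = Pipeline([("preprocess", preprocess), ("regressor", Ridge())])

X_train, X_test, y_train, y_test = train_test_split(
    df[numeric + categorical], df["price"], test_size=0.2, random_state=0)

# Hyperparameter optimization: cross-validated grid search over Ridge's alpha.
search = GridSearchCV(model, {"regressor__alpha": [0.1, 1.0, 10.0]},
                      cv=5, scoring="neg_mean_absolute_error")
search.fit(X_train, y_train)
mae = mean_absolute_error(y_test, search.predict(X_test))

# Unsupervised step: cluster the preprocessed features and score the grouping.
X_all = preprocess.fit_transform(df[numeric + categorical])
labels = KMeans(n_clusters=3, n_init=10, random_state=0).fit_predict(X_all)
silhouette = silhouette_score(X_all, labels)

print("best alpha:", search.best_params_["regressor__alpha"])
print(f"test MAE: {mae:.0f}")
print(f"silhouette score: {silhouette:.2f}")
```

Keeping the preprocessing inside the `Pipeline` means the imputer, scaler, and encoder are refit on each cross-validation fold, so the grid search never sees statistics computed from held-out data.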

🛠️ Tech Stack

  • Programming Language: Python for implementing data preprocessing and machine learning models.
  • Relevant Libraries:
    • NumPy for efficient numerical operations and data manipulation.
    • Pandas for data loading, preprocessing, and tabular data handling.
    • Matplotlib for static data visualizations and model output representation.
    • Seaborn for enhanced statistical plots and correlation analysis.
    • Scikit-learn for machine learning algorithms and evaluation techniques.
  • Dependency Management: Pip for installing and managing project dependencies.
  • IDE: Visual Studio Code for development and debugging.
  • Relevant Extensions:
    • Jupyter for notebook-based development and interactive execution within VS Code.
  • Version Control: Git for tracking changes and managing project versions.
  • Repository Hosting: GitHub for storing, sharing, and maintaining the project repository.

🚀 Getting Started

Prerequisites

Ensure you have the following tools installed on your system before proceeding:

  • Python: Version 3.9 or later, required to run the notebook.
  • Required Libraries: Install the following libraries using pip from the terminal:
    • NumPy: Required for efficient numerical operations and data manipulation.
    • Pandas: Required for data loading, preprocessing, and tabular data handling.
    • Matplotlib: Required for static data visualizations and model output representation.
    • Seaborn: Required for enhanced statistical plots and correlation analysis.
    • Scikit-learn: Required for machine learning algorithms and evaluation techniques.
  • Pip: Used to install required dependencies.
  • IDE: Visual Studio Code, used to open, read, and run the notebook.
  • Relevant Extensions:
    • Jupyter: Required for notebook-based interactive execution within VS Code.
  • Git: Used to clone the repository.

Installation Steps

  1. Clone the Repository

    To download the repository and navigate to its directory:

    git clone https://github.com/stefanocaramagno/Integrated_ML_Pipeline_for_Vehicle_Pricing.git
    cd Integrated_ML_Pipeline_for_Vehicle_Pricing
  2. Install Dependencies

    To install all required dependencies:

    pip install numpy pandas matplotlib seaborn scikit-learn
  3. Open the Notebook

    To open the notebook, launch Visual Studio Code, and open:

    integrated_ML_pipeline_for_vehicle_pricing.ipynb
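
To confirm that step 2 installed everything before running the notebook, a short check like the following can help. This is a sketch rather than part of the repository: the helper name `check_dependencies` is hypothetical, and note that scikit-learn is imported under the module name `sklearn`.

```python
import importlib

# Import names of the five libraries installed in step 2.
REQUIRED = ["numpy", "pandas", "matplotlib", "seaborn", "sklearn"]


def check_dependencies(modules):
    """Return {module_name: version string, or None if the import fails}."""
    versions = {}
    for name in modules:
        try:
            module = importlib.import_module(name)
            # Fall back to "unknown" for modules without a __version__ attribute.
            versions[name] = getattr(module, "__version__", "unknown")
        except ImportError:
            versions[name] = None
    return versions


for name, version in check_dependencies(REQUIRED).items():
    print(f"{name}: {version or 'NOT INSTALLED'}")
```

Any library reported as `NOT INSTALLED` can be added by rerunning the `pip install` command from step 2.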

Running the Application

  1. Run the Notebook

    To execute the entire workflow, click on "Run All" in the Jupyter notebook interface.

🌐 Connect with Me

Feel free to explore my professional journey, check out my projects, or get in touch through the following platforms:

Email Portfolio LinkedIn Indeed GitHub YouTube

⚖️ License

© Stefano Caramagno

Personal and Educational Use Only
All content in this repository is provided for personal and educational purposes only.
Unauthorized actions without explicit permission from the author are prohibited, including but not limited to:

  • Commercial Use: Using any part of the content for commercial purposes.
  • Distribution: Sharing or distributing the content to third parties.
  • Modification: Altering, transforming, or building upon the content.
  • Resale: Selling or licensing the content or any derivatives.

For permissions beyond the scope of this license, please contact the author.

Disclaimer
The content is provided "as is" without warranty of any kind, express or implied.
The author shall not be liable for any claims, damages, or other liabilities arising from its use.
