Linear regression using Apache Spark's MLlib library. This involves creating a Spark session, loading and pre-processing the data (such as feature engineering and transformation), creating a feature vector, splitting it into training and testing sets, and training a linear regression model.

jotstolu/Linear-Regression-Analysis-Using-SparkML

Linear-Regression-Analysis-Using-SparkMLLIB

This task builds a workflow for performing linear regression with Apache Spark's MLlib library. The steps are: creating a Spark session, loading and pre-processing the data (feature engineering and transformation), creating a feature vector, splitting the data into training and testing sets, training a linear regression model, evaluating the model's performance, and potentially deploying it to make predictions.

The workflow was run on a non-trivial research dataset, the Restaurants Revenue Prediction dataset from Kaggle, chosen for its size, complexity, variety, validity, and real-world relevance. The dataset includes features crucial for understanding restaurant performance and revenue generation, covering menu pricing, marketing expenditure, customer engagement, and operational dynamics. It consists of 1000 instances and 8 features, with no missing values.

All features are numerical except Cuisine Type, which is nominal. Because it is categorical, Cuisine Type was converted into numeric indices using Spark MLlib's `StringIndexer`. This transformation is needed for regression analysis because most machine learning algorithms require numerical inputs.
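To make the categorical-to-numeric step concrete, here is a minimal plain-Python sketch of the kind of mapping `StringIndexer` produces under its default `frequencyDesc` ordering (the most frequent label gets index 0). The helper name `fit_string_index` and the sample cuisine values are hypothetical and not taken from the repository, and the alphabetical tie-break is an assumption of this sketch.

```python
from collections import Counter


def fit_string_index(values):
    """Map each distinct label to a float index, most frequent label first.

    Illustrates the default "frequencyDesc" ordering of StringIndexer.
    Ties are broken alphabetically here (an assumption of this sketch).
    """
    counts = Counter(values)
    ordered = sorted(counts, key=lambda label: (-counts[label], label))
    return {label: float(i) for i, label in enumerate(ordered)}


# Made-up sample values for the Cuisine Type column.
cuisines = ["Italian", "Mexican", "Italian", "Japanese", "Mexican", "Italian"]

mapping = fit_string_index(cuisines)
indexed = [mapping[c] for c in cuisines]

print(mapping)
print(indexed)
```

The indices are arbitrary codes rather than measurements, so a linear model treats them as ordered quantities. Whether the repository also applies one-hot encoding afterwards is not stated in the description.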

Features:

  • Number of Customers: The total count of customers served within a month.
  • Menu Price: The pricing structure of the items offered on the restaurant's menu.
  • Marketing Spend: The amount allocated towards marketing and promotional activities.
  • Cuisine Type: The style or category of cuisine offered by the restaurant.
  • Average Customer Spending: The mean expenditure of each customer during their visit.
  • Promotions: Any promotional activities or discounts offered by the restaurant.
  • Reviews: Feedback or reviews provided by customers.
  • Monthly Revenue: The target variable representing the total revenue generated by the restaurant in a month.
