Using Apache Spark to perform a classification task on a marketing campaign dataset
Big Data was conceptualised out of the need to efficiently handle and utilise the rapidly increasing amount of data generated globally by technologies and organisations in various fields, including the financial sector. These data can be analysed to obtain insights that help an organisation achieve its objectives. A popular method of obtaining useful data is a marketing campaign, which can include raising awareness of a new product among both new and existing customers and acquiring direct feedback from them. We use Apache Spark to analyse data on 45211 new or potential customers from a direct bank marketing campaign run by a Portuguese financial institution, with the goal of gaining useful insights. The marketing campaign data is first explored and visualised to observe overall trends and patterns in customer behaviour and background. The dataset is then analysed to predict whether a client will subscribe to a term deposit. This binary classification task is performed using three machine learning techniques: Support Vector Machine, Random Forest Decision Trees and Logistic Regression. The results are then investigated and discussed.