aber0016/Real_Time_Big_Data_Streaming_Spark_Kafka

Real_Time_Big_Data_Streaming_Spark_Kafka

This project processes flight data in real time with Apache Spark and Kafka to perform streaming classification. It consists of three scripts: a Kafka producer, a Kafka consumer, and a Spark Structured Streaming classification job. The flight delay and cancellation data was collected and published by the U.S. Department of Transportation's (DOT) Bureau of Transportation Statistics, which tracks the on-time performance of domestic flights operated by large air carriers. Summary information on the number of on-time, delayed, cancelled, and diverted flights is published in DOT's monthly Air Travel Consumer Report; this dataset covers flights from 2015.
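
To illustrate the producer side of the pipeline, here is a minimal sketch of how flight rows might be turned into Kafka messages. It is not the repository's actual script: the topic name `flights`, the column subset, the sample rows, and the helper names `encode_rows`, `publish`, and `decode` are all assumptions made for illustration. The `send` callable is injected so the sketch runs without a broker.

```python
import csv
import io
import json
from typing import Callable, Dict, Iterable, Iterator

# Hypothetical topic name; the repository's scripts may use a different one.
TOPIC = "flights"

# Illustrative rows only (not real records). The column names are an assumed
# subset of the 2015 flight delays and cancellations dataset.
SAMPLE_CSV = """YEAR,MONTH,DAY,AIRLINE,ORIGIN_AIRPORT,DESTINATION_AIRPORT,DEPARTURE_DELAY,CANCELLED
2015,1,1,AS,ANC,SEA,-11,0
2015,1,1,AA,LAX,PBI,-8,0
"""


def encode_rows(csv_text: str) -> Iterator[bytes]:
    """Yield each CSV row as a UTF-8 encoded JSON message, one per flight."""
    for row in csv.DictReader(io.StringIO(csv_text)):
        yield json.dumps(row).encode("utf-8")


def publish(messages: Iterable[bytes], send: Callable[[str, bytes], object]) -> int:
    """Pass every message to `send(topic, value)` and return how many were sent."""
    count = 0
    for payload in messages:
        send(TOPIC, payload)
        count += 1
    return count


def decode(payload: bytes) -> Dict[str, str]:
    """Inverse of the encoding step, as a consumer would apply it."""
    return json.loads(payload.decode("utf-8"))


# Stand-in for a Kafka client: collect messages in a list instead of sending them.
sent = []
n = publish(encode_rows(SAMPLE_CSV), lambda topic, value: sent.append((topic, value)))
print(n, decode(sent[0][1])["ORIGIN_AIRPORT"])
```

With a running broker and the `kafka-python` package (an assumption; the project may use a different client), `send` could instead be `lambda topic, value: producer.send(topic, value=value)` for a `KafkaProducer(bootstrap_servers="localhost:9092")`. The Spark Structured Streaming job would then subscribe to the same topic and parse the JSON payloads before classification.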
