In this project, we process flight data in real time using Apache Spark and Kafka to perform streaming classification. The project consists of three scripts: a Kafka producer, a Kafka consumer, and a Spark Structured Streaming classification job. The flight delay and cancellation data was collected and published by the U.S. Department of Transportation's (DOT) Bureau of Transportation Statistics. It tracks the on-time performance of domestic flights operated by large air carriers and summarises the number of on-time, delayed, cancelled, and diverted flights, as published monthly by the DOT for 2015.
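As a rough illustration of the producer side of such a pipeline, the sketch below reads flight rows from CSV text, encodes each row as JSON bytes, and hands it to a `send` callable. It is not the repository's actual script: the topic name `flights`, the column names, the two sample rows, and the helper names `read_flights`, `encode`, and `publish` are all assumptions made for this example. An in-memory list stands in for the Kafka broker so the sketch runs without any external service.

```python
import csv
import io
import json
from typing import Callable, Dict, Iterable, Iterator

# Hypothetical topic name; the source does not state which topic is used.
TOPIC = "flights"


def read_flights(csv_text: str) -> Iterator[Dict[str, str]]:
    """Yield one dict per CSV row, keyed by the header line."""
    yield from csv.DictReader(io.StringIO(csv_text))


def encode(record: Dict[str, str]) -> bytes:
    """Serialize a record as UTF-8 JSON, a common encoding for Kafka messages."""
    return json.dumps(record, sort_keys=True).encode("utf-8")


def publish(
    records: Iterable[Dict[str, str]],
    send: Callable[[str, bytes], object],
    topic: str = TOPIC,
) -> int:
    """Pass each encoded record to send(topic, payload) and return the count."""
    count = 0
    for record in records:
        send(topic, encode(record))
        count += 1
    return count


# Two made-up rows for illustration; the column names are assumptions.
SAMPLE = "AIRLINE,DEPARTURE_DELAY,CANCELLED\nAA,12,0\nDL,-3,0\n"

sent = []  # in-memory stand-in for a broker
publish(read_flights(SAMPLE), lambda topic, payload: sent.append((topic, payload)))
```

With a real broker, `send` could plausibly be the `send` method of a `KafkaProducer` from the `kafka-python` package, which accepts a topic and a bytes value; the consumer and the Spark Structured Streaming job would then read and decode the same JSON payloads.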
aber0016/Real_Time_Big_Data_Streaming_Spark_Kafka