This project demonstrates the implementation of a streaming data pipeline using Apache Kafka and Apache Spark for real-time analytics of credit card payment transactions.
## Table of Contents

- Introduction
- Tech Stack
- Prerequisites
- Getting Started
- Generating Payment Transactions
- Stream Processing
## Introduction

The blog post associated with this repository walks through setting up a streaming data pipeline, aggregating payment transactions in real time, and explains why stream processing matters for fraud detection.
## Tech Stack

- Python: Used for application development and scripting.
- Apache Kafka: Used as the messaging backbone for real-time data streams.
- Apache Spark: Utilized for stream processing and analytics.
- Docker: Enables easy setup and management of Kafka and Zookeeper services.
## Prerequisites

Ensure you have the following installed:
- Docker
  - Mac users: `brew install --cask docker`
  - Windows users: `choco install docker-desktop`
## Getting Started

1. **Setup Kafka Cluster**
   - Use `docker-compose.yml` to set up a local Kafka cluster with Zookeeper (a minimal sketch of such a file follows this list).
2. **Generating Payment Transactions**
   - Run `payments-generator.py` to generate synthetic transaction data using the Faker library.
3. **Stream Processing**
   - Execute `process_transaction.py` to process payment transactions in real time using Spark.
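The repository's actual `docker-compose.yml` is not reproduced here, but a minimal sketch of such a file might look like the following; the Confluent images, versions, and port mappings are assumptions for illustration:

```yaml
version: "3.8"
services:
  zookeeper:
    image: confluentinc/cp-zookeeper:7.4.0
    environment:
      ZOOKEEPER_CLIENT_PORT: 2181
      ZOOKEEPER_TICK_TIME: 2000
    ports:
      - "2181:2181"

  kafka:
    image: confluentinc/cp-kafka:7.4.0
    depends_on:
      - zookeeper
    environment:
      KAFKA_BROKER_ID: 1
      KAFKA_ZOOKEEPER_CONNECT: zookeeper:2181
      # Advertise on localhost so host-side producers and consumers can connect.
      KAFKA_ADVERTISED_LISTENERS: PLAINTEXT://localhost:9092
      # A single broker needs a replication factor of 1 for internal topics.
      KAFKA_OFFSETS_TOPIC_REPLICATION_FACTOR: 1
    ports:
      - "9092:9092"
```

Start the cluster with `docker-compose up -d` and stop it with `docker-compose down`.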
## Generating Payment Transactions

The `payments-generator.py` script generates synthetic payment transactions, simulating real-world data for demonstration purposes. It uses the Faker library to create transactions with various fields. A rough sketch of such a generator is shown below.
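This is not the repository's actual code, only a minimal sketch of the approach; the topic name `payments`, the field names, and the use of the kafka-python client are assumptions:

```python
import json
import random
import time
from datetime import datetime, timezone

from faker import Faker
from kafka import KafkaProducer  # pip install kafka-python

fake = Faker()

# Serialize each transaction dict as a JSON-encoded message.
producer = KafkaProducer(
    bootstrap_servers="localhost:9092",
    value_serializer=lambda v: json.dumps(v).encode("utf-8"),
)

def make_transaction():
    # Field names are illustrative; the real script may use different ones.
    return {
        "transaction_id": fake.uuid4(),
        "card_number": fake.credit_card_number(),
        "merchant": fake.company(),
        "amount": round(random.uniform(1.0, 500.0), 2),
        # Use the current time so downstream windowed aggregations line up.
        "timestamp": datetime.now(timezone.utc).isoformat(),
    }

if __name__ == "__main__":
    while True:
        producer.send("payments", make_transaction())  # topic name is assumed
        time.sleep(0.5)  # throttle the stream for demo purposes
```

To sanity-check that messages are arriving, a console consumer works well: `kafka-console-consumer --bootstrap-server localhost:9092 --topic payments --from-beginning`.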
## Stream Processing

The `process_transaction.py` script consumes payment transactions from Kafka, performs windowed aggregations, and calculates transaction counts and average amounts per window for fraud detection. A sketch of this kind of job appears below.
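The following is a hedged sketch of such a job using Spark Structured Streaming, not the repository's actual implementation; the topic name, message schema, window length, and watermark are assumptions chosen for illustration:

```python
from pyspark.sql import SparkSession
from pyspark.sql.functions import avg, col, count, from_json, window
from pyspark.sql.types import (
    DoubleType, StringType, StructField, StructType, TimestampType,
)

spark = SparkSession.builder.appName("process-transactions").getOrCreate()

# Assumed schema; match it to whatever the generator actually emits.
schema = StructType([
    StructField("transaction_id", StringType()),
    StructField("card_number", StringType()),
    StructField("merchant", StringType()),
    StructField("amount", DoubleType()),
    StructField("timestamp", TimestampType()),
])

# Read raw bytes from Kafka and parse the JSON payload into columns.
transactions = (
    spark.readStream.format("kafka")
    .option("kafka.bootstrap.servers", "localhost:9092")
    .option("subscribe", "payments")  # topic name is assumed
    .load()
    .select(from_json(col("value").cast("string"), schema).alias("tx"))
    .select("tx.*")
)

# Tumbling one-minute windows: transaction counts and average amounts
# per window, as described above.
aggregated = (
    transactions
    .withWatermark("timestamp", "2 minutes")
    .groupBy(window(col("timestamp"), "1 minute"))
    .agg(count("*").alias("txn_count"), avg("amount").alias("avg_amount"))
)

# Print each updated window to the console for demonstration.
query = (
    aggregated.writeStream
    .outputMode("update")
    .format("console")
    .option("truncate", "false")
    .start()
)
query.awaitTermination()
```

Note that reading from Kafka requires the Kafka connector package, e.g. `spark-submit --packages org.apache.spark:spark-sql-kafka-0-10_2.12:3.5.0 process_transaction.py` (the Spark/Scala versions in the coordinate must match your installation).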
For detailed information and a step-by-step guide, refer to the associated blog post.
Feel free to reach out with any questions or suggestions!