IPL Analysis using Azure Databricks

Project Description

This project analyzes Indian Premier League (IPL) cricket data using Azure Databricks. It relies on Apache Spark, a fast, general-purpose cluster computing engine, to process the large ball-by-ball match datasets.

Architecture Diagram

The architecture diagram shows how data flows through the project. IPL match data is sourced from Cricsheet (https://cricsheet.org/downloads/) and ingested into Azure Blob Storage, which serves as the data lake. Azure Databricks reads and processes the data, and the results of the analysis are stored in Azure SQL Database.
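Cricsheet publishes one file per match, and a ball-by-ball analysis usually starts by flattening each match into one record per delivery. The following is a minimal plain-Python sketch of that step. It assumes the innings, overs and deliveries nesting of Cricsheet's JSON downloads; the output field names and the sample players are hypothetical and are not taken from this repository's notebooks. In Spark the same flattening would typically be done with `explode` over the nested arrays.

```python
import json
from typing import Iterator


def flatten_match(match_id: str, match: dict) -> Iterator[dict]:
    """Yield one flat record per delivery of a Cricsheet-style JSON match.

    Assumes Cricsheet's JSON layout: a list of innings, each holding a list
    of overs, each holding a list of deliveries. Output keys are hypothetical.
    """
    for innings_no, innings in enumerate(match.get("innings", []), start=1):
        for over in innings.get("overs", []):
            # Position within the over, counting wides and no-balls too.
            for ball_no, delivery in enumerate(over.get("deliveries", []), start=1):
                runs = delivery.get("runs", {})
                extras = delivery.get("extras", {})
                yield {
                    "match_id": match_id,
                    "innings": innings_no,
                    "batting_team": innings.get("team"),
                    "over": over.get("over"),  # Cricsheet JSON numbers overs from 0
                    "ball": ball_no,
                    "batter": delivery.get("batter"),
                    "bowler": delivery.get("bowler"),
                    "batter_runs": runs.get("batter", 0),
                    "extra_runs": runs.get("extras", 0),
                    "total_runs": runs.get("total", 0),
                    "is_wide": "wides" in extras,
                    "is_no_ball": "noballs" in extras,
                    "wicket_kinds": [w.get("kind") for w in delivery.get("wickets", [])],
                }


# Tiny hand-made match with hypothetical players A1, A2 and B1.
sample = json.loads("""
{"innings": [{"team": "Team A", "overs": [{"over": 0, "deliveries": [
  {"batter": "A1", "bowler": "B1", "non_striker": "A2",
   "runs": {"batter": 4, "extras": 0, "total": 4}},
  {"batter": "A1", "bowler": "B1", "non_striker": "A2",
   "runs": {"batter": 0, "extras": 1, "total": 1}, "extras": {"wides": 1}},
  {"batter": "A1", "bowler": "B1", "non_striker": "A2",
   "runs": {"batter": 0, "extras": 0, "total": 0},
   "wickets": [{"player_out": "A1", "kind": "bowled"}]}
]}]}]}
""")
rows = list(flatten_match("match-1", sample))
```

Yielding records lazily keeps memory flat when looping over many match files; the resulting dictionaries map directly onto the columns of a DataFrame or table.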

Getting Started

To get started with this project, follow the steps below:

1. Create an Azure Databricks workspace and start a new cluster. A small configuration is enough to begin with, for example the Standard_DS3_v2 instance type (4 vCPUs, 14 GB RAM).
2. Create an Azure Data Lake Storage Gen2 (ADLS Gen2) account in the Azure portal.
3. In the ADLS Gen2 account, create a container for the IPL data.
4. In the Azure portal, create an Azure Key Vault.
5. Add a secret to the Key Vault that holds the ADLS Gen2 storage account key, and create an Azure Key Vault-backed secret scope in the Databricks workspace. Notebooks can then read the key through that scope to access the ADLS Gen2 container without hard-coding it.
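Once the storage account key is stored as a secret, the container can be mounted into the Databricks file system. The following minimal sketch builds the arguments for `dbutils.fs.mount` using account-key authentication against the Blob endpoint (`wasbs://`). It assumes a Key Vault-backed secret scope already exists; the account, container, scope and secret names are hypothetical, and the repository's `00_Mount_Azure_Container.py` may use a different scheme or authentication method.

```python
def build_blob_mount_args(storage_account: str, container: str,
                          account_key: str, mount_root: str = "/mnt") -> dict:
    """Build keyword arguments for dbutils.fs.mount (account-key auth).

    Targets the Blob endpoint with the wasbs:// scheme. All arguments are
    placeholders; in a notebook the key should be read from the secret
    scope, never hard-coded.
    """
    host = f"{storage_account}.blob.core.windows.net"
    return {
        "source": f"wasbs://{container}@{host}",
        "mount_point": f"{mount_root.rstrip('/')}/{container}",
        # Hadoop/Spark setting that carries the storage account key.
        "extra_configs": {f"fs.azure.account.key.{host}": account_key},
    }


# Inside a Databricks notebook only (dbutils is not available elsewhere);
# the scope and secret names below are hypothetical:
# key = dbutils.secrets.get(scope="ipl-kv-scope", key="storage-account-key")
# dbutils.fs.mount(**build_blob_mount_args("iplstorageacct", "ipl-data", key))
```

Returning plain arguments, rather than calling `dbutils` directly, keeps the path-building logic testable outside Databricks.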

Usage

Azure Databricks processes the IPL data: it reads the raw files from Azure Blob Storage and processes them with Spark streaming DataFrames and Delta Live Tables. The analyses performed on the data include, but are not limited to:

Top batsmen and bowlers of the tournament
Team-wise and player-wise statistics
Analysis of player performance in various scenarios
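The batting and bowling rankings above come down to group-by aggregations over ball-by-ball records. The following is a minimal plain-Python sketch of that logic. It assumes a hypothetical flat row schema (`batter`, `bowler`, `batter_runs`, `extra_type`, `extra_runs`, `wicket_kind`) rather than the repository's actual tables; in Databricks the same aggregation would typically be a `GROUP BY` in SQL or `groupBy().agg()` on a DataFrame. It follows common scoring conventions: wides do not count as balls faced, byes and leg byes are not charged to the bowler, and run outs are not credited to the bowler.

```python
from collections import defaultdict

# Dismissal kinds credited to the bowler (run outs, for example, are not).
BOWLER_WICKETS = {"bowled", "caught", "lbw", "stumped",
                  "caught and bowled", "hit wicket"}


def batting_table(rows):
    """Runs, balls faced and strike rate per batter, most runs first."""
    runs, balls = defaultdict(int), defaultdict(int)
    for r in rows:
        runs[r["batter"]] += r["batter_runs"]
        if r["extra_type"] != "wides":  # wides are not balls faced
            balls[r["batter"]] += 1
    table = [{"batter": b, "runs": runs[b], "balls": balls[b],
              "strike_rate": round(100 * runs[b] / balls[b], 2) if balls[b] else 0.0}
             for b in runs]
    return sorted(table, key=lambda t: (-t["runs"], t["batter"]))


def bowling_table(rows):
    """Wickets, runs conceded and economy per bowler, most wickets first."""
    wickets, conceded, legal = defaultdict(int), defaultdict(int), defaultdict(int)
    for r in rows:
        b = r["bowler"]
        # Byes and leg byes are not charged to the bowler; wides and no-balls are.
        charged = r["extra_runs"] if r["extra_type"] in ("wides", "noballs") else 0
        conceded[b] += r["batter_runs"] + charged
        if r["extra_type"] not in ("wides", "noballs"):
            legal[b] += 1
        if r["wicket_kind"] in BOWLER_WICKETS:
            wickets[b] += 1
    table = [{"bowler": b, "wickets": wickets[b], "runs_conceded": conceded[b],
              "balls": legal[b],
              # Economy = runs conceded per six legal balls.
              "economy": round(6 * conceded[b] / legal[b], 2) if legal[b] else float("inf")}
             for b in conceded]
    return sorted(table, key=lambda t: (-t["wickets"], t["economy"], t["bowler"]))


# Hypothetical deliveries for illustration only.
rows = [
    {"batter": "A1", "bowler": "B1", "batter_runs": 4, "extra_type": None, "extra_runs": 0, "wicket_kind": None},
    {"batter": "A1", "bowler": "B1", "batter_runs": 0, "extra_type": "wides", "extra_runs": 1, "wicket_kind": None},
    {"batter": "A1", "bowler": "B1", "batter_runs": 6, "extra_type": None, "extra_runs": 0, "wicket_kind": None},
    {"batter": "A1", "bowler": "B1", "batter_runs": 0, "extra_type": None, "extra_runs": 0, "wicket_kind": "bowled"},
    {"batter": "A2", "bowler": "B2", "batter_runs": 1, "extra_type": None, "extra_runs": 0, "wicket_kind": None},
    {"batter": "A2", "bowler": "B2", "batter_runs": 0, "extra_type": "legbyes", "extra_runs": 1, "wicket_kind": None},
    {"batter": "A2", "bowler": "B2", "batter_runs": 0, "extra_type": None, "extra_runs": 0, "wicket_kind": "run out"},
]
top_batters = batting_table(rows)
top_bowlers = bowling_table(rows)
```

Sorting bowlers by wickets first and economy second mirrors a typical "top bowlers" ranking; a real leaderboard would usually also apply a minimum-balls threshold so that very short spells do not dominate.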

The results are stored in Delta tables and visualized with Databricks' built-in visualizations, and interactive dashboards display the results of the analysis.

Teaser

Databricks_IPL_Final.mp4

Repository Files

images
00_Mount_Azure_Container.py
00_Setup.py
00_Structured_Processing.py
01_Create_DLT.sql
02_Batting_Analysis.sql
03_Bowling_Analysis.sql
10.SchemaGenerator.py
FileGenerator.py
README.md
Temp.sql

cloudry22/IPLAnalysisDatabricks
