
Commit 55efbdc: Update README.md (parent 1e6ae37)

1 file changed: README.md (+91, -123 lines)

# Fine-Grained Demand Forecasting 📈

[![Deploy](https://github.com/user/fine-grained-demand-forecasting/actions/workflows/databricks-ci.yml/badge.svg)](https://github.com/user/fine-grained-demand-forecasting/actions/workflows/databricks-ci.yml)

A scalable demand forecasting solution built on Databricks using Facebook Prophet, Unity Catalog, and serverless compute. This solution demonstrates modern MLOps practices for retail and supply chain forecasting at the store-item level.

## 🏪 Industry Use Case

**Fine-grained demand forecasting** represents a paradigm shift from traditional aggregate forecasting approaches. Instead of predicting demand at a high level (e.g., total company sales), fine-grained forecasting generates predictions for specific combinations of dimensions—in this case, **store-item level forecasting**.

### Why Fine-Grained Forecasting Matters

Traditional forecasting approaches often aggregate demand across locations, products, or time periods, losing critical nuances:

- **Aggregate Approach**: "We'll sell 10,000 units of Product A this month"
- **Fine-Grained Approach**: "Store 1 will sell 45 units of Product A, Store 2 will sell 67 units, Store 3 will sell 23 units..."
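
The contrast is easy to make concrete with a toy example. The sketch below is illustrative only; the DataFrame layout and column names are assumptions, not this solution's actual schema:

```python
import pandas as pd

# Toy sales ledger for Product A across three stores (illustrative numbers).
sales = pd.DataFrame({
    "store": [1, 1, 2, 2, 3, 3],
    "item":  ["A"] * 6,
    "units": [22, 23, 33, 34, 11, 12],
})

# Aggregate forecast target: a single total for Product A.
aggregate_units = sales["units"].sum()  # 135 units overall

# Fine-grained forecast targets: one series per store-item combination.
per_store_item = sales.groupby(["store", "item"])["units"].sum()
print(per_store_item)  # store 1 -> 45, store 2 -> 67, store 3 -> 23
```

The aggregate number hides that store 2 drives most of the demand, which is exactly the nuance fine-grained forecasting preserves.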

This granular approach addresses real-world business challenges:

- **Inventory Optimization**: Precise allocation of inventory across locations based on local demand patterns
- **Supply Chain Efficiency**: Targeted procurement and distribution strategies for each store-product combination
- **Revenue Protection**: Early identification of demand shifts at specific locations before they impact overall performance
- **Cost Reduction**: Elimination of safety stock inefficiencies caused by demand aggregation

### An Open-Source Approach to Complex Forecasting

This solution offers **one inspirational approach** to the technical challenges of fine-grained demand forecasting. The retail industry faces this problem universally, but solutions vary widely based on:

- **Scale Requirements**: From hundreds to millions of store-item combinations
- **Data Architecture**: Different approaches to distributed processing and storage
- **Algorithm Choice**: Prophet, ARIMA, neural networks, or hybrid approaches
- **Infrastructure**: Cloud-native vs. on-premises, serverless vs. traditional compute

**This implementation demonstrates:**
- How to structure a scalable forecasting pipeline using modern data platforms
- Practical approaches to distributed time series modeling
- Real-world considerations for data governance and MLOps

Whether you're a data scientist exploring forecasting techniques, a business leader evaluating AI applications, or an engineer architecting similar solutions, this open-source example provides a foundation to build upon and adapt to your specific needs.

This solution scales from hundreds to thousands of store-item combinations, making it suitable for enterprise retail operations, e-commerce platforms, and multi-location businesses seeking to implement their own fine-grained forecasting capabilities.

## 🚀 Installation

### Recommended: Using Databricks Asset Bundle Editor

1. **Clone this repository** to your Databricks workspace:
   ```bash
   git clone https://github.com/databricks-industry-solutions/fine-grained-demand-forecasting.git
   ```

2. **Open the DAB Editor UI** in your Databricks workspace:
   - Navigate to the cloned repository folder
   - Open the `databricks.yml` file
   - Click "Edit Bundle" to open the visual editor

3. **Configure and Run** the bundle:
   - Modify configuration variables as needed (catalog name, schema name, environment)
   - Click "Validate" to check your configuration
   - Click "Deploy" to deploy all resources
   - Click "Run" to execute the demand forecasting workflow

### Alternative: Command Line

If you prefer using the command line:

```bash
# Prerequisites
pip install databricks-cli

# Configure Databricks
databricks configure

# Deploy and run
databricks bundle validate
databricks bundle deploy
databricks bundle run demand_forecasting_workflow
```

## 🏗️ Project Structure

```
├── databricks.yml                           # Main DABs configuration
├── notebooks/
│   ├── 01_data_generation_setup.py          # Data foundation and Unity Catalog setup
│   ├── 02_model_training_forecasting.py     # Prophet model training and forecasting
│   └── 03_results_analysis_visualization.py # Business insights and visualization
├── .github/workflows/
│   ├── databricks-ci.yml                    # CI/CD pipeline
│   └── publish.yaml                         # Publishing workflow
├── scripts/                                 # Deployment and utility scripts
├── requirements.txt                         # Python dependencies
├── env.example                              # Environment configuration template
└── CONTRIBUTING.md                          # Contribution guidelines
```
## 📊 Forecasting Pipeline
The solution implements a three-stage forecasting pipeline:
### 1. Data Generation & Setup (`01_data_generation_setup.py`)
- Synthetic sales data generation with realistic seasonal patterns
- Unity Catalog infrastructure setup (catalog, schema, tables)
- Data quality validation and governance setup
### 2. Model Training & Forecasting (`02_model_training_forecasting.py`)
- Facebook Prophet model training for each store-item combination
- Distributed processing using Pandas UDFs for scalability
- Confidence interval generation for uncertainty quantification
- Forecast results storage in Delta tables
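
The per-series pattern behind these bullets can be sketched without a cluster. The real pipeline fits a Prophet model inside a Pandas UDF that Spark distributes across store-item groups; the runnable sketch below substitutes a plain linear-trend fit and a pandas `groupby` loop so the shape of the approach is visible without Spark or Prophet installed. All names here are illustrative assumptions:

```python
import numpy as np
import pandas as pd

def forecast_one_series(pdf: pd.DataFrame, horizon: int = 28) -> pd.DataFrame:
    """Forecast one store-item series; a linear-trend stand-in for Prophet."""
    pdf = pdf.sort_values("date")
    y = pdf["sales"].to_numpy()
    t = np.arange(len(y))
    slope, intercept = np.polyfit(t, y, 1)
    future_t = np.arange(len(y), len(y) + horizon)
    yhat = intercept + slope * future_t
    resid_std = float(np.std(y - (intercept + slope * t)))
    return pd.DataFrame({
        "store": pdf["store"].iloc[0],
        "item": pdf["item"].iloc[0],
        "date": pdf["date"].iloc[-1] + pd.to_timedelta(np.arange(1, horizon + 1), unit="D"),
        "yhat": yhat,
        "yhat_lower": yhat - 1.96 * resid_std,  # rough 95% band, like Prophet's yhat_lower
        "yhat_upper": yhat + 1.96 * resid_std,
    })

# Toy history: two store-item series, 60 days each.
dates = pd.date_range("2024-01-01", periods=60, freq="D")
history = pd.concat([
    pd.DataFrame({
        "date": dates, "store": s, "item": 1,
        "sales": 50 + 0.2 * np.arange(60) + np.random.default_rng(s).normal(0, 1, 60),
    })
    for s in (1, 2)
], ignore_index=True)

# The real pipeline hands each group to a distributed Pandas UDF; here we loop.
forecasts = pd.concat(
    [forecast_one_series(g) for _, g in history.groupby(["store", "item"])],
    ignore_index=True,
)
```

The key design point survives the simplification: each store-item series is modeled independently, so the work parallelizes naturally across groups.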
### 3. Results Analysis & Visualization (`03_results_analysis_visualization.py`)
- Business insights and forecast accuracy metrics
- Interactive visualizations and trend analysis
- Executive dashboards and reporting
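
Typical accuracy metrics for this stage are per-series MAE and MAPE. A minimal sketch, assuming actuals have been joined to forecasts in columns named `actual` and `yhat` (hypothetical names, not this notebook's schema):

```python
import pandas as pd

def forecast_accuracy(scored: pd.DataFrame) -> pd.DataFrame:
    """Per-series MAE and MAPE from actuals joined with forecasts."""
    scored = scored.assign(abs_err=(scored["actual"] - scored["yhat"]).abs())
    # Clip denominators at 1 to avoid division blow-ups on near-zero demand days.
    scored = scored.assign(ape=scored["abs_err"] / scored["actual"].clip(lower=1))
    out = scored.groupby(["store", "item"]).agg(
        mae=("abs_err", "mean"),
        mape_pct=("ape", "mean"),
    )
    out["mape_pct"] *= 100
    return out

scored = pd.DataFrame({
    "store": [1, 1, 2, 2], "item": [1, 1, 1, 1],
    "actual": [100.0, 110.0, 50.0, 55.0],
    "yhat":   [95.0, 115.0, 52.0, 50.0],
})
metrics = forecast_accuracy(scored)
print(metrics)
```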
## 🔧 Configuration
### Environment Variables (.env)
```bash
CATALOG_NAME=dev_demand_forecasting
SCHEMA_NAME=forecasting
```
### Key Configuration Options
- **Catalog Name**: Unity Catalog name for data governance
- **Schema Name**: Database schema for forecasting tables
- **Environment**: Deployment environment (dev/staging/prod)
- **Forecast Horizon**: Number of days to forecast ahead (configurable)
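
One lightweight way to surface these options to pipeline code is an environment-backed config object. This is a hypothetical sketch: only `CATALOG_NAME` and `SCHEMA_NAME` appear in the environment variables shown above, and the other variable names and defaults are assumptions:

```python
import os
from dataclasses import dataclass

@dataclass
class ForecastConfig:
    catalog_name: str
    schema_name: str
    environment: str
    forecast_horizon_days: int

def load_config() -> ForecastConfig:
    """Read pipeline settings from the environment, with dev-friendly defaults."""
    return ForecastConfig(
        catalog_name=os.getenv("CATALOG_NAME", "dev_demand_forecasting"),
        schema_name=os.getenv("SCHEMA_NAME", "forecasting"),
        environment=os.getenv("ENVIRONMENT", "dev"),                          # assumed name
        forecast_horizon_days=int(os.getenv("FORECAST_HORIZON_DAYS", "90")),  # assumed name
    )

cfg = load_config()
```

Centralizing the lookups in one dataclass keeps notebooks free of scattered `os.getenv` calls and makes the dev/staging/prod switch a single variable.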
## 🤝 Contributing

For issues and questions:

---
**Built with ❤️ using Databricks Asset Bundles, Unity Catalog, and Prophet**
