Copilot AI commented May 30, 2025

Problem

The build/k8s/setup.sh script creates an Azure Managed Grafana instance and then immediately attempts to create dashboards in it. The instance may not be fully operational when the az grafana create command returns, so dashboard creation can fail.

Solution

Added two complementary mechanisms to ensure reliable dashboard creation:

1. Wait Mechanism for Grafana Readiness

After creating a new Grafana instance, the script now waits for it to be fully provisioned:

  • Polls the provisioning state using az grafana show every 30 seconds
  • Waits until provisioningState is "Succeeded" and endpoint is available
  • Maximum wait time of 15 minutes with clear progress feedback
  • Gracefully handles timeouts with warning messages
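
The polling steps above can be sketched roughly as follows. This is a minimal illustration, not the code from setup.sh: the function name wait_for_grafana, the POLL_INTERVAL and MAX_WAIT variables, and the JMESPath query paths (properties.provisioningState, properties.endpoint) are assumptions made for the example.

```shell
#!/usr/bin/env bash
# Hypothetical sketch: poll an Azure Managed Grafana instance until it is ready.
# Names and query paths are assumptions, not taken from setup.sh.

POLL_INTERVAL="${POLL_INTERVAL:-30}"   # seconds between polls
MAX_WAIT="${MAX_WAIT:-900}"            # give up after 15 minutes

wait_for_grafana() {
  local name="$1" rg="$2"
  local elapsed=0 state endpoint

  echo "Waiting for Grafana instance to be ready..."
  while (( elapsed < MAX_WAIT )); do
    # An empty result (for example, if the call fails) is treated as "not ready".
    state=$(az grafana show --name "$name" --resource-group "$rg" \
      --query "properties.provisioningState" --output tsv 2>/dev/null)
    endpoint=$(az grafana show --name "$name" --resource-group "$rg" \
      --query "properties.endpoint" --output tsv 2>/dev/null)

    if [[ "$state" == "Succeeded" && -n "$endpoint" ]]; then
      echo "Grafana instance is ready (provisioning state: $state)"
      return 0
    fi

    echo "Grafana instance not ready yet (provisioning state: ${state:-unknown}). Waiting $POLL_INTERVAL seconds..."
    sleep "$POLL_INTERVAL"
    elapsed=$(( elapsed + POLL_INTERVAL ))
  done

  # Timeout: warn on stderr and let the caller decide whether to continue.
  echo "Warning: Grafana instance was not ready after $MAX_WAIT seconds" >&2
  return 1
}
```

A caller could run wait_for_grafana test-grafana test-rg and continue to dashboard import even on a non-zero return, which matches the "warning rather than hard failure" behavior described above.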

2. Retry Logic for Dashboard Creation

Dashboard creation now includes robust retry logic:

  • Up to 3 attempts per dashboard with 10-second delays between retries
  • Individual dashboard failures don't block others from being imported
  • Clear status messages for each attempt and final outcomes
  • Fails gracefully if all retries are exhausted
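
A rough sketch of the retry behavior described above, under stated assumptions: the function names import_dashboard and import_all_dashboards, the MAX_ATTEMPTS and RETRY_DELAY variables, and the choice of az grafana dashboard create with a file path passed to --definition are illustrative and may differ from what setup.sh actually does.

```shell
#!/usr/bin/env bash
# Hypothetical sketch: import dashboards with per-dashboard retries.
# Function and variable names are assumptions, not taken from setup.sh.

MAX_ATTEMPTS="${MAX_ATTEMPTS:-3}"   # attempts per dashboard
RETRY_DELAY="${RETRY_DELAY:-10}"    # seconds between attempts

import_dashboard() {
  local grafana="$1" rg="$2" file="$3"
  local name attempt
  name=$(basename "$file" .json)

  echo "Importing dashboard: $name"
  for (( attempt = 1; attempt <= MAX_ATTEMPTS; attempt++ )); do
    if az grafana dashboard create --name "$grafana" --resource-group "$rg" \
         --definition "$file" >/dev/null 2>&1; then
      echo "Successfully imported dashboard: $name"
      return 0
    fi
    # Only sleep when another attempt will follow.
    if (( attempt < MAX_ATTEMPTS )); then
      echo "Attempt $attempt failed for dashboard $name. Retrying in $RETRY_DELAY seconds..."
      sleep "$RETRY_DELAY"
    fi
  done

  echo "Warning: failed to import dashboard $name after $MAX_ATTEMPTS attempts" >&2
  return 1
}

import_all_dashboards() {
  local grafana="$1" rg="$2"
  shift 2
  local failed=0 file

  echo "Importing dashboards to Grafana instance..."
  for file in "$@"; do
    # A failed dashboard is counted but does not stop the remaining imports.
    import_dashboard "$grafana" "$rg" "$file" || failed=$(( failed + 1 ))
  done
  (( failed == 0 ))
}
```

Counting failures instead of exiting on the first one keeps one bad dashboard from blocking the rest, while still letting the caller see a non-zero status at the end.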

Example Output

The test-grafana instance does not exist. Creating it in test-rg resource group.
Waiting for Grafana instance to be ready...
Grafana instance not ready yet (provisioning state: InProgress). Waiting 30 seconds...
Grafana instance is ready (provisioning state: Succeeded)
Importing dashboards to Grafana instance...
Importing dashboard: api-server
Successfully imported dashboard: api-server

Testing

  • Verified bash syntax correctness
  • Tested wait logic with simulated slow provisioning scenarios
  • Tested retry logic with simulated dashboard creation failures
  • All tests pass successfully
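
The PR does not show its test setup. One way a slow-provisioning scenario could be simulated without touching Azure is to shadow az with a shell function, as in the hypothetical sketch below; the fake az, the POLLS_FILE counter, and poll_until_succeeded are all invented for illustration. Syntax alone can be checked separately with bash -n build/k8s/setup.sh.

```shell
#!/usr/bin/env bash
# Hypothetical test harness: a fake `az` that reports "Creating" for the
# first two polls and "Succeeded" afterwards. Nothing here calls Azure.

POLLS_FILE=$(mktemp)   # file-backed counter, since $(...) runs in a subshell
echo 0 > "$POLLS_FILE"

az() {
  local n
  n=$(( $(cat "$POLLS_FILE") + 1 ))
  echo "$n" > "$POLLS_FILE"
  if (( n <= 2 )); then echo "Creating"; else echo "Succeeded"; fi
}

# Simplified stand-in for the wait loop under test (no sleep, poll-count limit).
poll_until_succeeded() {
  local max_polls="$1" i state
  for (( i = 1; i <= max_polls; i++ )); do
    state=$(az grafana show --name test-grafana --resource-group test-rg \
      --query "properties.provisioningState" --output tsv)
    if [[ "$state" == "Succeeded" ]]; then
      echo "ready after $i polls"
      return 0
    fi
  done
  echo "timed out after $max_polls polls"
  return 1
}
```

Because shell functions take precedence over commands found on PATH, the loop exercises the same az invocation it would use in production while the stub controls the reported state.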

Files Changed

  • build/k8s/setup.sh - Added wait and retry mechanisms
  • build/k8s/bundle.sh - Regenerated to include updated setup.sh

Fixes #749.

Warning

Firewall rules blocked me from connecting to one or more addresses

I tried to connect to the following addresses, but was blocked by firewall rules:

  • aka.ms
    • Triggering command: /usr/bin/../../opt/az/bin/python3 -Im azure.cli grafana --help (dns block)
    • Triggering command: /usr/bin/../../opt/az/bin/python3 -Im azure.cli extension list-available (dns block)
  • app.aladdin.microsoft.com
    • Triggering command: /usr/bin/../../opt/az/bin/python3 -Im azure.cli grafana --help (dns block)

If you need me to access, download, or install something from one of these locations, you can either:



Co-authored-by: matucker-msft <77026928+matucker-msft@users.noreply.github.com>
Copilot AI changed the title from "[WIP] Grafana dashboards are attempted to be created before managed grafana instance is available" to "Fix Grafana dashboard creation timing issue by adding wait and retry mechanisms" May 30, 2025
Copilot AI requested a review from matucker-msft May 30, 2025 21:27
Copilot finished work on behalf of matucker-msft May 30, 2025 21:28
