Add "User Group Diagnostics" Grafana dashboard #6065

jnywong · 2025-05-16T10:50:44Z

This PR adds a new "User Group Diagnostics" Grafana dashboard that complements the "User Diagnostics Dashboard" to show resource usage aggregated by user group.

The dashboard requires jupyterhub-groups-exporter to be set up on the hub; see below for what happens on hubs without it. I have created a fork of the upstream dashboards while we validate the implementation across 2i2c-hosted hubs.

For this to work across all our infrastructure, I decided to keep the user group dashboard entirely separate from the user-name-based aggregation. If I combined user name and user group into one dashboard/PromQL query, it would break the dashboard on hubs that do not have jupyterhub-groups-exporter set up, because the jupyterhub_user_group_info metric is unavailable to the query. Therefore, on a hub without jupyterhub-groups-exporter, the "User Diagnostics" dashboard works as normal but the "User Group Diagnostics" dashboard shows no data.
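As a rough illustration of the availability check described above, here is a minimal Python sketch that asks Prometheus whether jupyterhub_user_group_info has any series. It is not part of this PR: the function names and the example Prometheus URL are assumptions, and it relies only on the standard Prometheus HTTP instant-query endpoint (/api/v1/query).

```python
import json
from urllib.parse import urlencode
from urllib.request import urlopen

# Metric named in this PR; everything else below is a hypothetical helper.
GROUP_METRIC = "jupyterhub_user_group_info"


def build_query_url(prometheus_url: str, metric: str = GROUP_METRIC) -> str:
    """Build an instant-query URL that counts the series of `metric`."""
    query = urlencode({"query": f"count({metric})"})
    return f"{prometheus_url.rstrip('/')}/api/v1/query?{query}"


def metric_has_data(response: dict) -> bool:
    """Return True if an instant-query response contains at least one sample."""
    if response.get("status") != "success":
        return False
    # count() over a metric with no series yields an empty result list.
    return len(response.get("data", {}).get("result", [])) > 0


def group_dashboard_will_show_data(prometheus_url: str) -> bool:
    """Query Prometheus (network call) and report whether the group metric exists."""
    with urlopen(build_query_url(prometheus_url), timeout=10) as resp:
        return metric_has_data(json.load(resp))
```

Calling group_dashboard_will_show_data with a hub's Prometheus address (for example a port-forwarded http://localhost:9090, which is an assumed address) would return False on hubs where the exporter has not been deployed.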

The "User Diagnostics" dashboard included in this PR differs from the upstream version of user.jsonnet, because the upstream version is technically a "Pod Diagnostics" dashboard. This PR aggregates pod-level data on a per-user basis and uses unescaped usernames from the kube_pod_annotations metric, rather than the limited character sets of kube_pod_labels.
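To make the aggregation idea concrete, here is a small Python sketch of rolling pod-level samples up to users and then to groups. The dashboards themselves do this in PromQL; the function names, pod names, and sample data here are hypothetical and only illustrate the shape of the computation.

```python
from collections import defaultdict


def aggregate_by_user(pod_usage: dict, pod_to_user: dict) -> dict:
    """Sum pod-level values per unescaped username.

    pod_to_user stands in for the username carried on the pod annotation;
    pods without a known user are skipped.
    """
    totals = defaultdict(float)
    for pod, value in pod_usage.items():
        user = pod_to_user.get(pod)
        if user is not None:
            totals[user] += value
    return dict(totals)


def aggregate_by_group(user_usage: dict, user_to_groups: dict) -> dict:
    """Sum per-user values per group.

    A user in several groups contributes to each of them, so group totals
    can add up to more than the overall total.
    """
    totals = defaultdict(float)
    for user, value in user_usage.items():
        for group in user_to_groups.get(user, []):
            totals[group] += value
    return dict(totals)
```

With two pods belonging to the same user, aggregate_by_user reports one combined value for that user, which is the difference between a per-user view and a per-pod view.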

Note

Metrics are only available as a time series from the date the jupyterhub-groups-exporter service was first manually deployed, so some PromQL in this PR is invalid for time ranges before that deployment (for example, before 18 May 2025). If you see an execution error in the dashboard, try selecting a more recent time window.

Ref: #5983

jnywong · 2025-05-16T11:09:04Z

Hey @GeorgianaElena! I just requested a review from you to check whether you think this is okay to implement across all of our clusters.

I think it should be fine, but I would value your infra eng experience here to give a quick green/red light :)

jnywong · 2025-05-16T11:27:49Z

Actually, not ready for review yet, just found a bug!

GeorgianaElena

Great work @jnywong <3 ! I didn't pay too much attention to the queries, so this review is more from an infra perspective.

If you look at infrastructure/deployer/commands/grafana/deploy_dashboards.py (lines 58 to 72 in a526d62):

    if cluster_provider == "aws":
        print_colour("Deploying cloud cost dashboards to an AWS cluster...")
        subprocess.check_call(
            [
                "./deploy.py",
                grafana_url,
                "--dashboards-dir=../grafana-dashboards",
                "--folder-name=Cloud cost dashboards",
                "--folder-uid=cloud-cost",
            ],
            env=deploy_script_env,
            cwd="jupyterhub-grafana-dashboards",
        )

    print_colour(f"Done! Dashboards deployed to {grafana_url}.")

The dashboards here are deployed on AWS only and are categorized as cost-related dashboards.

In my opinion, this work should ideally live upstream instead, in jupyterhub/grafana-dashboards. I believe 2i2c uses that repo most intensively, so we shouldn't be impacting lots of people if we were to merge this ourselves.

If you want to be on the safe side, we could also host them under 2i2c for, let's say, a month, validate that they are correct, and then upstream them. In the meantime, we would also open an upstream issue about this intention.

If we are to go with option two, then we need to update the deployer command and the general structure of this directory because it looks like it's growing to be more than cost-related dashboards.
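To sketch what such a deployer change could look like: the helper below builds one deploy.py invocation for general dashboards on every cluster and adds the cloud cost dashboards only on AWS. This is just an illustration, not the actual change. The function name, directory names, and the general folder name and UID are hypothetical; only the flags already used in deploy_dashboards.py are reused.

```python
def build_deploy_commands(grafana_url: str, cluster_provider: str) -> list[list[str]]:
    """Return the deploy.py argument lists to run for one cluster."""
    commands = [
        [
            "./deploy.py",
            grafana_url,
            # Hypothetical directory and folder for non-cost dashboards.
            "--dashboards-dir=../grafana-dashboards/general",
            "--folder-name=2i2c dashboards",
            "--folder-uid=2i2c-general",
        ]
    ]
    if cluster_provider == "aws":
        # Cost dashboards stay AWS-only, as in the current deployer.
        commands.append(
            [
                "./deploy.py",
                grafana_url,
                # Hypothetical subdirectory; folder name and UID as today.
                "--dashboards-dir=../grafana-dashboards/cloud-cost",
                "--folder-name=Cloud cost dashboards",
                "--folder-uid=cloud-cost",
            ]
        )
    return commands
```

Each returned list could then be passed to subprocess.check_call with the same env and cwd arguments the deployer already uses.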

What do you think?

jnywong · 2025-05-19T08:27:14Z

Your infra perspective is exactly what I needed :)

Yeah, eventually I would like to upstream this, but like you said, to be on the safe side we can validate and host this on the 2i2c side while we wait.

I'll make changes to this PR to update the deployer command then. Thanks!

github-actions · 2025-05-19T10:48:26Z

Merging this PR will trigger the following deployment actions.

Support deployments

No support upgrades will be triggered

Staging deployments

Cloud Provider	Cluster Name	Hub Name	Reason for Redeploy
aws	nmfs-openscapes	staging	Core infrastructure has been modified
aws	openscapes	staging	Core infrastructure has been modified
gcp	cloudbank	staging	Core infrastructure has been modified
gcp	2i2c	staging	Core infrastructure has been modified
gcp	2i2c	dask-staging	Core infrastructure has been modified
gcp	2i2c	ucmerced-staging	Core infrastructure has been modified
aws	nasa-ghg	staging	Core infrastructure has been modified
aws	maap	staging	Core infrastructure has been modified
gcp	hhmi	staging	Core infrastructure has been modified
aws	reflective	staging	Core infrastructure has been modified
aws	disasters	staging	Core infrastructure has been modified
aws	smithsonian	staging	Core infrastructure has been modified
gcp	awi-ciroh	staging	Core infrastructure has been modified
kubeconfig	utoronto	staging	Core infrastructure has been modified
kubeconfig	utoronto	r-staging	Core infrastructure has been modified
aws	2i2c-aws-us	staging	Core infrastructure has been modified
aws	2i2c-aws-us	dask-staging	Core infrastructure has been modified
aws	jupyter-health	staging	Core infrastructure has been modified
aws	nasa-veda	staging	Core infrastructure has been modified
aws	nasa-cryo	staging	Core infrastructure has been modified
gcp	leap	staging	Core infrastructure has been modified
gcp	2i2c-uk	staging	Core infrastructure has been modified
aws	projectpythia	staging	Core infrastructure has been modified
aws	strudel	staging	Core infrastructure has been modified
gcp	catalystproject-latam	staging	Core infrastructure has been modified
aws	catalystproject-africa	staging	Core infrastructure has been modified
kubeconfig	2i2c-jetstream2	staging	Core infrastructure has been modified
aws	opensci	staging	Core infrastructure has been modified
aws	victor	staging	Core infrastructure has been modified
aws	earthscope	staging	Core infrastructure has been modified
aws	ubc-eoas	staging	Core infrastructure has been modified
gcp	climatematch	staging	Core infrastructure has been modified

Production deployments

Cloud Provider	Cluster Name	Hub Name	Reason for Redeploy
aws	nmfs-openscapes	prod	Core infrastructure has been modified
aws	nmfs-openscapes	workshop	Core infrastructure has been modified
aws	nmfs-openscapes	noaa-only	Core infrastructure has been modified
aws	openscapes	prod	Core infrastructure has been modified
aws	openscapes	workshop	Core infrastructure has been modified
gcp	cloudbank	authoring	Core infrastructure has been modified
gcp	cloudbank	bcc	Core infrastructure has been modified
gcp	cloudbank	ccc	Core infrastructure has been modified
gcp	cloudbank	ccsf	Core infrastructure has been modified
gcp	cloudbank	chabot	Core infrastructure has been modified
gcp	cloudbank	csm	Core infrastructure has been modified
gcp	cloudbank	csueb	Core infrastructure has been modified
gcp	cloudbank	csuf	Core infrastructure has been modified
gcp	cloudbank	csula	Core infrastructure has been modified
gcp	cloudbank	csulb	Core infrastructure has been modified
gcp	cloudbank	csun	Core infrastructure has been modified
gcp	cloudbank	csum	Core infrastructure has been modified
gcp	cloudbank	csumb	Core infrastructure has been modified
gcp	cloudbank	csus	Core infrastructure has been modified
gcp	cloudbank	demo	Core infrastructure has been modified
gcp	cloudbank	dvc	Core infrastructure has been modified
gcp	cloudbank	elac	Core infrastructure has been modified
gcp	cloudbank	elcamino	Core infrastructure has been modified
gcp	cloudbank	evc	Core infrastructure has been modified
gcp	cloudbank	fresno	Core infrastructure has been modified
gcp	cloudbank	foothill	Core infrastructure has been modified
gcp	cloudbank	glendale	Core infrastructure has been modified
gcp	cloudbank	high	Core infrastructure has been modified
gcp	cloudbank	howard	Core infrastructure has been modified
gcp	cloudbank	humboldt	Core infrastructure has been modified
gcp	cloudbank	lacc	Core infrastructure has been modified
gcp	cloudbank	lamission	Core infrastructure has been modified
gcp	cloudbank	laney	Core infrastructure has been modified
gcp	cloudbank	lavc	Core infrastructure has been modified
gcp	cloudbank	lbcc	Core infrastructure has been modified
gcp	cloudbank	mendocino	Core infrastructure has been modified
gcp	cloudbank	merced	Core infrastructure has been modified
gcp	cloudbank	mills	Core infrastructure has been modified
gcp	cloudbank	miracosta	Core infrastructure has been modified
gcp	cloudbank	mission	Core infrastructure has been modified
gcp	cloudbank	moreno	Core infrastructure has been modified
gcp	cloudbank	norco	Core infrastructure has been modified
gcp	cloudbank	palomar	Core infrastructure has been modified
gcp	cloudbank	pasadena	Core infrastructure has been modified
gcp	cloudbank	reedley	Core infrastructure has been modified
gcp	cloudbank	riohondo	Core infrastructure has been modified
gcp	cloudbank	sacramento	Core infrastructure has been modified
gcp	cloudbank	saddleback	Core infrastructure has been modified
gcp	cloudbank	santiago	Core infrastructure has been modified
gcp	cloudbank	sbcc	Core infrastructure has been modified
gcp	cloudbank	sbcc-dev	Core infrastructure has been modified
gcp	cloudbank	sierra	Core infrastructure has been modified
gcp	cloudbank	sjcc	Core infrastructure has been modified
gcp	cloudbank	sjsu	Core infrastructure has been modified
gcp	cloudbank	skyline	Core infrastructure has been modified
gcp	cloudbank	srjc	Core infrastructure has been modified
gcp	cloudbank	tuskegee	Core infrastructure has been modified
gcp	cloudbank	ucsc	Core infrastructure has been modified
gcp	cloudbank	wlac	Core infrastructure has been modified
gcp	dubois	ephemeral	Core infrastructure has been modified
gcp	2i2c	imagebuilding-demo	Core infrastructure has been modified
gcp	2i2c	binderhub-ui-demo	Core infrastructure has been modified
gcp	2i2c	demo	Core infrastructure has been modified
gcp	2i2c	temple	Core infrastructure has been modified
gcp	2i2c	ucmerced	Core infrastructure has been modified
gcp	2i2c	mtu	Core infrastructure has been modified
aws	nasa-ghg	prod	Core infrastructure has been modified
aws	nasa-ghg	binder	Core infrastructure has been modified
aws	maap	prod	Core infrastructure has been modified
gcp	hhmi	prod	Core infrastructure has been modified
gcp	hhmi	spyglass	Core infrastructure has been modified
gcp	hhmi	binder	Core infrastructure has been modified
aws	reflective	prod	Core infrastructure has been modified
aws	reflective	workshop	Core infrastructure has been modified
aws	disasters	prod	Core infrastructure has been modified
aws	smithsonian	prod	Core infrastructure has been modified
gcp	awi-ciroh	prod	Core infrastructure has been modified
gcp	awi-ciroh	workshop	Core infrastructure has been modified
kubeconfig	utoronto	prod	Core infrastructure has been modified
kubeconfig	utoronto	r-prod	Core infrastructure has been modified
kubeconfig	utoronto	highmem	Core infrastructure has been modified
aws	2i2c-aws-us	showcase	Core infrastructure has been modified
aws	jupyter-health	prod	Core infrastructure has been modified
aws	nasa-veda	prod	Core infrastructure has been modified
aws	nasa-veda	binder	Core infrastructure has been modified
kubeconfig	projectpythia-binder	binderhub	Core infrastructure has been modified
aws	nasa-cryo	prod	Core infrastructure has been modified
gcp	leap	prod	Core infrastructure has been modified
gcp	leap	public	Core infrastructure has been modified
gcp	2i2c-uk	lis	Core infrastructure has been modified
aws	projectpythia	prod	Core infrastructure has been modified
aws	projectpythia	pythia-binder	Core infrastructure has been modified
aws	strudel	prod	Core infrastructure has been modified
gcp	catalystproject-latam	unitefa-conicet	Core infrastructure has been modified
gcp	catalystproject-latam	cicada	Core infrastructure has been modified
gcp	catalystproject-latam	gita	Core infrastructure has been modified
gcp	catalystproject-latam	iner	Core infrastructure has been modified
gcp	catalystproject-latam	plnc	Core infrastructure has been modified
gcp	catalystproject-latam	unam	Core infrastructure has been modified
gcp	catalystproject-latam	cabana	Core infrastructure has been modified
gcp	catalystproject-latam	nnb-ccg	Core infrastructure has been modified
gcp	catalystproject-latam	labi	Core infrastructure has been modified
gcp	catalystproject-latam	areciboc3	Core infrastructure has been modified
gcp	catalystproject-latam	valledellili	Core infrastructure has been modified
aws	catalystproject-africa	nm-aist	Core infrastructure has been modified
aws	catalystproject-africa	must	Core infrastructure has been modified
aws	catalystproject-africa	uvri	Core infrastructure has been modified
aws	catalystproject-africa	wits	Core infrastructure has been modified
aws	catalystproject-africa	kush	Core infrastructure has been modified
aws	catalystproject-africa	molerhealth	Core infrastructure has been modified
aws	catalystproject-africa	aibst	Core infrastructure has been modified
aws	catalystproject-africa	bhki	Core infrastructure has been modified
aws	catalystproject-africa	bon	Core infrastructure has been modified
aws	opensci	sciencecore	Core infrastructure has been modified
aws	opensci	climaterisk	Core infrastructure has been modified
aws	opensci	small-binder	Core infrastructure has been modified
aws	opensci	big-binder	Core infrastructure has been modified
aws	victor	prod	Core infrastructure has been modified
aws	earthscope	prod	Core infrastructure has been modified
aws	earthscope	binder	Core infrastructure has been modified
aws	ubc-eoas	prod	Core infrastructure has been modified
gcp	climatematch	prod	Core infrastructure has been modified

jnywong · 2025-05-19T10:59:25Z

Deployer updated, dashboards moved to 2i2c hosted repo and upstream PR waiting in draft 👍

GeorgianaElena · 2025-05-19T11:35:53Z

This is perfect @jnywong ❤️ I feel comfortable merging your PR upstream as well (I have the rights) if you feel confident too. Just let me know. Otherwise, feel free to merge this one, and maybe also open an internal 2i2c tracking issue to review and merge the upstream PR after some testing on our infra, to make sure we don't lose track of it.

I believe the biggest challenge is having the 2i2c repo diverge and having to maintain and contribute in two places.

jnywong · 2025-05-19T12:50:50Z

Great, thank you @GeorgianaElena!

I agree, we do not want these two repos to diverge. I will let you know when #5315 is ready to be merged and upstreamed, once I am happy with it.

To not lose track, I have added this task to the DoD in the parent initiative: #5315

* Add new user group diagnostics dashboard and update user diagnostics
* Use unescaped $user_name from pod annotation instead of $user_pod from pod label
* Add home dir usage panel for groups
* Use unescaped username for home dir usage panel
* Move dashboards to https://github.com/2i2c-org/grafana-dashboards
* Revert variables for cloud cost dashboard
* Point deployer to 2i2c hosted dashboards

jnywong added 3 commits May 16, 2025 11:39

Add new user group diagnostics dashboard and update user diagnostics

30bcf5a

update title

60529ba

Reinstate old variable and define new variable

1d9ca84

jnywong self-assigned this May 16, 2025

jnywong requested a review from GeorgianaElena May 16, 2025 11:06

jnywong removed the request for review from GeorgianaElena May 16, 2025 11:27

jnywong marked this pull request as draft May 16, 2025 11:27

Remove $user_pod and fix $user_name

cb588ea

jnywong marked this pull request as ready for review May 16, 2025 12:55

jnywong requested a review from GeorgianaElena May 16, 2025 12:55

jnywong added 2 commits May 18, 2025 15:28

Add home dir usage dashboard back in for groups

4b6975f

Use unescaped username for home directory usage panel

e449f4d

GeorgianaElena reviewed May 19, 2025

jnywong marked this pull request as draft May 19, 2025 08:27

This was referenced May 19, 2025

Add User Group Diagnostics dashboard and use unescaped usernames jupyterhub/grafana-dashboards#148

Closed

Add User Group Diagnostics dashboard and use unescaped usernames 2i2c-org/grafana-dashboards#1

Merged

jnywong added 3 commits May 19, 2025 11:42

Move dashboards to https://github.com/2i2c-org/grafana-dashboards

69d6f5c

Revert variables for cloud cost dashboard

00cf40b

Point deployer to 2i2c hosted dashboards

f52435a

jnywong mentioned this pull request May 19, 2025

Add "User Group Diagnostics" Grafana dashboard jupyterhub/grafana-dashboards#149

Merged

jnywong marked this pull request as ready for review May 19, 2025 10:59

jnywong requested a review from GeorgianaElena May 19, 2025 10:59

jnywong merged commit 6162e97 into 2i2c-org:main May 19, 2025
41 checks passed
