Description
Describe the issue:
I am able to create clusters, connect Dask clients, and perform Dask operations without issues using the KubeCluster operator from a notebook. I am also able to connect to the status dashboard by port-forwarding to the scheduler.
However, I am not able to connect to these clusters when using the lab extension. When I move to an active notebook and click search in the Dask lab extension, it does pick up a remote cluster address. The dashboard URLs picked up by the extension code look like:
http://internal-scheduler.namespace:8787/
But the extension does not seem to be able to connect to it, and I do not see any logs pertaining to this action.
Do these dashboards need to be external (meaning are these connections made from browser or backend service)?
Since I was not sure about this, I tried setting up AWS NLB. I tried connecting to the NLB address using the Client as seen in the second snippet below.
Minimal Complete Verifiable Example:
All of the following code snippets work fine from the notebook.
# Create a cluster
from dask_kubernetes.operator import make_cluster_spec, make_worker_spec
from dask_kubernetes.operator import KubeCluster
from dask.distributed import Client
import dask.dataframe as dd
import os
profile_name = namespace_name  # namespace_name is the Kubernetes namespace, defined elsewhere
custom_spec = make_cluster_spec(name=profile_name, image='ghcr.io/dask/dask:latest', resources={"requests": {"memory": "512Mi"}, "limits": {"cpu": "4","memory": "8Gi"}})
custom_spec['spec']['scheduler']['spec']['serviceAccount'] = 'default-editor'
custom_spec['spec']['worker']['spec']['serviceAccount'] = 'default-editor'
custom_worker_spec = make_worker_spec(image='ghcr.io/dask/dask:latest', n_workers=6, resources={"requests": {"memory": "512Mi"}, "limits": {"memory": "12Gi"}})
custom_worker_spec['spec']['serviceAccount'] = 'default-editor'
custom_worker_spec
cluster = KubeCluster(custom_cluster_spec=custom_spec, n_workers=0)
cluster.add_worker_group(name='highmem', custom_spec=custom_worker_spec)
As mentioned, let's assume that I have an AWS NLB-type LoadBalancer/Ingress service. Then the Dask client is able to successfully interact with ports 8786 and 8787 on the scheduler to manage workers and jobs externally.
# Connect to external endpoint works fine
import dask; from dask.distributed import Client
dask.config.set({'scheduler-address': 'tcp://nlb-address.region.elb.amazonaws.com:8786'})
client = Client()
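To answer my own question about where the dashboard connection originates: the extension's dashboard check runs server-side in the Jupyter backend, so the in-cluster URL only needs to be reachable from the notebook pod, not from the browser. A minimal probe like the following (a hypothetical helper, not part of the extension) can confirm that from the notebook:

```python
import urllib.request


def dashboard_reachable(url: str, timeout: float = 5.0) -> bool:
    """Return True if the Dask dashboard answers over plain HTTP."""
    try:
        with urllib.request.urlopen(url, timeout=timeout) as resp:
            return resp.status == 200
    except Exception:
        return False


# Probe the in-cluster address from above; this must be run where the
# extension's server-side check runs (the Jupyter pod), not in the browser.
print(dashboard_reachable("http://internal-scheduler.namespace:8787/status"))
```

If this returns True from the notebook but the extension still fails, the problem is in the extension's request handling rather than in network reachability.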
Anything else we need to know?:
Another thing I noticed is that the dask-labextension relies on the testDaskDashboard
function to pick up the URL info (defined in https://github.com/dask/dask-labextension/blob/main/src/dashboard.tsx#L588).
In the console, I can see,
Found the dashboard link at 'http://internal-scheduler.namespace:8787/status'
However, the subsequent dashboard-check request to the backend drops one /
from the protocol.
See the GET request below,
To be a bit more verbose,
http%3A%2Finternal-scheduler.namespace%3A8787%2F?1673363416491
translates to http:/internal-scheduler.namespace:8787/?1673363416491
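The `http:/` result looks consistent with a slash-normalising path join happening somewhere before the URL is encoded. This is a minimal sketch (not the extension's actual code) showing how collapsing repeated slashes also eats the `//` in the protocol and reproduces the encoded string above:

```python
import re
from urllib.parse import quote

dashboard = "http://internal-scheduler.namespace:8787/"

# A naive normalisation that collapses runs of "/" into a single "/"
# also turns the protocol's "http://" into "http:/":
collapsed = re.sub(r"/+", "/", dashboard)
print(collapsed)  # http:/internal-scheduler.namespace:8787/

# URL-encoding the collapsed string matches the GET request seen in the
# console (minus the cache-busting query string):
print(quote(collapsed, safe=""))  # http%3A%2Finternal-scheduler.namespace%3A8787%2F
```

If the backend handler joins the dashboard URL onto a base path with such a helper, that would explain the single-slash protocol in the request.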
I am not sure if this is expected or a bug.
Environment:
- Dask version: 2022.12.1
- Dask Kubernetes: 2022.12.0
- @dask-labextension: v6.0.0
- @jupyterlab/server-proxy: v3.2.2
- Python version: 3.8.10
- Platform: Kubeflow
- Install method (conda, pip, source): pip