
Support Databricks Workload Identity Federation for GitHub tokens #933


Merged: 4 commits merged into main from hectorcast-db/databricks-wif on Apr 29, 2025.


hectorcast-db (Contributor) commented Mar 21, 2025

What changes are proposed in this pull request?

This PR adds support for Databricks Workload Identity Federation (WIF) using GitHub tokens. It lets users authenticate workloads running in GitHub Actions workflows through WIF, without long-lived secrets.

This new credentials strategy is added to the `DefaultCredentialsStrategy` after the other Databricks credentials strategies and before the cloud-specific authentication methods.
WIF uses a subset of the configuration values used by the other Databricks authentication methods. Placing it after them ensures that WIF is not selected when another Databricks authentication method is configured.
WIF requires the Databricks client ID, which the cloud-specific authentication methods do not use. Therefore, WIF should not be selected when only a cloud-specific authentication method is configured.
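The ordering argument above can be illustrated as a first-match chain. This is a simplified, hypothetical model, not the SDK's actual `DefaultCredentialsStrategy`: the strategy labels, the trimmed `Config` dataclass, and the `github_oidc_available` flag (standing in for "a GitHub OIDC token can be requested") are all assumptions made for illustration.

```python
from dataclasses import dataclass
from typing import Callable, List, Optional, Tuple


@dataclass
class Config:
    # Hypothetical, trimmed-down config; the real SDK Config has many more fields.
    host: Optional[str] = None
    token: Optional[str] = None
    client_id: Optional[str] = None
    client_secret: Optional[str] = None
    # Assumption: stands in for "a GitHub OIDC token can be requested".
    github_oidc_available: bool = False


# (label, predicate) pairs tried in order; labels are illustrative only.
Strategy = Tuple[str, Callable[[Config], bool]]

STRATEGIES: List[Strategy] = [
    # Databricks strategies with stricter requirements come first...
    ("pat", lambda c: bool(c.host and c.token)),
    ("oauth-m2m", lambda c: bool(c.host and c.client_id and c.client_secret)),
    # ...then WIF, which needs only host + client_id (a subset of oauth-m2m)...
    ("github-wif", lambda c: bool(c.host and c.client_id and c.github_oidc_available)),
    # ...and finally a placeholder for the cloud-specific methods.
    ("cloud-native", lambda c: bool(c.host)),
]


def pick_strategy(cfg: Config, strategies: List[Strategy] = STRATEGIES) -> Optional[str]:
    """Return the label of the first strategy whose requirements are met."""
    for label, applies in strategies:
        if applies(cfg):
            return label
    return None
```

Because WIF sits ahead of the cloud-specific placeholder in this model, a config with `host` and `client_id` set and a GitHub OIDC token available resolves to WIF even when cloud-native authentication was intended, which matches the situation the v0.51.0 release notes flag as breaking.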

How is this tested?

Added tests.

hectorcast-db force-pushed the hectorcast-db/databricks-wif branch from cef6e21 to 0bb1af4 on March 21, 2025 09:03
hectorcast-db force-pushed the hectorcast-db/databricks-wif branch from 0bb1af4 to 520d2d1 on March 21, 2025 09:24
hectorcast-db force-pushed the hectorcast-db/databricks-wif branch from 520d2d1 to 4c38da9 on March 21, 2025 09:34
hectorcast-db force-pushed the hectorcast-db/databricks-wif branch from 4c38da9 to 98a637e on March 21, 2025 09:41
hectorcast-db force-pushed the hectorcast-db/databricks-wif branch from 98a637e to 5685b9f on March 21, 2025 09:49
hectorcast-db force-pushed the hectorcast-db/databricks-wif branch from 5685b9f to f647c21 on March 21, 2025 09:54
hectorcast-db force-pushed the hectorcast-db/databricks-wif branch from f647c21 to 72c3549 on March 21, 2025 12:56
hectorcast-db force-pushed the hectorcast-db/databricks-wif branch from 72c3549 to b6f303c on March 21, 2025 13:00
hectorcast-db force-pushed the hectorcast-db/databricks-wif branch from b6f303c to 5063be3 on March 25, 2025 14:40
hectorcast-db changed the title from "[DRAFT] Support Databricks Workload Identity Federation for GitHub tokens" to "Support Databricks Workload Identity Federation for GitHub tokens" on Mar 25, 2025
renaudhartert-db (Contributor) left a comment

LGTM once we agree on the `auth_type` name.

README.md (outdated)
| `token` | _(String)_ The Databricks personal access token (PAT) _(AWS, Azure, and GCP)_ or Azure Active Directory (Azure AD) token _(Azure)_. | `DATABRICKS_TOKEN` |
| `username` | _(String)_ The Databricks username part of basic authentication. Only possible when `Host` is `*.cloud.databricks.com` _(AWS)_. | `DATABRICKS_USERNAME` |
| `password` | _(String)_ The Databricks password part of basic authentication. Only possible when `Host` is `*.cloud.databricks.com` _(AWS)_. | `DATABRICKS_PASSWORD` |
- For Databricks wif authentication, you must provide `host`, `client_id` and `token_audience` _(optional)_; or their environment variable or `.databrickscfg` file field equivalents.

Suggested change
- For Databricks wif authentication, you must provide `host`, `client_id` and `token_audience` _(optional)_; or their environment variable or `.databrickscfg` file field equivalents.
- For Databricks WIF authentication, you must provide the `host`, `client_id` and `token_audience` _(optional)_ either directly, through the corresponding environment variables, or in your `.databrickscfg` configuration file.
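The suggested wording lists three ways to supply the WIF settings. As an illustrative sketch (not the SDK's own loader), the helper below resolves `host`, `client_id`, and the optional `token_audience` from explicit arguments first, then environment variables, then a `.databrickscfg` profile. The precedence order, the function name `resolve_wif_settings`, and the variable name `DATABRICKS_TOKEN_AUDIENCE` are assumptions.

```python
import configparser
import os
from typing import Dict, Mapping, Optional

# Config field -> environment variable consulted for it.
# Assumption: DATABRICKS_TOKEN_AUDIENCE is the variable name for token_audience.
WIF_FIELDS = {
    "host": "DATABRICKS_HOST",
    "client_id": "DATABRICKS_CLIENT_ID",
    "token_audience": "DATABRICKS_TOKEN_AUDIENCE",
}
REQUIRED = ("host", "client_id")  # token_audience is optional


def resolve_wif_settings(
    explicit: Optional[Mapping[str, str]] = None,
    env: Optional[Mapping[str, str]] = None,
    config_file: str = "~/.databrickscfg",
    profile: str = "DEFAULT",
) -> Dict[str, Optional[str]]:
    """Hypothetical resolver: explicit value, then environment variable, then config file."""
    explicit = explicit or {}
    env = os.environ if env is None else env

    # .databrickscfg is INI-style; read() silently skips a missing file.
    parser = configparser.ConfigParser(interpolation=None)
    parser.read(os.path.expanduser(config_file))
    section = parser[profile] if profile in parser else {}

    resolved: Dict[str, Optional[str]] = {}
    for field, env_var in WIF_FIELDS.items():
        # The first non-empty source wins.
        resolved[field] = explicit.get(field) or env.get(env_var) or section.get(field)

    missing = [f for f in REQUIRED if not resolved[f]]
    if missing:
        raise ValueError("WIF authentication requires: " + ", ".join(missing))
    return resolved
```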

hectorcast-db force-pushed the hectorcast-db/databricks-wif branch from 4cc6705 to 00ba2d7 on April 7, 2025 07:50
hectorcast-db force-pushed the hectorcast-db/databricks-wif branch from 4b0bed3 to a013ff3 on April 23, 2025 07:43

If integration tests don't run automatically, an authorized user can run them manually by following the instructions below:

Trigger:
go/deco-tests-run/sdk-py

Inputs:

  • PR number: 933
  • Commit SHA: a013ff39353943295df3ceff36f4df9269e8802b

Checks will be approved automatically on success.

hectorcast-db added this pull request to the merge queue Apr 29, 2025
Merged via the queue into main with commit 6d8d906 Apr 29, 2025
17 checks passed
hectorcast-db deleted the hectorcast-db/databricks-wif branch April 29, 2025 10:07
deco-sdk-tagging bot added a commit that referenced this pull request Apr 30, 2025
## Release v0.51.0

### New Features and Improvements
* Enabled asynchronous token refreshes by default. A new `disable_async_token_refresh` configuration option has been added to allow disabling this feature if necessary ([#952](#952)).
  To disable asynchronous token refresh, set the environment variable `DATABRICKS_DISABLE_ASYNC_TOKEN_REFRESH=true` or configure it within your configuration object.
  The previous `enable_experimental_async_token_refresh` option has been removed as asynchronous refresh is now the default behavior.
* Introduced support for Databricks Workload Identity Federation in GitHub workflows ([#933](#933)).
  See README.md for instructions.
* [Breaking] Users running workflows in GitHub Actions that use cloud-native authentication and also have the `DATABRICKS_CLIENT_ID` and `DATABRICKS_HOST`
  environment variables set may see authentication start failing, because the SDK now tries Databricks WIF authentication before cloud-native methods.
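A rough way to check whether a workflow matches the conditions in this breaking-change note is sketched below. It is a heuristic, not part of the SDK, and the function names are hypothetical. It assumes the WIF strategy can only obtain a token when GitHub exposes `ACTIONS_ID_TOKEN_REQUEST_URL` and `ACTIONS_ID_TOKEN_REQUEST_TOKEN`, which GitHub provides to jobs granted the `id-token: write` permission.

```python
import os
from typing import Dict, Mapping, Optional


def breaking_change_conditions(env: Optional[Mapping[str, str]] = None) -> Dict[str, bool]:
    """Hypothetical heuristic: evaluate each condition from the v0.51.0 note."""
    env = os.environ if env is None else env
    return {
        # GitHub sets GITHUB_ACTIONS=true when GitHub Actions runs the workflow.
        "running in GitHub Actions": env.get("GITHUB_ACTIONS") == "true",
        "DATABRICKS_HOST is set": bool(env.get("DATABRICKS_HOST")),
        "DATABRICKS_CLIENT_ID is set": bool(env.get("DATABRICKS_CLIENT_ID")),
        # Assumption: WIF can only obtain a GitHub OIDC token when both request
        # variables exist (GitHub provides them to jobs with id-token: write).
        "GitHub OIDC token requestable": bool(
            env.get("ACTIONS_ID_TOKEN_REQUEST_URL")
            and env.get("ACTIONS_ID_TOKEN_REQUEST_TOKEN")
        ),
    }


def may_be_affected(env: Optional[Mapping[str, str]] = None) -> bool:
    """True when every condition holds, i.e. WIF may now be chosen first."""
    return all(breaking_change_conditions(env).values())


if __name__ == "__main__":
    for name, holds in breaking_change_conditions().items():
        print(f"{name}: {holds}")
    print("may be affected:", may_be_affected())
```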

### API Changes
* Added [w.alerts_v2](https://databricks-sdk-py.readthedocs.io/en/latest/workspace/sql/alerts_v2.html) workspace-level service.
* Added `update_ncc_azure_private_endpoint_rule_public()` method for [a.network_connectivity](https://databricks-sdk-py.readthedocs.io/en/latest/account/settings/network_connectivity.html) account-level service.
* Added `update_endpoint_budget_policy()` and `update_endpoint_custom_tags()` methods for [w.vector_search_endpoints](https://databricks-sdk-py.readthedocs.io/en/latest/workspace/vectorsearch/vector_search_endpoints.html) workspace-level service.
* Added `created_at`, `created_by` and `metastore_id` fields for `databricks.sdk.service.catalog.SetArtifactAllowlist`.
* Added `node_type_flexibility` field for `databricks.sdk.service.compute.EditInstancePool`.
* Added `page_size` and `page_token` fields for `databricks.sdk.service.compute.GetEvents`.
* Added `next_page_token` and `prev_page_token` fields for `databricks.sdk.service.compute.GetEventsResponse`.
* Added `node_type_flexibility` field for `databricks.sdk.service.compute.GetInstancePool`.
* Added `node_type_flexibility` field for `databricks.sdk.service.compute.InstancePoolAndStats`.
* Added `effective_performance_target` field for `databricks.sdk.service.jobs.RepairHistoryItem`.
* Added `performance_target` field for `databricks.sdk.service.jobs.RepairRun`.
* [Breaking] Added `network_connectivity_config` field for `databricks.sdk.service.settings.CreateNetworkConnectivityConfigRequest`.
* [Breaking] Added `private_endpoint_rule` field for `databricks.sdk.service.settings.CreatePrivateEndpointRuleRequest`.
* Added `domain_names` field for `databricks.sdk.service.settings.NccAzurePrivateEndpointRule`.
* Added `auto_resolve_display_name` field for `databricks.sdk.service.sql.CreateAlertRequest`.
* Added `auto_resolve_display_name` field for `databricks.sdk.service.sql.CreateQueryRequest`.
* Added `budget_policy_id` field for `databricks.sdk.service.vectorsearch.CreateEndpoint`.
* Added `custom_tags` and `effective_budget_policy_id` fields for `databricks.sdk.service.vectorsearch.EndpointInfo`.
* Added `create_clean_room`, `execute_clean_room_task` and `modify_clean_room` enum values for `databricks.sdk.service.catalog.Privilege`.
* Added `dns_resolution_error` and `gcp_denied_by_org_policy` enum values for `databricks.sdk.service.compute.TerminationReasonCode`.
* Added `disabled` enum value for `databricks.sdk.service.jobs.TerminationCodeCode`.
* Added `expired` enum value for `databricks.sdk.service.settings.NccAzurePrivateEndpointRuleConnectionState`.
* [Breaking] Changed `create_network_connectivity_configuration()` and `create_private_endpoint_rule()` methods for [a.network_connectivity](https://databricks-sdk-py.readthedocs.io/en/latest/account/settings/network_connectivity.html) account-level service with new required argument order.
* [Breaking] Changed `create_index()` method for [w.vector_search_indexes](https://databricks-sdk-py.readthedocs.io/en/latest/workspace/vectorsearch/vector_search_indexes.html) workspace-level service to return `databricks.sdk.service.vectorsearch.VectorIndex` dataclass.
* [Breaking] Changed `delete_data_vector_index()` method for [w.vector_search_indexes](https://databricks-sdk-py.readthedocs.io/en/latest/workspace/vectorsearch/vector_search_indexes.html) workspace-level service. HTTP method/verb has changed.
* [Breaking] Changed `delete_data_vector_index()` method for [w.vector_search_indexes](https://databricks-sdk-py.readthedocs.io/en/latest/workspace/vectorsearch/vector_search_indexes.html) workspace-level service with new required argument order.
* [Breaking] Changed `databricks.sdk.service.vectorsearch.List` dataclass to.
* [Breaking] Changed `workload_size` field for `databricks.sdk.service.serving.ServedModelInput` to type `str`.
* [Breaking] Changed `group_id` field for `databricks.sdk.service.settings.NccAzurePrivateEndpointRule` to type `str`.
* [Breaking] Changed `target_services` field for `databricks.sdk.service.settings.NccAzureServiceEndpointRule` to type `databricks.sdk.service.settings.EgressResourceTypeList` dataclass.
* [Breaking] Changed `data_array` field for `databricks.sdk.service.vectorsearch.ResultData` to type `databricks.sdk.service.vectorsearch.ListValueList` dataclass.
* [Breaking] Changed waiter for [VectorSearchEndpointsAPI.create_endpoint](https://databricks-sdk-py.readthedocs.io/en/latest/workspace/vectorsearch/vector_search_endpoints.html#databricks.sdk.service.vectorsearch.VectorSearchEndpointsAPI.create_endpoint) method.
* [Breaking] Removed `name` and `region` fields for `databricks.sdk.service.settings.CreateNetworkConnectivityConfigRequest`.
* [Breaking] Removed `group_id` and `resource_id` fields for `databricks.sdk.service.settings.CreatePrivateEndpointRuleRequest`.
* [Breaking] Removed `null_value` field for `databricks.sdk.service.vectorsearch.Value`.
* [Breaking] Removed `large`, `medium` and `small` enum values for `databricks.sdk.service.serving.ServedModelInputWorkloadSize`.
* [Breaking] Removed `blob`, `dfs`, `mysql_server` and `sql_server` enum values for `databricks.sdk.service.settings.NccAzurePrivateEndpointRuleGroupId`.
2 participants