
How can I use a service principal to read and write to a database from Databricks?

This demo shows how you can use a Microsoft Entra ID service principal to read from and write to an Azure SQL Database or an Azure Database for PostgreSQL from a Databricks notebook.

Solution

Prerequisites

Create an Entra app registration

First, navigate to your tenant's Entra ID directory and add a new app registration. This creates an Entra ID service principal that you can use to authenticate to Azure resources: app-registration

Next, create a secret for the service principal.

Note: You'll need the tenant ID, the service principal name, the client ID and the secret value for the steps below. add-secret1 add-secret2

The secret value will appear on the screen. Do not leave the page before recording it (it will never be displayed again). You may want to keep the browser tab open and complete the remaining steps in a second tab for convenience.
secret-id-value

Create an Azure Key Vault

Create an Azure Key Vault in the same region as your other resources. azure-keyvault

Assign yourself the Key Vault Secrets Officer role and the Azure Databricks application the Key Vault Secrets User role. (Databricks will access the key vault referenced in your secrets scope using the Databricks application's own service principal, which is unique to your tenant. You might expect a Unity Catalog-enabled workspace to use the workspace's managed identity to connect to the key vault, but unfortunately that's not the case.) add-role secrets-officer azure-databricks

Add three secrets to the key vault: one for your tenant ID, another for the service principal's client ID (not the secret ID) and a third for the secret value. You will need these to authenticate as the service principal in your Databricks notebook. Refer back to the secret value in the other browser tab as needed. secret-id secret-value

Create a Databricks workspace

Many newer Databricks features, such as serverless compute, data lineage and managed identity support, require Unity Catalog. Unity Catalog is now normally enabled by default when you create a new workspace and gives you the most authentication options when connecting to Azure resources like databases. The steps described in this demo should work both for Unity Catalog-enabled workspaces and for workspaces using the legacy Hive metastore. create-workspace

Create a Databricks secret scope

Next, we need to create a secrets scope for our Databricks workspace. Open your workspace and append #secrets/createScope to the Databricks instance URL in your browser. The path is case-sensitive, so be sure to use a capital S in 'createScope': dbx-scope

A page for configuring a new secrets scope should appear. Give your secrets scope a name and paste the key vault URI (DNS name) and resource ID into the corresponding fields. You can find both on the Properties blade of your key vault. dbx-uri-resource-id kv-properties
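Once you have a notebook attached to a cluster (created in the next step), a quick way to confirm the scope is wired up correctly is to list it with dbutils. The cells later in this demo assume the scope is named default; adjust the name if you chose something else.

# List all secret scopes visible to this workspace
dbutils.secrets.listScopes()

# List the secrets available through the key-vault-backed scope
dbutils.secrets.list("default")

# Secret values are redacted in notebook output, but a successful call confirms access
dbutils.secrets.get(scope="default", key="tenant-id")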

Create a Databricks cluster

If you only want to read from a database with a service principal, you can use serverless compute. Just add the azure-identity library to the environment configuration.

azure-identity

If you also want to write to the database, however, you'll need to use a provisioned cluster. create-cluster

After the cluster has been created, click the cluster name in the list and then switch to the Libraries tab to install azure-identity from PyPI. install-library
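As an alternative to the Libraries tab, you can also install the package for the current notebook session only with a %pip cell. This is a standard notebook-scoped install, sketched below.

%pip install azure-identity

# (in a separate cell) restart Python so the newly installed package can be imported
dbutils.library.restartPython()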

Azure SQL DB

Create Database

Create an Azure SQL database with Entra ID authentication enabled. You can use the AdventureWorksLT sample to pre-populate it with data if you like. For the purposes of this demo, we will only be working with INFORMATION_SCHEMA. adventure-works

Then connect to the database as the Entra ID admin user and create a contained database user for the service principal. You can use the Query editor in the portal or SQL Server Management Studio. query-editor create-dbuser

CREATE USER [sql-dbx-read-write-test-sp] FROM EXTERNAL PROVIDER;

Add the service principal's user to the database roles db_datareader, db_datawriter and db_ddladmin. This allows the service principal to read and write data in existing tables; db_ddladmin is needed because overwrite mode automatically drops and recreates a table before writing to it.

ALTER ROLE db_datareader
ADD MEMBER [sql-dbx-read-write-test-sp];

ALTER ROLE db_datawriter
ADD MEMBER [sql-dbx-read-write-test-sp];

ALTER ROLE db_ddladmin
ADD MEMBER [sql-dbx-read-write-test-sp];

Create a Databricks notebook

Paste the code below into the first cell of the notebook. It uses dbutils to retrieve the tenant ID, client ID and service principal secret from the key vault referenced in the secrets scope, then requests an Entra ID access token.

from azure.identity import ClientSecretCredential

tenant_id = dbutils.secrets.get(scope = "default", key = "tenant-id")
client_id = dbutils.secrets.get(scope = "default", key = "sql-dbx-read-write-test-sp-client-id")
client_secret = dbutils.secrets.get(scope = "default", key = "sql-dbx-read-write-test-sp-secret")

credential = ClientSecretCredential(tenant_id, client_id, client_secret)
token = credential.get_token("https://database.windows.net/.default").token
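One thing worth knowing: the access token returned by get_token is short-lived (typically about an hour), so a long-running notebook may need to request a fresh token before reconnecting. If you want to see the expiry, you can append a quick check to the same cell; this is an optional sketch reusing the credential defined above.

from datetime import datetime, timezone

# AccessToken.expires_on is a Unix timestamp
access_token = credential.get_token("https://database.windows.net/.default")
print("Token expires at:", datetime.fromtimestamp(access_token.expires_on, tz=timezone.utc))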

In the next cell, create a JDBC connection to the database and query the list of tables from the information schema.

jdbc_url = "jdbc:sqlserver://sql-xxxx.database.windows.net:1433;database=testdb"

connection_properties = {
    "accessToken": token,
    "driver": "com.microsoft.sqlserver.jdbc.SQLServerDriver"
}

# Read from a table
df = spark.read.jdbc(url=jdbc_url, table="INFORMATION_SCHEMA.TABLES", properties=connection_properties)
df.show()

The output should look something like this: info-schema
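Because df is an ordinary Spark DataFrame, you can filter the result like any other. For example, to list only base tables (an optional sketch; INFORMATION_SCHEMA.TABLES also lists views):

# Keep only base tables and show their schema and name
df.filter("TABLE_TYPE = 'BASE TABLE'").select("TABLE_SCHEMA", "TABLE_NAME").show()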

If you're using a provisioned cluster, you can run the code below to write to the database.

df.write.jdbc(
    url=jdbc_url,
    table="dbo.Test",
    mode="overwrite",
    properties=connection_properties
)

Running the code above on a serverless cluster will result in an error.
serverless-write-error

If you ask the Databricks AI assistant to Diagnose error, it will tell you that writing over JDBC isn't supported on serverless compute and that you'll need to switch to a provisioned cluster instead. diagnose-error

To verify that the data was written as expected, you can read the table back into a dataframe:

df_check = spark.read.jdbc(
    url=jdbc_url,
    table="dbo.Test",
    properties=connection_properties
)
df_check.show()
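Overwrite mode drops and recreates the target table on every run, which is why the service principal needs db_ddladmin. If you only want to add rows to an existing table, append mode is sufficient; here is a minimal sketch using the same connection properties (the same option works for the Postgres example later in this demo).

# Append rows to the existing dbo.Test table instead of dropping and recreating it
df.write.jdbc(
    url=jdbc_url,
    table="dbo.Test",
    mode="append",
    properties=connection_properties
)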

You can find the complete notebook here

Azure Postgres

Create Database

First create an Azure Database for PostgreSQL flexible server with Entra ID authentication enabled. Then connect to the postgres database as the Entra ID administrator user.

In the example below, we connect using bash commands in Azure Cloud Shell in the portal. (You can use the PostgreSQL extension in VS Code instead if you prefer.) create-PGUser

Set the environment variables in the Cloud Shell window.

export PGHOST=psql-xxxxx.postgres.database.azure.com
export PGUSER=user@domain.com
export PGPORT=5432
export PGDATABASE=postgres
export PGPASSWORD="$(az account get-access-token --resource https://ossrdbms-aad.database.windows.net --query accessToken --output tsv)" 

Now simply type psql. The environment variables set above will be used automatically to connect.

psql

Once connected, run the statement below to create a user for the service principal

SELECT * FROM pgaadauth_create_principal('sql-dbx-read-write-test-sp',false, false);

create-dbuser

Quit your connection to the postgres database and then open a new connection to your application database (testdb in our example). Grant the service principal read and write privileges in the public schema: the ALTER DEFAULT PRIVILEGES statement applies to tables created in the schema from now on, and the GRANT on the schema itself lets the service principal use and create objects in it. connect-application-db

ALTER DEFAULT PRIVILEGES IN SCHEMA public
GRANT SELECT, INSERT, UPDATE, DELETE ON TABLES TO "sql-dbx-read-write-test-sp";

GRANT ALL PRIVILEGES ON SCHEMA public TO "sql-dbx-read-write-test-sp";

grant-pguser

Create a Databricks notebook

Paste the code below into the first cell of the notebook. It uses dbutils to retrieve the tenant ID, client ID and service principal secret from the key vault referenced in the secrets scope, then requests an Entra ID access token. Note that the resource URL used to get the token is different from the one we used above to authenticate to Azure SQL DB.

from azure.identity import ClientSecretCredential

sp_name = "sql-dbx-read-write-test-sp"
tenant_id = dbutils.secrets.get(scope = "default", key = "tenant-id")
client_id = dbutils.secrets.get(scope = "default", key = "sql-dbx-read-write-test-sp-client-id")
client_secret = dbutils.secrets.get(scope = "default", key = "sql-dbx-read-write-test-sp-secret")

credential = ClientSecretCredential(tenant_id, client_id, client_secret)
token = credential.get_token("https://ossrdbms-aad.database.windows.net/.default").token

In the next cell, add the following code to connect to the database and read data into a dataframe. You need to specify the service principal name as the user and pass the token as the password. Once the connection is made, however, we read and write to the database the same way we did for Azure SQL DB.

jdbc_url = "jdbc:postgresql://psql-xxxxxxx.postgres.database.azure.com:5432/testdb"

connection_properties = {
    "user": sp_name,
    "password": token,
    "driver": "org.postgresql.Driver",
    "ssl": "true",
    # NonValidatingFactory skips server certificate validation; fine for a demo, not recommended in production
    "sslfactory": "org.postgresql.ssl.NonValidatingFactory"
}

# Read from a table
df = spark.read.jdbc(url=jdbc_url, table="information_schema.tables", properties=connection_properties)
display(df)
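If you only need part of a table, you can also push a query down to Postgres by passing a parenthesised subquery with an alias in place of a table name. This is a standard Spark JDBC pattern, shown here as an optional sketch against the information schema:

# Push a filtering query down to the database instead of reading the whole table
query = "(SELECT table_schema, table_name FROM information_schema.tables WHERE table_schema = 'public') AS t"
df_public = spark.read.jdbc(url=jdbc_url, table=query, properties=connection_properties)
display(df_public)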

Add a new cell and paste in the code below to write the contents of the dataframe to a new table in the database. The only difference from the SQL code above is that we use the public schema in place of dbo.

df.write.jdbc(
    url=jdbc_url,
    table="public.Test",
    mode="overwrite",
    properties=connection_properties
)

Running the code above on a serverless cluster will result in an error.
serverless-write-error

If you ask the Databricks AI assistant to Diagnose error, it will tell you that writing over JDBC isn't supported on serverless compute and that you'll need to switch to a provisioned cluster instead. diagnose-error

Finally, let's add a cell to read the table contents into a dataframe to confirm that the data was written.

df_check = spark.read.jdbc(
    url=jdbc_url,
    table="public.Test",
    properties=connection_properties
)
df_check.show()

You can download the complete notebook here
