-
I think the proposal is totally legit and very interesting! Thanks for raising this idea! TL;DR is (IIUC): persist refresh tokens to disk so that interactive flows don't have to be repeated every time a client is initialized.

As the proposal is generally about refresh (tokens), it seems to make sense not to restrict the functionality to a specific flow, but to include the other flows as well, at least the ones that require (user) interaction, like the auth-code flow. While this proposal is (mostly) about refresh tokens, you also mentioned …

The only reservation, not a blocker, but something to seriously think about, is the protection of the persisted sensitive data. I think that should be "mandated".
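For illustration, a minimal sketch of what such protection could look like on POSIX systems: the token store is created with owner-only (0600) permissions. The path and class name here are hypothetical, not anything the auth manager actually defines.

```java
import java.nio.file.Files;
import java.nio.file.Path;
import java.nio.file.attribute.PosixFilePermission;
import java.nio.file.attribute.PosixFilePermissions;
import java.util.Set;

public class TokenStorePermissions {

  public static void main(String[] args) throws Exception {
    // Hypothetical token store location; purely illustrative.
    Path store = Path.of(System.getProperty("user.home"), ".iceberg-auth", "tokens.json");

    // Parent directory: owner-only rwx (0700).
    Files.createDirectories(store.getParent(),
        PosixFilePermissions.asFileAttribute(PosixFilePermissions.fromString("rwx------")));

    // Token file: owner-only rw (0600), so other local users cannot read it.
    // Note: PosixFilePermissions is unsupported on non-POSIX file systems (e.g. Windows).
    Set<PosixFilePermission> ownerOnly = PosixFilePermissions.fromString("rw-------");
    if (Files.notExists(store)) {
      Files.createFile(store, PosixFilePermissions.asFileAttribute(ownerOnly));
    }
  }
}
```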
Another technical aspect is that multiple clients using the same IDP+realm+client combination can run concurrently (aka: mitigate races on the persisted tokens). Having more than one process using the same "refresh token store" would lead to, say, "interesting" behavior. One way out could be using "classic lock files" with "delete on close". I suspect it's fine to specify that users have to configure one "refresh token store" (analogous to …).

The actual file-name/path convention is very much an implementation concern. I could imagine that people use different IDPs with the same realm name (for example: test + prod). But that's solvable.
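The "classic lock file" idea could look roughly like the sketch below: an exclusive file lock guards the token store, and the lock file is deleted when the channel closes. The path is hypothetical, and real lock-file handling has more edge cases than shown here.

```java
import java.io.IOException;
import java.nio.channels.FileChannel;
import java.nio.channels.FileLock;
import java.nio.file.Path;
import java.nio.file.StandardOpenOption;

public class TokenStoreLock {

  public static void main(String[] args) throws IOException {
    // Hypothetical lock file guarding the refresh token store.
    Path lockFile = Path.of("/tmp/refresh-token-store.lock");

    try (FileChannel channel = FileChannel.open(lockFile,
        StandardOpenOption.CREATE,
        StandardOpenOption.WRITE,
        StandardOpenOption.DELETE_ON_CLOSE)) {

      // tryLock() returns null if another process already holds the lock.
      FileLock lock = channel.tryLock();
      if (lock == null) {
        System.err.println("Token store is in use by another process; backing off.");
        return;
      }
      // ... read / refresh / persist tokens while holding the lock ...
    } // closing the channel releases the lock and deletes the lock file
  }
}
```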
-
Hi @tokoko, thanks for raising this discussion! The idea of providing refresh tokens to the auth manager comes up regularly; we even started a previous discussion about it: #131. Unfortunately, the TL;DR is that refresh tokens are not meant to be shared, and doing so will very likely result in a 401 response from the IDP.

If I read between the lines, it seems that your problem is that you may be using e.g. remote S3 request signing, and therefore the Spark worker nodes all need to perform a Device Code grant, which is obviously not possible. See discussion #132 for possible ways to overcome this problem. My personal preference would be one of the following approaches: …
The difficult part is how to configure the worker nodes differently from the main driver node, where the interactive grant happens. One solution is to use the ….

That's where your idea of persisting configuration or credentials to disk could also turn out to be a good solution. It was already suggested in #78. It would allow the worker nodes to read their configuration from files and merge it with the main catalog properties, thus overriding e.g. the grant type and/or credentials to use, as sketched below. Of course, what @snazy said remains very valid: we need to be extremely careful when writing sensitive data to a file on disk.
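A rough sketch of that merge step, assuming a hypothetical per-worker override file (the file path, helper class, and merge policy are illustrative, not part of any existing API; only the `rest.auth.oauth2.grant-type` key comes from the actual config surface discussed here):

```java
import java.io.InputStream;
import java.nio.file.Files;
import java.nio.file.Path;
import java.util.HashMap;
import java.util.Map;
import java.util.Properties;

public class WorkerConfigOverride {

  // Overlays properties from a local file on top of the catalog properties,
  // so a worker can e.g. swap the grant type that the driver uses.
  static Map<String, String> merge(Map<String, String> catalogProps, Path overrideFile)
      throws Exception {
    Map<String, String> merged = new HashMap<>(catalogProps);
    if (Files.exists(overrideFile)) {
      Properties overrides = new Properties();
      try (InputStream in = Files.newInputStream(overrideFile)) {
        overrides.load(in);
      }
      overrides.stringPropertyNames()
          .forEach(key -> merged.put(key, overrides.getProperty(key)));
    }
    return merged;
  }

  public static void main(String[] args) throws Exception {
    Map<String, String> catalogProps =
        Map.of("rest.auth.oauth2.grant-type", "urn:ietf:params:oauth:grant-type:device_code");
    // Hypothetical override file present only on worker nodes.
    System.out.println(merge(catalogProps, Path.of("/etc/iceberg/worker-auth.properties")));
  }
}
```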
-
OK, I guess I misunderstood your original problem, my apologies. If the problem is just about "resuming" a previous session without requiring users to log in again, then I agree that persisting both access and refresh tokens to a secure location on a local disk makes sense. I must warn though: this solution would always be a bit brittle, for two reasons: …
-
@tokoko the more it goes, the more I like your proposal 😄 – especially after our parallel discussion in #132.

Let me digress a bit to explain how static tokens are currently supported today. For now, you can pass static tokens in 3 config options: ….

For now, you cannot pass refresh tokens via config. (I'm still not sure we should allow this, but I'm not firmly opposed to it either.) Also, using …. When using ….

Now back to your suggestion: reading tokens from files seems like a good idea to me. I am not a Spark internals expert at all, but I am glad to know that there is a way to implement this, as you explained in #132. With that in mind, I would envision something like this:
I think this could even be implemented as a pseudo-grant, e.g. `urn:dremio:iceberg:auth-manager:grant-type:tokens-file`. The tokens file format could be JSON.

With that, if you want the manager to export its tokens:

```
rest.auth.oauth2.tokens-file.export-path=/path/to/tokens.json
rest.auth.oauth2.tokens-file.export-refresh-token=true
```

If you want the manager to read the tokens from the file for the initial grant, instead of contacting the IDP:

```
rest.auth.oauth2.grant-type=urn:dremio:iceberg:auth-manager:grant-type:tokens-file
rest.auth.oauth2.tokens-file.import-path=/path/to/tokens.json
```

The manager would be able to refresh the token from the file, if the file contains a reusable refresh token. If the refresh fails, it would read the file again to check if new tokens were written.

Finally, this would also work for token exchanges. E.g. if you want the subject token to be sourced from a tokens file, you would do something like:

```
rest.auth.oauth2.token-exchange.subject.grant-type=urn:dremio:iceberg:auth-manager:grant-type:tokens-file
rest.auth.oauth2.token-exchange.subject.tokens-file.import-path=/path/to/tokens.json
```

You could even go crazy and source both subject and actor tokens:

```
rest.auth.oauth2.token-exchange.subject.grant-type=urn:dremio:iceberg:auth-manager:grant-type:tokens-file
rest.auth.oauth2.token-exchange.subject.tokens-file.import-path=/path/to/subject-tokens.json
rest.auth.oauth2.token-exchange.actor.grant-type=urn:dremio:iceberg:auth-manager:grant-type:tokens-file
rest.auth.oauth2.token-exchange.actor.tokens-file.import-path=/path/to/actor-tokens.json
```
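To make the import behaviour concrete, here is a minimal sketch of how a manager could consume such a tokens file. The JSON field names and the `Tokens` record are assumptions for illustration (the proposal above only says the format "could be JSON"); parsing uses Jackson.

```java
import com.fasterxml.jackson.databind.ObjectMapper;
import java.nio.file.Path;
import java.time.Instant;

public class TokensFileImport {

  // Assumed file schema -- not a format the auth manager actually defines.
  public record Tokens(String accessToken, String refreshToken, Instant expiresAt) {}

  public static void main(String[] args) throws Exception {
    Path tokensFile = Path.of("/path/to/tokens.json"); // matches the config examples above
    ObjectMapper mapper = new ObjectMapper().findAndRegisterModules(); // java.time support
    Tokens tokens = mapper.readValue(tokensFile.toFile(), Tokens.class);

    if (tokens.expiresAt().isAfter(Instant.now())) {
      // Access token still valid: use it directly as the initial grant.
      System.out.println("Using access token from file");
    } else if (tokens.refreshToken() != null) {
      // Expired access token, reusable refresh token: refresh against the IDP.
      System.out.println("Refreshing with the refresh token from the file");
    } else {
      // Neither usable: re-read the file later in case new tokens were written.
      System.out.println("Re-reading the file later for new tokens");
    }
  }
}
```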
-
Hey, first of all thanks for your work on this. We're currently trying out Polaris and we've also been trying to come up with the best way to set up a good "human" auth flow. I came across this repo a bit too late, so we implemented our own solution as a generic OIDC CLI client tool that externally initiates a device code flow and then persists tokens to disk (`~/.oidc/<realm>/<client>/..`). Basically the idea is to run `oidc login <realm> <client>` before running your code, similar to how one would run `kinit` and persist credentials to `KRB5CCNAME` with Kerberos.

I'm thinking of either switching to the Dremio auth manager or allowing users to use either solution. One major difference between them (with the device code flow) would be that even with somewhat long-lived token lifetimes, the user will have to authenticate with Keycloak every time a Spark session is initialized when using the auth manager (that's the case, right?).
What do you think about optionally allowing refresh tokens (at least for the device code flow) to be persisted to disk (to a user-configured location), so that they can be reused every time a manager is reinitialized? Right before the flow starts, the manager would check whether the refresh location is set, whether it's populated, and whether the refresh token present is still valid. If all three conditions hold, there would be no need for a prompt (a sketch of this check follows below).
Another upside would be that the flow would work even if the initial refresh token is populated externally, by something other than the auth manager, but that's probably beside the point.
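A minimal sketch of that three-condition check. The method name and the way the expiry is passed in are hypothetical; a real implementation would parse the persisted token itself to determine validity.

```java
import java.nio.file.Files;
import java.nio.file.Path;
import java.time.Instant;

public class RefreshStoreCheck {

  // Returns true if the interactive device-code prompt can be skipped.
  static boolean canSkipPrompt(String refreshLocation, Instant refreshTokenExpiry)
      throws Exception {
    // 1. A refresh token location must be configured.
    if (refreshLocation == null) {
      return false;
    }
    // 2. The store must exist and be populated.
    Path store = Path.of(refreshLocation);
    if (!Files.exists(store) || Files.size(store) == 0) {
      return false;
    }
    // 3. The persisted refresh token must still be valid.
    return refreshTokenExpiry != null && refreshTokenExpiry.isAfter(Instant.now());
  }
}
```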