-
I think the proposal is totally legit and very interesting! Thanks for raising this idea! TL;DR is (IIUC): persist refresh tokens to disk so that interactive flows don't have to be repeated every time a client is initialized.

As the proposal is generally about refresh (tokens), it seems to make sense not to restrict the functionality to a specific flow, but to include the other flows as well, at least the ones that require (user) interaction, like the auth-code flow. While this proposal is (mostly) about refresh tokens, you also mentioned …

The only reservation, not a blocker, but something to seriously think about, is the protection of the persisted sensitive data. I think that should be "mandated".
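For illustration, a minimal sketch of what such protection could look like on POSIX systems: the token store is created with owner-only (0600) permissions. The path and class name here are hypothetical, not anything the auth manager actually defines.

```java
import java.nio.file.Files;
import java.nio.file.Path;
import java.nio.file.attribute.PosixFilePermission;
import java.nio.file.attribute.PosixFilePermissions;
import java.util.Set;

public class TokenStorePermissions {

  public static void main(String[] args) throws Exception {
    // Hypothetical token store location; purely illustrative.
    Path store = Path.of(System.getProperty("user.home"), ".iceberg-auth", "tokens.json");

    // Parent directory: owner-only rwx (0700).
    Files.createDirectories(store.getParent(),
        PosixFilePermissions.asFileAttribute(PosixFilePermissions.fromString("rwx------")));

    // Token file: owner-only rw (0600), so other local users cannot read it.
    // Note: PosixFilePermissions is unsupported on non-POSIX file systems (e.g. Windows).
    Set<PosixFilePermission> ownerOnly = PosixFilePermissions.fromString("rw-------");
    if (Files.notExists(store)) {
      Files.createFile(store, PosixFilePermissions.asFileAttribute(ownerOnly));
    }
  }
}
```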
Another technical aspect is that multiple clients using the same IDP+realm+client combination can run concurrently (aka: mitigate races on the persisted tokens). Having more than one process using the same "refresh token store" would lead to, say, "interesting" behavior. One way out could be using "classic lock files" with "delete on close". I suspect it's fine to specify that users have to configure one "refresh token store" (analogous to …).

The actual file-name/path convention is very much an implementation concern. I could imagine that people use different IDPs with the same realm name (for example: test + prod). But that's solvable.
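The "classic lock file" idea could look roughly like the sketch below: an exclusive file lock guards the token store, and the lock file is deleted when the channel closes. The path is hypothetical, and real lock-file handling has more edge cases than shown here.

```java
import java.io.IOException;
import java.nio.channels.FileChannel;
import java.nio.channels.FileLock;
import java.nio.file.Path;
import java.nio.file.StandardOpenOption;

public class TokenStoreLock {

  public static void main(String[] args) throws IOException {
    // Hypothetical lock file guarding the refresh token store.
    Path lockFile = Path.of("/tmp/refresh-token-store.lock");

    try (FileChannel channel = FileChannel.open(lockFile,
        StandardOpenOption.CREATE,
        StandardOpenOption.WRITE,
        StandardOpenOption.DELETE_ON_CLOSE)) {

      // tryLock() returns null if another process already holds the lock.
      FileLock lock = channel.tryLock();
      if (lock == null) {
        System.err.println("Token store is in use by another process; backing off.");
        return;
      }
      // ... read / refresh / persist tokens while holding the lock ...
    } // closing the channel releases the lock and deletes the lock file
  }
}
```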
-
Hi @tokoko, thanks for raising this discussion! The idea of providing refresh tokens to the auth manager comes up regularly; we even started a previous discussion about it: #131. Unfortunately, the TL;DR is that refresh tokens are not meant to be shared, and doing so will very likely result in a 401 response from the IDP.

If I read between the lines, it seems that your problem is that you may be using e.g. remote S3 request signing, and therefore the Spark worker nodes all need to perform a Device Code grant, which is obviously not possible. See discussion #132 for possible ways to overcome this problem. My personal preference would be one of the following approaches: …
The difficult part is how to configure the worker nodes differently from the main driver node, where the interactive grant happens. One solution is to use the ….

That's where your idea of persisting configuration or credentials to disk could also turn out to be a good solution. It was already suggested in #78. It would allow the worker nodes to read their configuration from files and merge it with the main catalog properties, thus overriding e.g. the grant type and/or credentials to use, as sketched below. Of course, what @snazy said remains very valid: we need to be extremely careful when writing sensitive data to a file on disk.
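A rough sketch of that merge step, assuming a hypothetical per-worker override file (the file path, helper class, and merge policy are illustrative, not part of any existing API; only the `rest.auth.oauth2.grant-type` key comes from the actual config surface discussed here):

```java
import java.io.InputStream;
import java.nio.file.Files;
import java.nio.file.Path;
import java.util.HashMap;
import java.util.Map;
import java.util.Properties;

public class WorkerConfigOverride {

  // Overlays properties from a local file on top of the catalog properties,
  // so a worker can e.g. swap the grant type that the driver uses.
  static Map<String, String> merge(Map<String, String> catalogProps, Path overrideFile)
      throws Exception {
    Map<String, String> merged = new HashMap<>(catalogProps);
    if (Files.exists(overrideFile)) {
      Properties overrides = new Properties();
      try (InputStream in = Files.newInputStream(overrideFile)) {
        overrides.load(in);
      }
      overrides.stringPropertyNames()
          .forEach(key -> merged.put(key, overrides.getProperty(key)));
    }
    return merged;
  }

  public static void main(String[] args) throws Exception {
    Map<String, String> catalogProps =
        Map.of("rest.auth.oauth2.grant-type", "urn:ietf:params:oauth:grant-type:device_code");
    // Hypothetical override file present only on worker nodes.
    System.out.println(merge(catalogProps, Path.of("/etc/iceberg/worker-auth.properties")));
  }
}
```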
-
OK, I guess I misunderstood your original problem, my apologies. If the problem is just about "resuming" a previous session without requiring users to log in again, then I agree that persisting both access and refresh tokens to a secure location on a local disk makes sense. I must warn though: this solution would always be a bit brittle, for two reasons: …
-
@tokoko the more it goes, the more I like your proposal 😄 – especially after our parallel discussion in #132.

Let me digress a bit to explain how static tokens are currently supported today. For now, you can pass static tokens in 3 config options: ….

For now, you cannot pass refresh tokens via config. (I'm still not sure we should allow this, but I'm not firmly opposed to it either.) Also, using …. When using ….

Now back to your suggestion: reading tokens from files seems like a good idea to me. I am not a Spark internals expert at all, but I am glad to know that there is a way to implement this, as you explained in #132. With that in mind, I would envision something like this:
I think this could even be implemented as a pseudo-grant, e.g. `urn:dremio:iceberg:auth-manager:grant-type:tokens-file`. The tokens file format could be JSON.

With that, if you want the manager to export its tokens:

```
rest.auth.oauth2.tokens-file.export-path=/path/to/tokens.json
rest.auth.oauth2.tokens-file.export-refresh-token=true
```

If you want the manager to read the tokens from the file for the initial grant, instead of contacting the IDP:

```
rest.auth.oauth2.grant-type=urn:dremio:iceberg:auth-manager:grant-type:tokens-file
rest.auth.oauth2.tokens-file.import-path=/path/to/tokens.json
```

The manager would be able to refresh the token from the file, if the file contains a reusable refresh token. If the refresh fails, it would read the file again to check if new tokens were written.

Finally, this would also work for token exchanges. E.g. if you want the subject token to be sourced from a tokens file, you would do something like:

```
rest.auth.oauth2.token-exchange.subject.grant-type=urn:dremio:iceberg:auth-manager:grant-type:tokens-file
rest.auth.oauth2.token-exchange.subject.tokens-file.import-path=/path/to/tokens.json
```

You could even go crazy and source both subject and actor tokens:

```
rest.auth.oauth2.token-exchange.subject.grant-type=urn:dremio:iceberg:auth-manager:grant-type:tokens-file
rest.auth.oauth2.token-exchange.subject.tokens-file.import-path=/path/to/subject-tokens.json
rest.auth.oauth2.token-exchange.actor.grant-type=urn:dremio:iceberg:auth-manager:grant-type:tokens-file
rest.auth.oauth2.token-exchange.actor.tokens-file.import-path=/path/to/actor-tokens.json
```
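To make the import behaviour concrete, here is a minimal sketch of how a manager could consume such a tokens file. The JSON field names and the `Tokens` record are assumptions for illustration (the proposal above only says the format "could be JSON"); parsing uses Jackson.

```java
import com.fasterxml.jackson.databind.ObjectMapper;
import java.nio.file.Path;
import java.time.Instant;

public class TokensFileImport {

  // Assumed file schema -- not a format the auth manager actually defines.
  public record Tokens(String accessToken, String refreshToken, Instant expiresAt) {}

  public static void main(String[] args) throws Exception {
    Path tokensFile = Path.of("/path/to/tokens.json"); // matches the config examples above
    ObjectMapper mapper = new ObjectMapper().findAndRegisterModules(); // java.time support
    Tokens tokens = mapper.readValue(tokensFile.toFile(), Tokens.class);

    if (tokens.expiresAt().isAfter(Instant.now())) {
      // Access token still valid: use it directly as the initial grant.
      System.out.println("Using access token from file");
    } else if (tokens.refreshToken() != null) {
      // Expired access token, reusable refresh token: refresh against the IDP.
      System.out.println("Refreshing with the refresh token from the file");
    } else {
      // Neither usable: re-read the file later in case new tokens were written.
      System.out.println("Re-reading the file later for new tokens");
    }
  }
}
```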
-
Hey, first of all thanks for your work on this. We're currently trying out Polaris and we've also been trying to come up with the best way to set up a good "human" auth flow. I came across this repo a bit too late, so we implemented our own solution as a generic OIDC CLI client tool that externally initiates a device code flow and then persists tokens to disk (`~/.oidc/<realm>/<client>/..`). Basically the idea is to run `oidc login <realm> <client>` before running your code, similar to how one would run `kinit` and persist credentials to `KRB5CCNAME` with Kerberos.

I'm thinking of either switching to the Dremio auth manager or allowing users to use either solution. One major difference between them (with the device code flow) would be that even with somewhat long-lived token lifetimes, the user will have to authenticate with Keycloak every time a Spark session is initialized when using the auth manager (that's the case, right?).
What do you think about optionally allowing refresh tokens (at least for the device code flow) to be persisted to disk (to a user-configured location), so that they can be reused every time a manager is reinitialized? Right before the flow starts, the manager would check whether the refresh location is set, whether it's populated, and whether the refresh token present is still valid. If all three conditions hold, there would be no need for a prompt (a sketch of this check follows below).
Another upside would be that the flow would work even if the initial refresh token is populated externally, by something other than the auth manager, but that's probably beside the point.
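A minimal sketch of that three-condition check. The method name and the way the expiry is passed in are hypothetical; a real implementation would parse the persisted token itself to determine validity.

```java
import java.nio.file.Files;
import java.nio.file.Path;
import java.time.Instant;

public class RefreshStoreCheck {

  // Returns true if the interactive device-code prompt can be skipped.
  static boolean canSkipPrompt(String refreshLocation, Instant refreshTokenExpiry)
      throws Exception {
    // 1. A refresh token location must be configured.
    if (refreshLocation == null) {
      return false;
    }
    // 2. The store must exist and be populated.
    Path store = Path.of(refreshLocation);
    if (!Files.exists(store) || Files.size(store) == 0) {
      return false;
    }
    // 3. The persisted refresh token must still be valid.
    return refreshTokenExpiry != null && refreshTokenExpiry.isAfter(Instant.now());
  }
}
```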