Replies: 5 comments 4 replies
-
A few ideas:
-
In any case, a change in Iceberg Core is required. More specifically, the following method needs to be modified. This method is the "entry point" for table-scoped FileIO instances:

```java
private FileIO tableFileIO(
    SessionContext context,
    Map<String, String> config,
    AuthSession tableSession,
    List<Credential> storageCredentials) {
  if (config.isEmpty() && ioBuilder == null && storageCredentials.isEmpty()) {
    return io; // reuse client and io since config/credentials are the same
  }
  Map<String, String> fullConf = RESTUtil.merge(properties(), config);
  fullConf = RESTUtil.merge(fullConf, tableSession.remoteFileIOProperties());
  return newFileIO(context, fullConf, storageCredentials);
}
```
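To make the precedence implied by the two merge calls concrete, here is a minimal, dependency-free sketch. The `merge` method below is a stand-in for `RESTUtil.merge` (entries from the second map win on key conflicts), and the property names (`io-impl`, `region`) are made up for illustration:

```java
import java.util.LinkedHashMap;
import java.util.Map;

public class MergePrecedence {

  // Stand-in for RESTUtil.merge: entries from 'overrides' win on conflicts.
  static Map<String, String> merge(Map<String, String> base, Map<String, String> overrides) {
    Map<String, String> result = new LinkedHashMap<>(base);
    result.putAll(overrides);
    return result;
  }

  public static void main(String[] args) {
    // Hypothetical property sources, from lowest to highest precedence.
    Map<String, String> catalogProps = Map.of("io-impl", "catalog-io", "region", "us-east-1");
    Map<String, String> tableConfig = Map.of("io-impl", "table-io");
    Map<String, String> remoteFileIOProps = Map.of("io-impl", "session-io");

    // Same order as tableFileIO: catalog props, then table config,
    // then the session's remote FileIO properties merged last.
    Map<String, String> fullConf = merge(catalogProps, tableConfig);
    fullConf = merge(fullConf, remoteFileIOProps);

    // Whatever is merged last takes precedence; untouched keys survive.
    System.out.println("io-impl=" + fullConf.get("io-impl"));
    System.out.println("region=" + fullConf.get("region"));
  }
}
```

The point of the sketch is only the ordering: whatever the session contributes is merged last, so it overrides both catalog- and table-level settings.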
-
There might be a 3rd idea that wouldn't require changes in Iceberg Core. The table configuration is created by merging the table-specific properties onto the catalog properties:

```java
Map<String, String> fullConf = RESTUtil.merge(properties(), config); // properties() returns the catalog properties
```

We can notice that session context properties, if present, are not added to the final table configuration. This may have been done on purpose. One could leverage this peculiarity and reserve human-based flows for session contexts exclusively. Example:
With the above configuration, one would need to create a catalog with a non-empty Session Context. This cannot be done with configuration only (e.g. Spark SQL wouldn't work), but I assume it can be easily done in a Spark Shell session or using a Python script. In that case the Spark driver process holding the (to be tested)
-
Another way to enable human-based flows only on the driver node is to leverage environment variables or credentials files on disk, cf. #78. E.g. the driver node could have a credentials file specifying the Authorization Code grant type, while worker nodes would either inherit the default grant type from the catalog properties, or also read environment variables or credentials files, using a different grant type and/or client ID+secret.
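The per-node resolution described above could look something like the following sketch. The file path, env var name, property keys, and default value are all assumptions made for illustration, not actual Iceberg configuration names:

```java
import java.io.IOException;
import java.nio.file.Files;
import java.nio.file.Path;
import java.util.Map;
import java.util.Properties;

public class NodeLocalGrantType {

  // Resolve the OAuth2 grant type for this node: a local credentials file
  // wins, then an environment variable, then the catalog-level default.
  static String resolveGrantType(
      Path credentialsFile, Map<String, String> env, Map<String, String> catalogProps)
      throws IOException {
    if (Files.exists(credentialsFile)) {
      Properties props = new Properties();
      props.load(Files.newBufferedReader(credentialsFile));
      String fromFile = props.getProperty("grant-type");
      if (fromFile != null) {
        return fromFile;
      }
    }
    String fromEnv = env.get("ICEBERG_OAUTH2_GRANT_TYPE"); // hypothetical env var
    if (fromEnv != null) {
      return fromEnv;
    }
    return catalogProps.getOrDefault("oauth2.grant-type", "client_credentials");
  }

  public static void main(String[] args) throws IOException {
    Map<String, String> catalogProps = Map.of("oauth2.grant-type", "client_credentials");

    // Driver node: a local credentials file selects the human-based flow.
    Path driverFile = Files.createTempFile("driver-credentials", ".properties");
    Files.writeString(driverFile, "grant-type=authorization_code\n");
    System.out.println("driver=" + resolveGrantType(driverFile, Map.of(), catalogProps));

    // Worker node: no file and no env var, so the catalog default applies.
    Path missing = driverFile.resolveSibling("does-not-exist.properties");
    System.out.println("worker=" + resolveGrantType(missing, Map.of(), catalogProps));
  }
}
```

With this layering, only nodes that are explicitly provisioned with the file (or env var) ever attempt the human-based flow; everything else silently falls back to the catalog default.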
-
I've been thinking about this some and would like to run some of the ideas by you. This is engine-specific (meaning Spark), which is less than ideal, but still something. It also presupposes that auth managers on executors are somehow already aware that they are running on executor nodes rather than on the driver.
If the goal is to enable refresh on every node, and an initial grant can only ever have one valid refresh token at a time, it makes sense to delegate access token generation to the driver node exclusively. In other words, the refresh token lives only on the driver, and whenever an executor needs its access token refreshed, it should reach out to the driver for it.
Obviously some sort of network communication is bound to break down depending on the deployment model, but fortunately Spark does have a plugin interface that allows executor plugins to send RPC messages (PluginContext.ask) to their driver counterpart and receive responses (DriverPlugin.receive). Plugins are also able to read the Spark conf, so they could be made aware of all the Spark config that is used to configure the auth managers themselves.
The last problem to solve would be setting up communication between the Spark plugin components and their respective auth managers. Persisting tokens to disk (#135) could come in handy here. The ExecutorPlugin (knowing from conf that an auth manager needs a token refreshed every 15 minutes) would start a background thread that sends an RPC to the driver periodically and persists the resulting token to disk. The executor AuthManager then interprets its own 15-minute refresh as a signal to "refresh" its token by reading it from disk once again. Similarly, the DriverPlugin would have no way of knowing how the refresh token it uses for token generation was acquired; it would simply read from disk the refresh token that was put there by the driver AuthManager.
Curious what you think about it.
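The driver-mediated refresh described above can be sketched, Spark-free, as follows. `DriverTokenService` stands in for the DriverPlugin.receive handler and `ExecutorTokenRelay` for the ExecutorPlugin background thread (both names hypothetical); `PluginContext.ask` is replaced by a direct method call, and the token reaches the executor-side auth manager through a file on disk, as proposed:

```java
import java.io.IOException;
import java.nio.file.Files;
import java.nio.file.Path;
import java.util.concurrent.atomic.AtomicInteger;

public class DriverMediatedRefresh {

  // Stands in for the DriverPlugin side: holds the single refresh token
  // and exchanges it for fresh access tokens on request.
  static class DriverTokenService {
    private final String refreshToken = "refresh-token-from-disk"; // put there by the driver AuthManager
    private final AtomicInteger counter = new AtomicInteger();

    // Stands in for DriverPlugin.receive handling an RPC message.
    String receive(String message) {
      if (!"REFRESH_ACCESS_TOKEN".equals(message)) {
        throw new IllegalArgumentException("unknown message: " + message);
      }
      // A real implementation would call the token endpoint here.
      return "access-token-" + counter.incrementAndGet() + "-via-" + refreshToken;
    }
  }

  // Stands in for the ExecutorPlugin background thread: asks the driver for
  // a fresh token and persists it where the executor AuthManager can read it.
  static class ExecutorTokenRelay {
    void refreshOnce(DriverTokenService driver, Path tokenFile) throws IOException {
      String token = driver.receive("REFRESH_ACCESS_TOKEN"); // PluginContext.ask in real Spark
      Files.writeString(tokenFile, token);
    }
  }

  public static void main(String[] args) throws IOException {
    DriverTokenService driver = new DriverTokenService();
    ExecutorTokenRelay relay = new ExecutorTokenRelay();
    Path tokenFile = Files.createTempFile("executor-access-token", ".txt");

    // Two refresh cycles: the executor AuthManager only ever re-reads the file.
    relay.refreshOnce(driver, tokenFile);
    System.out.println("first=" + Files.readString(tokenFile));
    relay.refreshOnce(driver, tokenFile);
    System.out.println("second=" + Files.readString(tokenFile));
  }
}
```

The key property the sketch demonstrates is that the refresh token never leaves the driver: executors only ever see short-lived access tokens, delivered via the driver RPC and handed over through the filesystem.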
-
The driver process is generally responsible for initializing the RESTCatalog instance, and thus interacts the most with the catalog server. Human-based flows generally work well in this setup, as long as the driver process is interacting with a human operator.
Things get more complicated, though, when FileIO instances, created by executors, need to interact with the catalog server. This can happen e.g. when using S3 request signing. Each signer will have its own AuthManager, but this time there won't be any human operator available, so the flow will time out and fail.
As of now, I think human-based flows are completely incompatible with S3 request signing, and also with object storage credentials refreshing. Basically, any interaction between executors and the catalog server would fail.
We should investigate ways to improve this.