Skip to content

AWS IAM: lakefs IDP interface #8994

New issue

Have a question about this project? Sign up for a free GitHub account to open an issue and contact its maintainers and the community.

By clicking “Sign up for GitHub”, you agree to our terms of service and privacy statement. We’ll occasionally send you account related emails.

Already on GitHub? Sign in to your account

Open
wants to merge 24 commits into
base: master
Choose a base branch
from
Open

AWS IAM: lakefs IDP interface #8994

wants to merge 24 commits into from

Conversation

ItamarYuran
Copy link
Contributor

@ItamarYuran ItamarYuran commented Apr 28, 2025

Closes #8992

To make further IAM auth implementations easier, I moved JWT retrieval logic into lakeFS and put it behind a small interface.

@ItamarYuran ItamarYuran added the exclude-changelog PR description should not be included in next release changelog label Apr 28, 2025
Copy link

E2E Test Results - DynamoDB Local - Local Block Adapter

13 passed, 1 skipped

Copy link

github-actions bot commented Apr 28, 2025

E2E Test Results - Quickstart

12 passed

Copy link
Contributor

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

The package structure should allow extending it without refactor.
In the current form it's a mix of generic and AWS primitives.

For example:
The type IdentityTokenInfo is AWS specific but it's under the generic package externalidp.
The problem is that when we add azure token info, how would you call it? IdentityTokenInfoAzure? Then you'd refactor this one to IdentityTokenInfoAWS?

We should have a clear separation between external principal login VS AWS specific.

What do I actually suggest is 1 of 2 options, Im fine with either (The first one probably easier):

  1. Add AWS to everything that is related to AWS, i.e IdentityTokenInfo -> AWSIdentityTokenInfo
  2. Create a sub package to contain all AWS specific code under i.e externalidp/awsidp

return resp, err
}

func getJWT(params *AWSIAMParams, serverEndpoint string, httpClient *http.Client) (string, error) {
Copy link
Contributor

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

There's no point in another function getJWT if it's only used in Login put the code there, otherwise it's bloating the code

TokenTTL time.Duration
}

func NewAWSProvider(params AWSIAMParams, serverEndpoint string, httpClient *http.Client) *AWSProvider {
Copy link
Contributor

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

*http.Client is (a) confusing name (b) too low level.

I suggest the following refactor:

  1. Create a minimal interface (easier to test, less dependencies) with a single function ExternalPrincipalLoginWithResponse.

Note: that's already a part of the generated lakeFS client ClientWithResponsesInterface, but just this func.

type ExternalPrincipalLoginClient interface {
  ExternalPrincipalLoginWithResponse(ctx context.Context, body ExternalPrincipalLoginJSONRequestBody, reqEditors ...RequestEditorFn) (*ExternalPrincipalLoginResponse, error)
}
  1. Refactor the existing NewAWSProvider function (help with testing and mock):
NewAWSProviderWithClient(params AWSIAMParams, client ExternalPrincipalLoginClient) *AWSProvider { 
   ...
}
  1. Add additional New func: NewAWSProvider that will be used in everest:
func NewAWSProvider(params AWSIAMParams, lakeFSHost string) (*AWSProvider, error) {
	client, err := apigen.NewClientWithResponses(
		serverEndpoint,
		apigen.WithHTTPClient(httpClient),
	)
	if err != nil {
		return nil, err
	}
        return NewAWSProviderWithClient(params, client), nil
}

Token string
}

type IDPProvider interface {
Copy link
Contributor

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

IDP == Identity Provider -> IDPProvider == IdentityProviderProvider 😋

Suggested change
type IDPProvider interface {
type Provider interface {

Since the package name contains the name idp and external, in go you don't need to duplicate the name, i.e this is enough:

import (
    ".../externalidp"
)

// usage is clear
externalidp.Provider

return externalPrincipalLoginResp.JSON200.Token, nil
}

func getIdentityToken(ctx context.Context, params *AWSIAMParams) (string, error) {
Copy link
Contributor

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Please avoid making this func private, so we can easily go-get and test.
Also it should be part of the AWSProvider struct, e.g struct method.

Comment on lines 160 to 181
queryParams := parsedURL.Query()
credentials := queryParams.Get(AWSAuthCredentialKey)
splitedCreds := strings.Split(credentials, "/")
calculatedRegion := splitedCreds[2]
identityTokenInfo := IdentityTokenInfo{
Method: "POST",
Host: parsedURL.Host,
Region: calculatedRegion,
Action: AWSAuthAction,
Date: queryParams.Get(AWSAuthDateKey),
ExpirationDuration: queryParams.Get(AWSAuthExpiresKey),
AccessKeyID: creds.AccessKeyID,
Signature: queryParams.Get(AWSAuthSignatureKey),
SignedHeaders: strings.Split(queryParams.Get(AWSAuthSignedHeadersKey), ";"),
Version: queryParams.Get(AWSAuthVersionKey),
Algorithm: queryParams.Get(AWSAuthAlgorithmKey),
SecurityToken: queryParams.Get(AWSAuthSecurityTokenKey),
}

marshaledIdentityTokenInfo, _ := json.Marshal(identityTokenInfo)
encodedIdentityTokenInfo := base64.StdEncoding.EncodeToString(marshaledIdentityTokenInfo)
return encodedIdentityTokenInfo, nil
Copy link
Contributor

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

This section is a good candidate to a separate function, it'll be easier to test it, i.e:

func NewIdentityTokenInfoFromURL(presignedURL string) (*IdentityTokenInfo, string, error)

Note: returning both the encoded string output of the identityTokenInfo and the struct itself - it allows to test it properly, writing a test for encoded string is harder if you want to verify specific fields.

Comment on lines 120 to 126
if err != nil {
return "", err
}
if externalPrincipalLoginResp == nil || externalPrincipalLoginResp.JSON200 == nil {
return "", ErrRetrievingToken
}
return externalPrincipalLoginResp.JSON200.Token, nil
Copy link
Contributor

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

  1. Check http issue after regular error with ResponseAsError
  2. Return struct AuthenticationToken not just the token, it contains more info and allows extending
  3. No need for those nil checks below
import "github.com/treeverse/lakefs/pkg/api/helpers"

/// 
if err != nil {
    ... 
}
err = helpers.ResponseAsError(res)
if err != nil {
	return nil, err
}
return res.JSON200

Copy link
Contributor

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

pkg/authentication/external_idp/aws_client.go

Comment on lines 47 to 53
type LoginResponse struct {
Token string
}

type IDPProvider interface {
Login() (LoginResponse, error)
}
Copy link
Contributor

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

either rename the file, or extract these to a general idp.go file

@Isan-Rivkin Isan-Rivkin changed the title IAM mount: lakefs IDP interface AWS IAM: lakefs IDP interface May 5, 2025
Copy link
Contributor

@Isan-Rivkin Isan-Rivkin left a comment

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

re-request when ready

@ItamarYuran ItamarYuran requested a review from Isan-Rivkin May 11, 2025 09:08
Copy link
Contributor

@Isan-Rivkin Isan-Rivkin left a comment

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

say when ok

@ItamarYuran ItamarYuran requested a review from Isan-Rivkin May 12, 2025 08:54
@ItamarYuran ItamarYuran requested a review from Isan-Rivkin May 12, 2025 12:04
@@ -195,8 +196,9 @@ const (
defaultMaxRetryInterval = 30 * time.Second
defaultMinRetryInterval = 200 * time.Millisecond

defaultTokenTTL = 3600 * time.Second
defaultURLPresignTTL = 60 * time.Second
defaultTokenTTL = 0 * time.Second
Copy link
Contributor

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

remove, shouldn't be used

@@ -679,5 +681,6 @@ func initConfig() {
viper.SetDefault("local.skip_non_regular_files", false)
viper.SetDefault("credentials.provider.aws_iam.token_ttl_seconds", defaultTokenTTL)
viper.SetDefault("credentials.provider.aws_iam.url_presign_ttl_seconds", defaultURLPresignTTL)
viper.SetDefault("credentials.provider.aws_iam.refresh_interval", defaultRefreshInterval)
Copy link
Contributor

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

No reason to define an explicit default here, the default of this type is already 0.
The struct instance will contain the default of time.Duration if it's not defined here, and it's 0.

Comment on lines +200 to +201
defaultURLPresignTTL = 60 * time.Second
defaultRefreshInterval = 300 * time.Second
Copy link
Contributor

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Side note on those defaults
you know that other services re-using this configuration will need to define their own defaults, right?
Viper instantiation is specific to lakectl, not other binaries like everest that has their own viper instance initiating defaults.

I'm OK to keeping these if you'd like now because those are good defaults that we will want when we implement this logic in lakectl, but they will not be reflected in any other binary such as everest.

CredentialTimeFormat = "20060102"
DefaultSTSLoginExpire = 15 * time.Minute

mimLengthSplitCreds = 3
Copy link
Contributor

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

typo mim?

Comment on lines +40 to +42
DefaultSTSLoginExpire = 15 * time.Minute

mimLengthSplitCreds = 3
Copy link
Contributor

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Part of go and out go styling, encapsulate consts in logical groups, all of the above are query params, and those are just random consts

Suggested change
DefaultSTSLoginExpire = 15 * time.Minute
mimLengthSplitCreds = 3
const (
DefaultSTSLoginExpire = 15 * time.Minute
mimLengthSplitCreds = 3
)

"github.com/google/uuid"
)

var ErrInvalidCredentialsFormat = errors.New("missing required parts in credentials")
Copy link
Contributor

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Suggested change
var ErrInvalidCredentialsFormat = errors.New("missing required parts in credentials")
var ErrInvalidCredentialsFormat = errors.New("missing required parts in query param X-Amz-Credential")

credentials := queryParams.Get(AuthCredentialKey)
splitedCreds := strings.Split(credentials, "/")
if len(splitedCreds) < mimLengthSplitCreds {
return nil, fmt.Errorf("invalid credentials format: %w", ErrInvalidCredentialsFormat)
Copy link
Contributor

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Suggested change
return nil, fmt.Errorf("invalid credentials format: %w", ErrInvalidCredentialsFormat)
return nil, fmt.Errorf("min length for query param not '%d' ('%s'): %w",mimLengthSplitCreds, splitedCreds, ErrInvalidCredentialsFormat)

if len(splitedCreds) < mimLengthSplitCreds {
return nil, fmt.Errorf("invalid credentials format: %w", ErrInvalidCredentialsFormat)
}
calculatedRegion := splitedCreds[2]
Copy link
Contributor

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Be consistent with access to splittedCreds, either set both splitedCreds[2] and splitedCreds[0] into dedicated variables, or both directly into AWSIdentityTokenInfo

&AWSIdentityTokenInfo{
    // ...
    Region: splittedCreds[2],
    AccessKeyID: splitedCreds[0],
    // ...

req.Header.Add(header, value)
}
queryParams := req.URL.Query()
queryParams.Set(AuthExpiresKey, fmt.Sprintf("%d", int(ttl.Seconds())))
Copy link
Contributor

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Why is that key explicitly here?

return presign.URL, err
}

func RetrieveCredentials(ctx context.Context, cfg *aws.Config) (*aws.Credentials, error) {
Copy link
Contributor

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

I'd strongly consider removing this function.
After our ping-pong's it's been reduced to really 1 line of code creds, err := cfg.Credentials.Retrieve(ctx) , not used in lakeFS.
It has no extra logic so the tests are essentially just testing AWS sdk cfg.Credentials.Retrieve function that we don't need to test.

In other words, a whole lot of bloat to the code for a single func call that is not implemented by us anyway.

require.Contains(t, url, "amazonaws.com") // ensures it's AWS
require.Contains(t, url, "X-Amz-Signature") // ensures it's signed
}
func TestGeneratePresignedURL_Integration(t *testing.T) {
Copy link
Contributor

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

test func name should match PresignGetCallerIdentityFromAuthParams

Sign up for free to join this conversation on GitHub. Already have an account? Sign in to comment
Labels
exclude-changelog PR description should not be included in next release changelog
Projects
None yet
Development

Successfully merging this pull request may close these issues.

IAM AWS: implement AWS IAM functionality in lakeFS
3 participants