Skip to content

feat(auth): add allow_sign_up environment variable for OpenID Connect #7267

New issue

Have a question about this project? Sign up for a free GitHub account to open an issue and contact its maintainers and the community.

By clicking “Sign up for GitHub”, you agree to our terms of service and privacy statement. We’ll occasionally send you account related emails.

Already on GitHub? Sign in to your account

Closed
wants to merge 34 commits into from

Conversation

RogerHYang
Copy link
Contributor

@RogerHYang RogerHYang commented Apr 24, 2025

resolves #7255

Add OAuth2 Sign-Up Control

This PR introduces OAuth2 authentication support with granular control over user sign-up behavior. Key changes include:

  1. Sign-Up Control:

    • Added PHOENIX_OAUTH2_{IDP_NAME}_ALLOW_SIGN_UP environment variable for each OAuth2 identity provider
    • When set to false, only existing users can sign in via OAuth2
    • New users attempting to sign in are redirected to login page with error message
    • Existing users with password authentication cannot switch to OAuth2
  2. Database Migration:

    • Added new auth_method column to users table to distinguish between LOCAL and OAUTH2 authentication
    • Implemented migration script to populate auth_method based on existing credentials
    • Added CHECK constraints to ensure data integrity:
      • Valid auth_method values ('LOCAL' or 'OAUTH2')
      • Proper credential combinations (password_hash for LOCAL, OAuth2 credentials for OAUTH2)
    • Removed legacy constraints replaced by new auth_method column
    • Added comprehensive migration tests for both upgrade and downgrade paths
    • Ensured compatibility with both SQLite and PostgreSQL backends
  3. Core Authentication Changes:

    • Added OAuth2 authentication method alongside existing local authentication
    • Enhanced user model to support OAuth2 credentials (client_id and user_id)
    • Added auth_method field to distinguish between LOCAL and OAUTH2 users
    • Implemented proper handling of OAuth2 user credentials and validation
  4. Testing and Validation:

    • Added comprehensive test coverage for OAuth2 authentication flows
    • Test cases for both allowed and disallowed sign-up scenarios
    • Validation of OAuth2 credentials and user information
    • Error handling tests for various edge cases
    • Integration tests for complete OAuth2 authentication flow
  5. Security and Error Handling:

    • Proper validation of OAuth2 client IDs and user IDs
    • Secure handling of OAuth2 tokens and credentials
    • Clear error messages for authentication failures
    • Protection against unauthorized authentication method changes

The changes maintain backward compatibility with existing local authentication while adding robust support for OAuth2 authentication. The implementation includes proper validation, error handling, and comprehensive test coverage for various authentication scenarios, with particular focus on the new ability to control OAuth2 sign-up behavior through environment variables.

@github-project-automation github-project-automation bot moved this to 📘 Todo in phoenix Apr 24, 2025
@dosubot dosubot bot added the size:XL This PR changes 500-999 lines, ignoring generated files. label Apr 24, 2025
Copy link
Contributor

@axiomofjoy axiomofjoy left a comment

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Chatted with @RogerHYang. I'm thinking we ought to weaken the exactly_one_auth_method constraint on the users table to avoid the dummy values in the OAuth-related columns.

One option:

  • auth_type enum column for LOCAL and OAUTH2
  • constraint that if auth_type is LOCAL the password hash should be non-null and the OAuth2-related columns should be null
  • constraint that if auth_type is OAUTH2 the password hash should be null

Comment on lines 213 to 216
if username := self.username:
object.__setattr__(self, "username", username.strip())
if profile_picture_url := self.profile_picture_url:
object.__setattr__(self, "profile_picture_url", profile_picture_url.strip())
Copy link
Contributor

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Suggested change
if username := self.username:
object.__setattr__(self, "username", username.strip())
if profile_picture_url := self.profile_picture_url:
object.__setattr__(self, "profile_picture_url", profile_picture_url.strip())
if (username := self.username) and (stripped_username := username.strip()):
object.__setattr__(self, "username", stripped_username)
if (profile_picture_url := self.profile_picture_url) and (stripped_profile_picture_url := profile_picture_url.strip()):
object.__setattr__(self, "profile_picture_url", stripped_profile_picture_url)

Copy link
Contributor Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

it's not clear to me why these local variables are more clear, since they are just a rearrangement of letters

stripped_profile_picture_url := profile_picture_url.strip()

password_hash=password_hash,
password_salt=salt,
)
if input.password:
Copy link
Contributor

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Instead of relying on the password being blank, can we make add a authType enum to the input with LOCAL or OAUTH2 values and make the password an optional?

Copy link
Contributor

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

I agree that we need to make this more declarative. The code block below is not self evident and requires tribal knowledge.

The UI is also going to have to build affordances for this for it to actually work so using an enum makes the most sense here to me.

Copy link
Contributor

@mikeldking mikeldking left a comment

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

I think since this is basically just additive and low-risk it makes sense to merge this. I do still have a bit of concern around the UX in the UI where it's not entirely gonna be obvious that this new method if invitation should be used.

Wouldn't in this case the entire system have to be disabled for signups such that the admins are forced to invite only via "TBD" flow? Obviously these things don't need to be solved in this PR but I think it's worth brainstorming the invitation flow a bit more from a UX perspective.

Comment on lines 1 to 2
TBD_OAUTH2_CLIENT_ID_PREFIX = "TBD_OAUTH2_CLIENT_ID_"
TBD_OAUTH2_USER_ID_PREFIX = "TBD_OAUTH2_USER_ID_"
Copy link
Contributor

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Not sure I follow what these are?

Copy link
Contributor

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

I think I get it moving down - but I think you should add a docstring here sincde as raw constants it's hard to tell what they do.

Copy link
Contributor

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

For context, these are dummy values that are being added to satisfy the exactly_one_auth_method check constraint:

CheckConstraint(

I think it makes sense to relax that constraint since it too restrictive in a world where an OAuth2 user can be added before their OAuth2 client and user IDs have been recorded.

Copy link
Contributor

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Then the dummy values become unnecessary.

Copy link
Contributor

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

It sounds like you would prefer to keep the constraint and use dummy values @mikeldking?

password_hash=password_hash,
password_salt=salt,
)
if input.password:
Copy link
Contributor

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

I agree that we need to make this more declarative. The code block below is not self evident and requires tribal knowledge.

The UI is also going to have to build affordances for this for it to actually work so using an enum makes the most sense here to me.

@dosubot dosubot bot added size:XXL This PR changes 1000+ lines, ignoring generated files. and removed size:XL This PR changes 500-999 lines, ignoring generated files. labels May 2, 2025
Copy link
Contributor

@axiomofjoy axiomofjoy left a comment

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Looks like there's an empty file constants.py

Comment on lines 821 to 822
if match := re.match(r"(?P<email>.*)\((?P<auth_method>\w+)\)$", email_part, re.IGNORECASE):
email_addr = match.group("email").strip()
Copy link
Contributor

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Suggested change
if match := re.match(r"(?P<email>.*)\((?P<auth_method>\w+)\)$", email_part, re.IGNORECASE):
email_addr = match.group("email").strip()
if match := re.match(r"(?P<email>.+)\((?P<auth_method>\w+)\)$", email_part, re.IGNORECASE):
email_addr = match.group("email").strip()

Non-empty email??

user = await _get_user(
session,
oauth2_client_id=oauth2_client_id,
idp_user_id=user_info.idp_user_id,
)
if user is None:
user = await _create_user(session, oauth2_client_id=oauth2_client_id, user_info=user_info)
elif user.email != user_info.email:
if user.email != user_info.email:
Copy link
Contributor

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

This change looks like it's intended to handle a case where the user is newly created, but there is a mismatch between the newly created user's email and the email from the OIDC token. But that shouldn't be possible since we use that exact email when creating the user.

Copy link
Contributor Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

yea, this one is unintentional

Copy link
Contributor Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

not sure how it got changed. good catch

Comment on lines 314 to 315
If the user has temporary OAuth2 credentials (prefixed with TBD_OAUTH2_CLIENT_ID_ or
TBD_OAUTH2_USER_ID_), these are updated with the actual credentials from the OAuth2 provider.
Copy link
Contributor

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

These comments look out of date.

Copy link
Contributor Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

you're right. very sharp eyes

Comment on lines 49 to 54
def __post_init__(self) -> None:
if self.auth_method is AuthMethod.OAUTH2:
if self.password:
raise ValueError("Password is not allowed for OAuth2 authentication")
elif not self.password:
raise ValueError("Password is required for local authentication")
Copy link
Contributor

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Suggested change
def __post_init__(self) -> None:
if self.auth_method is AuthMethod.OAUTH2:
if self.password:
raise ValueError("Password is not allowed for OAuth2 authentication")
elif not self.password:
raise ValueError("Password is required for local authentication")
def __post_init__(self) -> None:
if self.auth_method is AuthMethod.OAUTH2:
if self.password:
raise BadRequest("Password is not allowed for OAuth2 authentication")
elif not self.password:
raise BadRequest("Password is required for local authentication")

Given some of the issues that have been coming into community around inscrutable error messages, I want to make sure these APIs make it back to the user.

Copy link
Contributor Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

ok sg. thx

Comment on lines 72 to 80
# No index added: small user base makes full scans efficient
# Scaling considerations:
# - < 1,000 users: current approach optimal
# - 1,000-10,000: monitor query performance
# - > 10,000: consider adding index if auth queries are frequent
# Add index when:
# - Auth-related queries take >100ms
# - User count reaches ~5,000 with frequent auth queries
# - Many concurrent users or complex auth-related joins
Copy link
Contributor

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Nix?


# Drop the old constraints that are no longer needed
# These are replaced by the new auth_method column and its CHECK constraint
batch_op.drop_constraint("oauth2_client_id_and_user_id", type_="check")
Copy link
Contributor

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

We still want this constraint, no?

batch_op.alter_column("auth_method", nullable=False, existing_nullable=True)

# Add CHECK constraint to ensure only valid values are allowed
batch_op.create_check_constraint("auth_method", "auth_method IN ('LOCAL', 'OAUTH2')")
Copy link
Contributor

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Suggested change
batch_op.create_check_constraint("auth_method", "auth_method IN ('LOCAL', 'OAUTH2')")
batch_op.create_check_constraint("valid_auth_method", "auth_method IN ('LOCAL', 'OAUTH2')")

Comment on lines 103 to 117
batch_op.create_check_constraint(
"oauth2_client_id_and_user_id",
"(oauth2_client_id IS NULL) = (oauth2_user_id IS NULL)",
)
batch_op.create_check_constraint(
"exactly_one_auth_method",
"(password_hash IS NULL) != (oauth2_client_id IS NULL)",
)

# Drop the CHECK constraint and column
# Order matters: drop constraint before dropping column
# This prevents any constraint violations during the process
batch_op.drop_constraint("auth_method", type_="check")
batch_op.drop_constraint("auth_method_password", type_="check")
batch_op.drop_column("auth_method")
Copy link
Contributor

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

The ordering doesn't mirror the up migration.

Copy link
Contributor Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

ok i can change, but this one seems to be ok because the migration tests passes, so the order is not an issue here

Copy link
Contributor

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

The only time the ordering would matter would be if the migration fails partway and we need to comment out a portion of the down migration to run manually. I think it's worth keeping the order mirrored to facilitate that in a worst-case scenario.

Comment on lines +1075 to +1081
def __init__(self, **kwargs: Any) -> None:
if "auth_method" not in kwargs:
if kwargs.get("password_hash") and kwargs.get("password_salt"):
kwargs["auth_method"] = "LOCAL"
else:
kwargs["auth_method"] = "OAUTH2"
super().__init__(**kwargs)
Copy link
Contributor

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Can we make this explicit at the call site rather than implicit?

Copy link
Contributor Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

This is only for testing situations. The helper classes below are intended for normal use, which accomplishes the objective that you're probably referring to.

Copy link
Contributor

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Not sure I grok. Where do we rely on this?

Copy link
Contributor Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Ok i can remove it. I was just too lazy to go through and update all the tests

@RogerHYang RogerHYang marked this pull request as draft May 9, 2025 21:50
@RogerHYang RogerHYang changed the base branch from main to feat/version-10 May 16, 2025 22:53
@RogerHYang RogerHYang marked this pull request as ready for review May 16, 2025 22:55
@RogerHYang RogerHYang requested review from a team as code owners May 16, 2025 22:55
@RogerHYang RogerHYang requested review from a team as code owners May 19, 2025 14:26
Copy link

Check out this pull request on  ReviewNB

See visual diffs & provide feedback on Jupyter Notebooks.


Powered by ReviewNB

@RogerHYang
Copy link
Contributor Author

superseded by #7582

@RogerHYang RogerHYang closed this May 19, 2025
@github-project-automation github-project-automation bot moved this from 📘 Todo to ✅ Done in phoenix May 19, 2025
Sign up for free to join this conversation on GitHub. Already have an account? Sign in to comment
Labels
size:XXL This PR changes 1000+ lines, ignoring generated files.
Projects
Status: Done
Development

Successfully merging this pull request may close these issues.

[ENHANCEMENT]: add allow_sign_up boolean flag for OpenID Connect
3 participants