Skip to content

Add role based access control for running, resuming and retrying workflows (#931) #939

New issue

Have a question about this project? Sign up for a free GitHub account to open an issue and contact its maintainers and the community.

By clicking “Sign up for GitHub”, you agree to our terms of service and privacy statement. We’ll occasionally send you account related emails.

Already on GitHub? Sign in to your account

Merged

Conversation

eenblam
Copy link
Contributor

@eenblam eenblam commented May 8, 2025

This PR adds role-based access control to running, resuming and retrying workflows.

Documentation has been added to the Auth(n|z) page of the reference documentation.

As noted in the docs, this functionality is still in beta as the UI is not yet adapted to handle authorization rejections, nor does it hide/disable UI elements for starting/resuming workflows when a user isn't allowed to do so. Follow-up stories to implement these things are on the WFO Partner Code Sprint.

Version bumped to 4.1.0rc2.

Closes #931

Copy link

codspeed-hq bot commented May 8, 2025

CodSpeed Performance Report

Merging #939 will not alter performance

Comparing 931-feature-add-role-based-access-control-to-running-workflows (3843175) with main (718cd18)

Summary

✅ 12 untouched benchmarks

Copy link

codecov bot commented May 8, 2025

Codecov Report

Attention: Patch coverage is 86.36364% with 6 lines in your changes missing coverage. Please review.

Project coverage is 83.93%. Comparing base (718cd18) to head (3843175).
Report is 1 commits behind head on main.

Files with missing lines Patch % Lines
orchestrator/api/api_v1/endpoints/processes.py 84.00% 3 Missing and 1 partial ⚠️
orchestrator/services/celery.py 0.00% 2 Missing ⚠️
Additional details and impacted files
@@            Coverage Diff             @@
##             main     #939      +/-   ##
==========================================
+ Coverage   83.89%   83.93%   +0.04%     
==========================================
  Files         212      213       +1     
  Lines       10238    10261      +23     
  Branches     1009     1008       -1     
==========================================
+ Hits         8589     8613      +24     
- Misses       1379     1380       +1     
+ Partials      270      268       -2     

☔ View full report in Codecov by Sentry.
📢 Have feedback on the report? Share it here.

🚀 New features to boost your workflow:
  • ❄️ Test Analytics: Detect flaky tests, report on failures, and find test suite problems.

@eenblam eenblam force-pushed the 931-feature-add-role-based-access-control-to-running-workflows branch from e4a5e3a to 6e33048 Compare May 8, 2025 16:28
@eenblam eenblam force-pushed the 931-feature-add-role-based-access-control-to-running-workflows branch from abc0074 to 6a96dae Compare May 21, 2025 00:08
@eenblam eenblam force-pushed the 931-feature-add-role-based-access-control-to-running-workflows branch 2 times, most recently from 7b508e0 to dbd45ed Compare May 23, 2025 19:31
Copy link
Contributor

@Mark90 Mark90 left a comment

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Looking good! Left a few nitpicks and suggestions.

@eenblam
Copy link
Contributor Author

eenblam commented Jun 5, 2025

@Mark90 I wanted to add a bit more testing to cover retries via the API, but I think my approach is a bit flawed. Here's what I was attempting. Note that I'm intentionally trying to fail a step so that I can then test retry behavior:

def test_retry_authorization(test_client):
    def disallow(_: OIDCUserModel | None = None) -> bool:
        return False

    def allow(_: OIDCUserModel | None = None) -> bool:
        return True

    class ConfirmForm(FormPage):
        confirm: bool
    
    @inputstep("authorized_resume", assignee=Assignee.SYSTEM, resume_auth_callback=allow, retry_auth_callback=disallow)
    def authorized_resume(state):
        user_input = yield ConfirmForm
        return user_input.model_dump()

    @step("fails once")
    def fails_once(state):
        if not hasattr(fails_once,"called"):
            fails_once.called = False

        if not fails_once.called:
            fails_once.called = True
            raise RuntimeError("Failing intentionally, ignore")
        return {}

    @workflow("test_auth_workflow", target=Target.CREATE, authorize_callback=allow, retry_auth_callback=disallow)
    def test_auth_workflow():
        return init >> authorized_resume >> fails_once >> done

    with WorkflowInstanceForTests(test_auth_workflow, "test_auth_workflow"):
        # Creating workflow succeeds
        response = test_client.post("/api/processes/test_auth_workflow", json=[{}])
        assert HTTPStatus.CREATED == response.status_code
        process_id = response.json()["id"]
        # We're authorized to resume, but this will error, so we can retry
        response = test_client.put(f"/api/processes/{process_id}/resume", json=[{"confirm": True}])
        assert HTTPStatus.NO_CONTENT == response.status_code
        # We're authorized to retry, in spite of workflow's retry_auth_callback=disallow
        response = test_client.put(f"/api/processes/{process_id}/resume")
        assert HTTPStatus.NO_CONTENT == response.status_code

However, that final assertion is giving me a 422. In particular, it seems like this breaks the WorkflowInstanceForTests. The logs don't like the error, and the process seems to be dropped from the database.

ValueError: Failed to write failure step to process: process with PID f4e1a612-e1f0-4432-b109-16de73637dcd not found

and

sqlalchemy.orm.exc.ObjectDeletedError: Instance '<WorkflowTable at 0x11ce3dd00>' has been deleted, or its row is otherwise not present.

Is there a way to work around this? Or should I perhaps try setting up the workflow, advancing the process manually somehow, and setting process.last_status manually?

Note that I've already covered get_auth_callbacks via unit testing. I'm specifically wanting to test the behavior I"ve added to resume_process_endpoint.

This is otherwise ready for review. Thanks for having a look!

Copy link
Contributor

@mrijk mrijk left a comment

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

LGTM

Copy link
Contributor

@Mark90 Mark90 left a comment

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Some small remaining remarks but looks good to me!

@Mark90
Copy link
Contributor

Mark90 commented Jun 5, 2025

@Mark90 I wanted to add a bit more testing to cover retries via the API, but I think my approach is a bit flawed. Here's what I was attempting. Note that I'm intentionally trying to fail a step so that I can then test retry behavior:

[...]

The 422 can be solved by changing the last resume to:

        response = test_client.put(f"/api/processes/{process_id}/resume", json=[{}])

However that doesn't solve the other problem. None of the existing unittests raise an exception within a test workflow so this is a new kind of test scenario. But a completely valid one, though!

I suspect the exception is causing the database transaction to be aborted... I'm not completely sure why though. I'd need more time to dig into this.

A workaround could be to test the API in isolation by mocking the actual start/resume process and the contents of the DB.

@eenblam eenblam changed the title Draft: 931 feature add role based access control to running workflows Add role based access control to running workflows (#931) Jun 11, 2025
@eenblam
Copy link
Contributor Author

eenblam commented Jun 11, 2025

@Mark90 I wanted to add a bit more testing to cover retries via the API, but I think my approach is a bit flawed. Here's what I was attempting. Note that I'm intentionally trying to fail a step so that I can then test retry behavior:
[...]

The 422 can be solved by changing the last resume to:

        response = test_client.put(f"/api/processes/{process_id}/resume", json=[{}])

However that doesn't solve the other problem. None of the existing unittests raise an exception within a test workflow so this is a new kind of test scenario. But a completely valid one, though!

I suspect the exception is causing the database transaction to be aborted... I'm not completely sure why though. I'd need more time to dig into this.

A workaround could be to test the API in isolation by mocking the actual start/resume process and the contents of the DB.

Fixed the 422 (it's now a 404 due to the other issue.) I committed the test as XFail, and I can open a ticket for fixing it when things are finalized and merged.

@Mark90 Mark90 force-pushed the 931-feature-add-role-based-access-control-to-running-workflows branch from 35336be to 3843175 Compare June 12, 2025 12:31
@Mark90 Mark90 merged commit 3c42411 into main Jun 12, 2025
16 checks passed
@Mark90 Mark90 deleted the 931-feature-add-role-based-access-control-to-running-workflows branch June 12, 2025 12:42
@Mark90 Mark90 changed the title Add role based access control to running workflows (#931) Add role based access control to running, resuming and retrying workflows (#931) Jun 12, 2025
@Mark90 Mark90 changed the title Add role based access control to running, resuming and retrying workflows (#931) Add role based access control for running, resuming and retrying workflows (#931) Jun 12, 2025
Sign up for free to join this conversation on GitHub. Already have an account? Sign in to comment
Labels
None yet
Projects
None yet
Development

Successfully merging this pull request may close these issues.

[Feature]: Add Role based access control to running workflows - inputstep decorator
4 participants