Add role based access control for running, resuming and retrying workflows (#931) #939

eenblam · 2025-05-08T00:43:15Z (edited by Mark90)

This PR adds role-based access control to running, resuming and retrying workflows.

Documentation has been added to the Auth(n|z) page of the reference documentation.

As noted in the docs, this functionality is still in beta: the UI does not yet handle authorization rejections, nor does it hide or disable the UI elements for starting or resuming workflows when a user isn't allowed to perform those actions. Follow-up stories to implement these are on the WFO Partner Code Sprint.
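
As a rough illustration of how an authorization callback could encode a role check: the test later in this thread uses callbacks shaped like `(OIDCUserModel | None) -> bool`. The sketch below assumes a hypothetical stand-in `User` class with a `roles` set instead of the real `OIDCUserModel` (whose fields are not shown here), and `require_role` / `require_any_role` are hypothetical helpers, not part of this PR.

```python
from dataclasses import dataclass, field
from typing import Callable, Optional


# Hypothetical stand-in for OIDCUserModel; the real model's fields may differ.
@dataclass
class User:
    name: str
    roles: set = field(default_factory=set)


# Callback shape assumed from the test in this PR: optional user in, bool out.
Authorizer = Callable[[Optional[User]], bool]


def require_role(role: str) -> Authorizer:
    """Build a callback that allows only authenticated users holding `role`."""

    def authorize(user: Optional[User] = None) -> bool:
        # Requests without a user are rejected.
        if user is None:
            return False
        return role in user.roles

    return authorize


def require_any_role(*roles: str) -> Authorizer:
    """Build a callback that allows users holding at least one of `roles`."""
    wanted = set(roles)

    def authorize(user: Optional[User] = None) -> bool:
        return user is not None and bool(wanted & user.roles)

    return authorize


can_start = require_role("network-engineer")
can_retry = require_any_role("network-engineer", "noc")

alice = User("alice", {"network-engineer"})
bob = User("bob", {"noc"})

print(can_start(alice), can_start(bob), can_start(None))  # True False False
print(can_retry(bob))  # True
```

Callbacks like these would be passed where the test below passes its `allow`/`disallow` functions, e.g. `authorize_callback=can_start` or `retry_auth_callback=can_retry` on `@workflow(...)`.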

Version bumped to 4.1.0rc2.

Closes #931

codspeed-hq · 2025-05-08T00:47:20Z

CodSpeed Performance Report

Merging #939 will not alter performance

Comparing 931-feature-add-role-based-access-control-to-running-workflows (3843175) with main (718cd18)

Summary

✅ 12 untouched benchmarks

codecov · 2025-05-08T00:48:46Z

Codecov Report

Attention: Patch coverage is 86.36364% with 6 lines in your changes missing coverage. Please review.

Project coverage is 83.93%. Comparing base (718cd18) to head (3843175).
Report is 1 commit behind head on main.

Files with missing lines	Patch %	Lines
orchestrator/api/api_v1/endpoints/processes.py	84.00%	3 Missing and 1 partial ⚠️
orchestrator/services/celery.py	0.00%	2 Missing ⚠️

Additional details and impacted files

@@            Coverage Diff             @@
##             main     #939      +/-   ##
==========================================
+ Coverage   83.89%   83.93%   +0.04%     
==========================================
  Files         212      213       +1     
  Lines       10238    10261      +23     
  Branches     1009     1008       -1     
==========================================
+ Hits         8589     8613      +24     
- Misses       1379     1380       +1     
+ Partials      270      268       -2


orchestrator/workflow.py

orchestrator/services/processes.py

orchestrator/workflow.py

Mark90

Looking good! Left a few nitpicks and suggestions.

orchestrator/api/api_v1/endpoints/processes.py

orchestrator/services/processes.py

orchestrator/workflow.py

eenblam · 2025-06-05T00:38:46Z

@Mark90 I wanted to add a bit more testing to cover retries via the API, but I think my approach is a bit flawed. Here's what I was attempting. Note that I'm intentionally trying to fail a step so that I can then test retry behavior:

def test_retry_authorization(test_client):
    def disallow(_: OIDCUserModel | None = None) -> bool:
        return False

    def allow(_: OIDCUserModel | None = None) -> bool:
        return True

    class ConfirmForm(FormPage):
        confirm: bool
    
    @inputstep("authorized_resume", assignee=Assignee.SYSTEM, resume_auth_callback=allow, retry_auth_callback=disallow)
    def authorized_resume(state):
        user_input = yield ConfirmForm
        return user_input.model_dump()

    @step("fails once")
    def fails_once(state):
        if not hasattr(fails_once, "called"):
            fails_once.called = False

        if not fails_once.called:
            fails_once.called = True
            raise RuntimeError("Failing intentionally, ignore")
        return {}

    @workflow("test_auth_workflow", target=Target.CREATE, authorize_callback=allow, retry_auth_callback=disallow)
    def test_auth_workflow():
        return init >> authorized_resume >> fails_once >> done

    with WorkflowInstanceForTests(test_auth_workflow, "test_auth_workflow"):
        # Creating workflow succeeds
        response = test_client.post("/api/processes/test_auth_workflow", json=[{}])
        assert HTTPStatus.CREATED == response.status_code
        process_id = response.json()["id"]
        # We're authorized to resume, but this will error, so we can retry
        response = test_client.put(f"/api/processes/{process_id}/resume", json=[{"confirm": True}])
        assert HTTPStatus.NO_CONTENT == response.status_code
        # We're authorized to retry, in spite of workflow's retry_auth_callback=disallow
        response = test_client.put(f"/api/processes/{process_id}/resume")
        assert HTTPStatus.NO_CONTENT == response.status_code

However, that final assertion fails with a 422. Beyond that, this seems to break the WorkflowInstanceForTests: the logs report the following errors, and the process appears to be dropped from the database:

ValueError: Failed to write failure step to process: process with PID f4e1a612-e1f0-4432-b109-16de73637dcd not found

and

sqlalchemy.orm.exc.ObjectDeletedError: Instance '<WorkflowTable at 0x11ce3dd00>' has been deleted, or its row is otherwise not present.

Is there a way to work around this? Or should I perhaps try setting up the workflow, advancing the process manually somehow, and setting process.last_status manually?

Note that I've already covered get_auth_callbacks via unit testing. I specifically want to test the behavior I've added to resume_process_endpoint.

This is otherwise ready for review. Thanks for having a look!

orchestrator/api/api_v1/endpoints/processes.py

orchestrator/workflow.py

mrijk

LGTM

test/unit_tests/api/test_processes.py

Mark90

Some small remaining remarks but looks good to me!

Mark90 · 2025-06-05T12:11:10Z

The 422 can be solved by changing the last resume to:

        response = test_client.put(f"/api/processes/{process_id}/resume", json=[{}])

However, that doesn't solve the other problem. None of the existing unit tests raise an exception within a test workflow, so this is a new kind of test scenario, but a completely valid one!

I suspect the exception is causing the database transaction to be aborted... I'm not completely sure why though. I'd need more time to dig into this.
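
To illustrate the suspected mechanism in general terms (this is not orchestrator's actual session handling, which isn't shown here): if a row is written inside a transaction and an exception escapes that transaction, the write is rolled back, so a later lookup by ID finds nothing, much like the "process with PID ... not found" error above. A minimal sketch using the standard-library `sqlite3` module and a hypothetical `processes` table:

```python
import sqlite3
import uuid

conn = sqlite3.connect(":memory:")
conn.execute("CREATE TABLE processes (pid TEXT PRIMARY KEY, last_status TEXT)")
conn.commit()

pid = str(uuid.uuid4())

try:
    # A sqlite3 connection used as a context manager commits on success
    # and rolls back if an exception propagates out of the block.
    with conn:
        conn.execute("INSERT INTO processes VALUES (?, ?)", (pid, "created"))
        raise RuntimeError("Failing intentionally, ignore")
except RuntimeError:
    pass

# The insert was rolled back together with the failed work,
# so the process can no longer be found by its ID.
row = conn.execute("SELECT last_status FROM processes WHERE pid = ?", (pid,)).fetchone()
print(row)  # None
```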

A workaround could be to test the API in isolation by mocking the actual start/resume process and the contents of the DB.
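
A minimal sketch of that workaround, assuming a hypothetical, simplified handler: the process lookup (normally the database) and the resolved retry callback are injected, so a test can substitute `unittest.mock` objects and check status codes without running a real workflow. `resume_handler`, `load_process` and `get_retry_callback` are illustrative names, not orchestrator-core's API, and the 403 rejection status is an assumption.

```python
from http import HTTPStatus
from typing import Any, Callable, Optional
from unittest.mock import Mock


def resume_handler(
    process_id: str,
    user: Any,
    load_process: Callable[[str], Optional[dict]],
    get_retry_callback: Callable[[dict], Callable[[Any], bool]],
) -> HTTPStatus:
    """Hypothetical, simplified retry path of a resume endpoint."""
    process = load_process(process_id)
    if process is None:
        return HTTPStatus.NOT_FOUND
    # Authorization is checked in the handler, before any work is resumed.
    authorize = get_retry_callback(process)
    if not authorize(user):
        return HTTPStatus.FORBIDDEN  # assumed rejection status
    return HTTPStatus.NO_CONTENT


# Mock the database lookup: a single failed process exists.
failed = {"id": "p1", "last_status": "failed"}
load_process = Mock(side_effect=lambda pid: failed if pid == "p1" else None)

allow = Mock(return_value=True)
disallow = Mock(return_value=False)

assert resume_handler("p1", None, load_process, lambda p: allow) == HTTPStatus.NO_CONTENT
assert resume_handler("p1", None, load_process, lambda p: disallow) == HTTPStatus.FORBIDDEN
assert resume_handler("missing", None, load_process, lambda p: allow) == HTTPStatus.NOT_FOUND
# The callback was consulted once, with the (anonymous) user.
disallow.assert_called_once_with(None)
```

The trade-off is that this only exercises the authorization branch of the endpoint; the interaction with a real failing step, which is what triggers the database error above, remains untested.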

orchestrator/api/api_v1/endpoints/processes.py

eenblam · 2025-06-11T23:43:34Z

Fixed the 422 (it's now a 404 due to the other issue). I committed the test as xfail, and I can open a ticket to fix it once things are finalized and merged.

pboers1988 reviewed May 8, 2025

orchestrator/workflow.py Outdated

eenblam force-pushed the 931-feature-add-role-based-access-control-to-running-workflows branch from e4a5e3a to 6e33048 on May 8, 2025 16:28

eenblam force-pushed the 931-feature-add-role-based-access-control-to-running-workflows branch from abc0074 to 6a96dae on May 21, 2025 00:08

eenblam commented May 21, 2025

orchestrator/services/processes.py Outdated

eenblam commented May 21, 2025

orchestrator/workflow.py

eenblam force-pushed the 931-feature-add-role-based-access-control-to-running-workflows branch 2 times, most recently from 7b508e0 to dbd45ed on May 23, 2025 19:31

Mark90 reviewed Jun 4, 2025

mrijk reviewed Jun 5, 2025

orchestrator/api/api_v1/endpoints/processes.py

mrijk reviewed Jun 5, 2025

orchestrator/workflow.py

mrijk approved these changes Jun 5, 2025

Mark90 reviewed Jun 5, 2025

test/unit_tests/api/test_processes.py Outdated

Mark90 approved these changes Jun 5, 2025

Mark90 reviewed Jun 11, 2025

orchestrator/api/api_v1/endpoints/processes.py

Mark90 mentioned this pull request Jun 11, 2025

[Feature]: Add Role based access control to running workflows - Process Detail Retry #956

Open

eenblam changed the title ~~Draft: 931 feature add role based access control to running workflows~~ Add role based access control to running workflows (#931) Jun 11, 2025

Ben Elam added 9 commits June 12, 2025 14:29

Add utils/auth.py

b2c7aed

Implement RBAC for resume and retry workflows

5015f00

Unlike new_process, this can be checked immediately in the request handler. Policy priorities specified via workflow and steps are resolved via get_auth_callbacks.

Implement RBAC for processes started via celery

4008a41

Fix bug and linting issues

8666e91

Refactor get_auth_callbacks; add tests

cb85551

Use functional approach for get_auth_callbacks

e2df480

Parametrize new tests

47ddd2f

Add docs for workflow authorization

13436e9

Improve process filtering on resume endpoint

c6ae7d6

Ben Elam and others added 3 commits June 12, 2025 14:29

Add xfailed test for resume_process_endpoint

80d13c2

Change RBAC bold text to a warning banner

28761f6

Bump version to 4.1.0rc2

3843175
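
Commit 5015f00 notes that policy priorities specified via the workflow and its steps are resolved by get_auth_callbacks. The actual rules live in that function and the Auth(n|z) docs; the sketch below only illustrates one plausible precedence scheme, under these assumptions: resuming uses the most recent step-level `resume_auth_callback`, else the workflow's `authorize_callback`; retrying uses the most recent step-level `retry_auth_callback` or `resume_auth_callback`, else the workflow's `retry_auth_callback`, else the start callback. `Step`, `Workflow` and `resolve_auth_callbacks` are simplified hypothetical stand-ins, not orchestrator's classes.

```python
from dataclasses import dataclass
from typing import Callable, Optional, Sequence, Tuple

Authorizer = Callable[[object], bool]


@dataclass
class Step:  # simplified stand-in for a workflow step
    name: str
    resume_auth_callback: Optional[Authorizer] = None
    retry_auth_callback: Optional[Authorizer] = None


@dataclass
class Workflow:  # simplified stand-in for a workflow definition
    authorize_callback: Optional[Authorizer] = None
    retry_auth_callback: Optional[Authorizer] = None


def resolve_auth_callbacks(
    steps: Sequence[Step], wf: Workflow
) -> Tuple[Optional[Authorizer], Optional[Authorizer]]:
    """Return (resume, retry) callbacks given the steps seen so far (assumed precedence)."""
    # Workflow-level defaults; retry falls back to the start callback.
    resume = wf.authorize_callback
    retry = wf.retry_auth_callback if wf.retry_auth_callback is not None else resume
    # Walk backwards so the most recently declared step callback wins.
    resume = next(
        (s.resume_auth_callback for s in reversed(steps) if s.resume_auth_callback is not None),
        resume,
    )
    retry = next(
        (
            s.retry_auth_callback if s.retry_auth_callback is not None else s.resume_auth_callback
            for s in reversed(steps)
            if s.retry_auth_callback is not None or s.resume_auth_callback is not None
        ),
        retry,
    )
    return resume, retry


def allow(_: object = None) -> bool:
    return True


def disallow(_: object = None) -> bool:
    return False


wf = Workflow(authorize_callback=allow, retry_auth_callback=disallow)
steps = [Step("init"), Step("authorized_resume", resume_auth_callback=allow), Step("fails_once")]

resume, retry = resolve_auth_callbacks(steps, wf)
print(resume is allow, retry is allow)  # True True
```

Under these assumed rules, an explicit step-level resume callback also governs retries of later steps, overriding the workflow's `retry_auth_callback`; whether orchestrator-core behaves the same way should be checked against get_auth_callbacks itself.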

Mark90 force-pushed the 931-feature-add-role-based-access-control-to-running-workflows branch from 35336be to 3843175 on June 12, 2025 12:31

Mark90 merged commit 3c42411 into main Jun 12, 2025
16 checks passed

Mark90 deleted the 931-feature-add-role-based-access-control-to-running-workflows branch June 12, 2025 12:42

Mark90 changed the title ~~Add role based access control to running workflows (#931)~~ Add role based access control to running, resuming and retrying workflows (#931) Jun 12, 2025

Mark90 changed the title ~~Add role based access control to running, resuming and retrying workflows (#931)~~ Add role based access control for running, resuming and retrying workflows (#931) Jun 12, 2025
