Skip to content

Add application category #853

New issue

Have a question about this project? Sign up for a free GitHub account to open an issue and contact its maintainers and the community.

By clicking “Sign up for GitHub”, you agree to our terms of service and privacy statement. We’ll occasionally send you account related emails.

Already on GitHub? Sign in to your account

Merged
merged 13 commits into from
May 15, 2025
Merged

Add application category #853

merged 13 commits into from
May 15, 2025

Conversation

THardy98
Copy link
Contributor

@THardy98 THardy98 commented May 1, 2025

What was changed

Added a category field to ApplicationError, allowing users to configure the severity (and corresponding logging/metrics behavior) of their ApplicationError.

Activity errors that are BENIGN application errors do not log.

Why?

Part of benign exceptions work.

  1. Closes Apply application failure logging and metrics behaviour according to ApplicationErrorCategory #820

  2. How was this tested:
    Simple integration test.

  3. Any docs updates needed?
    Maybe

@THardy98 THardy98 requested a review from a team as a code owner May 1, 2025 04:58
@THardy98 THardy98 force-pushed the add_application_category branch from 33b6b2a to 47d7e26 Compare May 1, 2025 04:59
@THardy98
Copy link
Contributor Author

THardy98 commented May 1, 2025

Was unsure - do we also want to filter logs on workflow task handling failures @cretz @dandavison :

except Exception as err:
if isinstance(err, _DeadlockError):
err.swap_traceback()
logger.exception(
"Failed handling activation on workflow with run ID %s", act.run_id
)
# Set completion failure
completion.failed.failure.SetInParent()
try:
self._data_converter.failure_converter.to_failure(
err,
self._data_converter.payload_converter,
completion.failed.failure,
)

Copy link
Member

@cretz cretz left a comment

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Only minor things. We should add followup issues on all SDKs once they are released to go update their "polling" samples to start using benign exceptions.

Was unsure - do we also want to filter logs on workflow task handling failures

This should not get hit on application error, so I don't think we need to add anything for it

Comment on lines 713 to 737
self.create_workflow_rule = client._new_call(
"create_workflow_rule",
wsv1.CreateWorkflowRuleRequest,
wsv1.CreateWorkflowRuleRequest,
)
self.delete_workflow_rule = client._new_call(
"delete_workflow_rule",
wsv1.DeleteWorkflowRuleRequest,
wsv1.DeleteWorkflowRuleResponse,
)
self.describe_workflow_rule = client._new_call(
"describe_workflow_rule",
wsv1.DescribeWorkflowRuleRequest,
wsv1.DescribeWorkflowRuleResponse,
)
self.list_workflow_rules = client._new_call(
"list_workflow_rules",
wsv1.ListWorkflowRulesRequest,
wsv1.ListWorkflowRulesResponse,
)
self.trigger_workflow_rule = client._new_call(
"trigger_workflow_rule",
wsv1.TriggerWorkflowRuleRequest,
wsv1.TriggerWorkflowRuleResponse,
)
Copy link
Member

@cretz cretz May 1, 2025

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

You may need to also add these in client.rs (sorry it's such a manual process, we didn't expect RPCs to get added so regularly to justify auto-generating this, though we probably should)

We have a test_all_grpc_calls_present that is supposed to catch if these are missing, I wonder why it is not. Hrmm, I may need to investigate that...

Copy link
Contributor Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

added
from what I can tell, test_all_grpc_calls_present only compares the python client vs the GRPC service it should implement, there's not comparison with the rust bridge client

    assert_all_calls_present(
        client.workflow_service,
        temporalio.api.workflowservice.v1,
        temporalio.api.workflowservice.v1.WorkflowServiceStub,
   )

Copy link
Member

@cretz cretz May 2, 2025

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Ah, yeah, it's just testing the pure Python gRPC client stub creation matches what we've set in Python, it does not actually make the call like some SDKs do. My mistake. There's a gap there, but meh.

Comment on lines 336 to 350
"create_workflow_rule" => {
rpc_call!(retry_client, call, create_workflow_rule)
}
"delete_workflow_rule" => {
rpc_call!(retry_client, call, delete_workflow_rule)
}
"describe_workflow_rule" => {
rpc_call!(retry_client, call, describe_workflow_rule)
}
"list_workflow_rules" => {
rpc_call!(retry_client, call, list_workflow_rules)
}
"trigger_workflow_rule" => {
rpc_call!(retry_client, call, trigger_workflow_rule)
}
Copy link
Member

@cretz cretz May 2, 2025

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Gonna sound pedantic, sorry, but can we keep these in alphabetical order with the rest? Same for their placement in service.py

Copy link
Contributor Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

sure, np

temporalio.api.enums.v1.ApplicationErrorCategory.APPLICATION_ERROR_CATEGORY_UNSPECIFIED
)

"""BENIGN category errors emit DEBUG level logs and do not record metrics"""
Copy link
Member

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

I think a Python docstring has to be below the item

Copy link
Contributor Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

ah, my mistake
a bit atypical docstring convention

temporalio.api.enums.v1.ApplicationErrorCategory.APPLICATION_ERROR_CATEGORY_BENIGN
)


class ApplicationError(FailureError):
Copy link
Member

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Can you adjust the default failure converter to properly populate category in both directions (to/from proto)? And can you add (or adjust) a test where, if you raise a benign application error from a workflow, the client can see the category in the application error (it'll be the cause of workflow failed).

Copy link
Contributor Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Yeah - I had done this with go/java and forgot to do so here

@THardy98 THardy98 requested a review from cretz May 2, 2025 19:45
Copy link
Member

@cretz cretz left a comment

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

LGTM, feel free to merge when CI passes (there are a couple of flakes, you may have to "re-run failed jobs" a time or two)

@THardy98 THardy98 marked this pull request as draft May 5, 2025 16:48
@THardy98 THardy98 marked this pull request as ready for review May 5, 2025 17:51
@THardy98 THardy98 merged commit 8449a35 into main May 15, 2025
18 checks passed
@THardy98 THardy98 deleted the add_application_category branch May 15, 2025 23:09
Sign up for free to join this conversation on GitHub. Already have an account? Sign in to comment
Labels
None yet
Projects
None yet
Development

Successfully merging this pull request may close these issues.

Apply application failure logging and metrics behaviour according to ApplicationErrorCategory
3 participants