Avoid token detaching from different context #1153

VegetarianOrc · 2025-10-11T02:32:33Z

What was changed

The OpenTelemetry and OpenAI Agents interceptors have been updated to avoid detaching context in such a way that does not cause a <Token> was created in a different Context error.

Why?

There are some situations where cleanup happens in a different asyncio.Task (when tasks get garbage collected for instance). When those situations happen the contextvars.Context used to create the contextvars.Token is no longer available. By confirming that the Otel context is the same as the one we attached, we can do a best effort detach and avoid the error.

Checklist

Issues Closed
1. Closes [Bug] GeneratorExit possibly causing issues on context detach in OTel finally #441
How was this tested:

A new test was added to force a GeneratorExit error after a task is garbage collected by disabling safe eviction. The test captures OpenTelemetry logs and ensures that the Failed to detach context message is not printed.

Any docs updates needed?

…was used to attach it

…helpers to test/helpers

… calls

cretz · 2025-10-17T14:57:52Z

temporalio/contrib/openai_agents/_trace_interceptor.py

+        ctx = contextvars.copy_context()
+        span = ctx.run(
+            self._create_span,
+            name="temporal:startActivity",
+            data={"activity": input.activity},
+            input=input,
+        )
+        handle = ctx.run(self.next.start_activity, input)


Hrmm, adding more stack trace depth and layers of indirection for the invocation of activities is a bit rough I think. Maybe I don't understand the problem well enough. So there was an issue with span.finish too, not just .detach? Without this change here (not discussing other changes later), can we cause a failure (even if not specifically the test in this PR)? What does the failure look like? I was under the impression it was OTel detach, not span.finish. Is OpenAI+OTel span finish somehow doing detach?

Also, assuming this can fail, I think we should be able to run the finish in the copied context but not have to run user code in it because the current context is good enough.

We can create a failure using things from #1136. OpenAIAgentsInstrumentor().instrument() but it would be a lot to try to include as a test. I'm actually leaning towards a different solution here where we simply close the startActivity span when the handle is created. This is similar to what we do with Otel, and I don't think parenting the activity execution to this really does much. (It will still be parented to the workflow.

The issue here is the same underlying issue with our Otel detach. In the OpenAIAgentsInstrumentor it manages it's own tokens and when/if the Temporal SDK executes span.finish in a different context their detach will raise the error.

Similarly, if we don't run the user code in the same context, I don't expect the span hierarchy to stay intact.

The detach thing made sense because it was in a finally, but this is done via add_done_callback. Are we saying that add_done_callback is invoked on GC/GeneratorExit like finally is? I see now there is a context parameter for add_done_callback and it defaults in all Python versions we support, so now I think that add_done_callback does run in the current context.

add_done_callback should execute the callable when the the Future is completed, which should include the GC/GeneratorExit scenario. I didn't see the context param before which we could definitely use here. Looks like if the context param is not provided, it captures existing context when calling add_done_callback and uses that.

cretz · 2025-10-17T15:02:01Z

temporalio/contrib/opentelemetry.py

+        with self._top_level_workflow_context(success_is_complete=True) as ctx:
+            if sys.version_info >= (3, 11):
+                return await asyncio.create_task(
+                    self._execute_workflow(input), context=ctx
+                )
+            else:
+                return await ctx.run(asyncio.create_task, self._execute_workflow(input))


Hrmm, I am not sure another layer of indirection and stack trace is good here, and am especially scared of a whole new asyncio task. Are there any other possible solutions that will allow us to not create tasks? Can we just alter the finally to ensure it is peformed in the context somehow? Maybe a less signficant change kind of back to what we had but specifically address just the one line that was the issue, detach?

(I admit I may not understand the technical details of some of this)

There is an option Alex explored where we specifically check if the context has changed and just not detach. We wanted to explore what it would look like to actually make the context consistent across the whole thing. @VegetarianOrc did we ever find a way to replicate with safe eviction on?

Are there any other possible solutions that will allow us to not create tasks?

As Tim mentioned, the other solution that I explored was to just detect if the finally is running in a different task and simply not detach in that case. The routes for this I tested out were comparing thread ids, comparing the current otel context, and comparing the tasks via asyncio.current_task().

did we ever find a way to replicate with safe eviction on?

I haven't been able to reproduce it with safe eviction on, but I do generally feel like it's a good idea to have the interceptor guard against this possibility.

Can we just alter the finally to ensure it is performed in the context somehow?

The issue is that user code must execute in the same context that the attach happens in otherwise the values inserted (tracing headers, etc) aren't available so spans aren't linked or parented correctly. In order to run any coroutine with a different context you have to create a new task that takes that context otherwise the coroutine will just copy the current context the moment it's awaited. So in this case ctx.run(self._execute_workflow, input) has the correct context when creating the coroutine object, but then just returns the coro object, which then inherits the current context when awaited.

If we want to avoid the task, I think the check to ensure that detach won't raise is a reasonable alternative.

As Tim mentioned, the other solution that I explored was to just detect if the finally is running in a different task and simply not detach in that case. The routes for this I tested out were comparing thread ids, comparing the current otel context, and comparing the tasks via asyncio.current_task().

If you can replicate in tests, swallowing the detach failure I think is acceptable too. Regardless, I don't think detach must run in a case where we're already GC'ing here. So really, "detach" just needs to be best effort, not affect the rest of the code.

The issue is that user code must execute in the same context that the attach happens in otherwise the values inserted (tracing headers, etc) aren't available so spans aren't linked or parented correctly

Which it is already the case today (it affects the current context), this is only an issue with detach

Also, I bet you can just ctx.run the detach without ctx.run/create_task'ing everything. I think we should limit the changes to just address the detach part.

Also, I bet you can just ctx.run the detach without ctx.run/create_task'ing everything.

I couldn't find a way to store a reference to the current contextvars.Context to use just on the detach. Without that, the alternative is to copy the current context and ensure that attach, execute, and detach all run in that copied context.

So really, "detach" just needs to be best effort, not affect the rest of the code.
...
I think we should limit the changes to just address the detach part.

Fully on board with this approach as well. It makes sense to me that in the case that the context we ran in is no longer valid, we just don't detach as it's already gone. Just thought it was worth exploring what it looked like to ensure everything was executed in a context where we controlled the lifetime. @tconley1428, does the best effort detach approach make sense to you too? It's certainly much simpler.

I couldn't find a way to store a reference to the current contextvars.Context to use just on the detach. Without that, the alternative is to copy the current context and ensure that attach, execute, and detach all run in that copied context.

copy_context sounds more expensive than it is IIUC. copy_context is more copy-on-write-as-of-when-called, so effectively it's like a super cheap context snapshot/clone IIUC.

But yeah, swallowing best-effort detach failure I think is best, because detach is just about detaching from context (which is going away anyways because this task/generator is being GC'd), it isn't something like a "flush" IIUC where the absence of such a call could mean data loss.

I think it's fine in this scenario.

…ot just equal but is the same object

cretz · 2025-10-17T21:22:41Z

temporalio/contrib/opentelemetry.py

+            # different contextvars.Context than the one the token was created
+            # on. As such we do a best effort detach to avoid using a mismatched
+            # token.
+            if context is opentelemetry.context.get_current():


cretz · 2025-10-17T21:23:59Z

tests/worker/test_workflow.py

    unpause_and_assert,
    workflow_update_exists,
 )
+from tests.helpers.cache_evitction import (


Misspelling in the filename

VegetarianOrc added 13 commits October 10, 2025 19:30

ensure that the context used to detach the token is the same as what …

fc2a2e6

…was used to attach it

Update test to use LogCapturer helper. Move shared test workflow and …

7a51529

…helpers to test/helpers

run formatter

1b8d9e1

Fix up test with log capturer. Only log warnings if there are no hook…

b656e84

… calls

run formatter

5abc5a4

Move to a class based context manager to manually manage context

d1bf879

apply context management to openai_agents tracing

6ca781c

use 3.11 create_task when possible to avoid an extra context copy

6060b35

run formatter

13e99a3

fix typing lint errors

853f603

Fix a few more typing errors

7ae97d0

Use a lambda to wrap task creation to appease the type linter

aaccb00

Merge branch 'main' into otel/context-detach

6deb5b1

VegetarianOrc marked this pull request as ready for review October 16, 2025 22:46

VegetarianOrc requested a review from a team as a code owner October 16, 2025 22:46

cretz reviewed Oct 17, 2025

View reviewed changes

VegetarianOrc added 3 commits October 17, 2025 13:41

revert manual context management explorations

fe8b0a5

Add comment explaining the check. Use to ensure that the context is n…

2274ba4

…ot just equal but is the same object

use original variable name

460ed71

cretz approved these changes Oct 17, 2025

View reviewed changes

Fix typo

83ae1fb

Avoid token detaching from different context #1153

Are you sure you want to change the base?

Avoid token detaching from different context #1153

Uh oh!

Conversation

VegetarianOrc commented Oct 11, 2025 • edited Loading Uh oh! There was an error while loading. Please reload this page.

Uh oh!

What was changed

Why?

Checklist

Uh oh!

cretz Oct 17, 2025 • edited Loading Uh oh! There was an error while loading. Please reload this page.

Uh oh!

Choose a reason for hiding this comment

Uh oh!

Choose a reason for hiding this comment

Uh oh!

Choose a reason for hiding this comment

Uh oh!

cretz Oct 17, 2025 • edited Loading Uh oh! There was an error while loading. Please reload this page.

Uh oh!

Choose a reason for hiding this comment

Uh oh!

Choose a reason for hiding this comment

Uh oh!

Choose a reason for hiding this comment

Uh oh!

Choose a reason for hiding this comment

Uh oh!

Choose a reason for hiding this comment

Uh oh!

cretz Oct 17, 2025 • edited Loading Uh oh! There was an error while loading. Please reload this page.

Uh oh!

Choose a reason for hiding this comment

Uh oh!

Choose a reason for hiding this comment

Uh oh!

Choose a reason for hiding this comment

Uh oh!

Choose a reason for hiding this comment

Uh oh!

Choose a reason for hiding this comment

Uh oh!

Choose a reason for hiding this comment

Uh oh!

Reviewers

Assignees

Labels

Projects

Milestone

Development

Uh oh!

3 participants

VegetarianOrc commented Oct 11, 2025 •

edited

Loading

cretz Oct 17, 2025 •

edited

Loading

cretz Oct 17, 2025 •

edited

Loading

cretz Oct 17, 2025 •

edited

Loading