
fix(smolagents): Getting Multiple Traces While Streaming Code Agent #1872


Open · ialisaleh wants to merge 3 commits into main
Conversation

ialisaleh (Collaborator) commented:

This PR fixes an issue where CodeAgent.run(stream=True) in smolagents would generate separate traces for each step instead of grouping them under a single parent trace. The problem stemmed from Python generators not preserving OpenTelemetry context across yield statements. To address this, a new wrapped_generator() was introduced that ensures the tracing context is reattached during streaming and the final output is collected for proper span annotation. This change preserves existing behavior for non-streaming runs and ensures consistent, complete tracing data regardless of the stream flag.

Closes #1322
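
For context, a minimal sketch of the underlying technique (illustrative names; not the exact code in this PR): wrap the stream so the parent span's context is re-attached every time the generator resumes, since the ambient OpenTelemetry context at that point is otherwise whatever the consumer happens to have.

```python
from typing import Any, Generator

from opentelemetry import context as context_api
from opentelemetry import trace as trace_api

tracer = trace_api.get_tracer(__name__)


def traced_stream(inner: Generator[Any, None, None]) -> Generator[Any, None, None]:
    # `traced_stream` is an illustrative stand-in for the PR's wrapped_generator().
    span = tracer.start_span("CodeAgent.run")
    ctx = trace_api.set_span_in_context(span)
    chunks = []
    try:
        while True:
            # Re-attach the parent context before each resumption so spans
            # created inside this step become children of `span`.
            token = context_api.attach(ctx)
            try:
                chunk = next(inner)
            except StopIteration:
                break
            finally:
                context_api.detach(token)
            chunks.append(str(chunk))
            yield chunk
        span.set_attribute("output.value", "".join(chunks))
        span.set_status(trace_api.StatusCode.OK)
    except Exception as exc:
        span.record_exception(exc)
        span.set_status(trace_api.StatusCode.ERROR)
        raise
    finally:
        span.end()
```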

ialisaleh requested a review from a team as a code owner · July 16, 2025 17:18
dosubot bot added the size:M (This PR changes 30-99 lines, ignoring generated files.) label · Jul 16, 2025

dosubot bot added the size:L (This PR changes 100-499 lines, ignoring generated files.) label and removed the size:M label · Jul 21, 2025

cursor[bot] left a comment:

Bug: Telemetry Span Status Overwritten Incorrectly

The finally blocks in both streaming and non-streaming execution paths unconditionally set the OpenTelemetry span status to OK. This overwrites any ERROR status previously set by an except block when an exception occurs, causing failed operations to be incorrectly reported as successful in telemetry.

python/instrumentation/openinference-instrumentation-smolagents/src/openinference/instrumentation/smolagents/_wrappers.py#L152-L222

```python
    def wrapped_generator() -> Generator[str, None, None]:
        try:
            # Collect chunks for final output
            for chunk in agent_output:
                output_chunks.append(str(chunk))
                yield chunk
        except Exception as e:
            span.record_exception(e)
            span.set_status(trace_api.StatusCode.ERROR)
            raise
        finally:
            # Set output value from the last observation
            steps = getattr(agent.monitor, "steps", [])
            history = getattr(agent.monitor, "history", [])
            if steps:
                observation = getattr(steps[-1], "observations", None)
                if observation:
                    span.set_attribute(OUTPUT_VALUE, observation)
            elif history:
                observation = getattr(history[-1], "observations", None)
                if observation:
                    span.set_attribute(OUTPUT_VALUE, observation)
            elif output_chunks:
                span.set_attribute(OUTPUT_VALUE, "".join(output_chunks))
            # Record token usage metadata
            span.set_attribute(
                LLM_TOKEN_COUNT_PROMPT, agent.monitor.total_input_token_count
            )
            span.set_attribute(
                LLM_TOKEN_COUNT_COMPLETION, agent.monitor.total_output_token_count
            )
            span.set_attribute(
                LLM_TOKEN_COUNT_TOTAL,
                agent.monitor.total_input_token_count
                + agent.monitor.total_output_token_count,
            )
            span.set_status(trace_api.StatusCode.OK)
            span.end()
            context_api.detach(token)

    return wrapped_generator()
# Handle non-streaming (normal) run
else:
    try:
        # Set output value from the agent output
        span.set_attribute(OUTPUT_VALUE, str(agent_output))
        # Record token usage metadata
        span.set_attribute(LLM_TOKEN_COUNT_PROMPT, agent.monitor.total_input_token_count)
        span.set_attribute(
            LLM_TOKEN_COUNT_COMPLETION, agent.monitor.total_output_token_count
        )
        span.set_attribute(
            LLM_TOKEN_COUNT_TOTAL,
            agent.monitor.total_input_token_count + agent.monitor.total_output_token_count,
        )
        return agent_output
    except Exception as e:
        span.record_exception(e)
        span.set_status(trace_api.StatusCode.ERROR)
        raise
    finally:
        span.set_status(trace_api.StatusCode.OK)
        span.end()
        context_api.detach(token)
```
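
A sketch of one possible fix, using the names from the snippet above (illustrative, not the PR's code): report OK from the tail of the try block, which is reached only on clean completion, and leave the finally block to end the span and detach the context.

```python
def wrapped_generator() -> Generator[str, None, None]:
    try:
        for chunk in agent_output:
            output_chunks.append(str(chunk))
            yield chunk
        # Reached only if the stream completed without raising.
        span.set_status(trace_api.StatusCode.OK)
    except Exception as e:
        span.record_exception(e)
        span.set_status(trace_api.StatusCode.ERROR)
        raise
    finally:
        # Attribute bookkeeping as above, but no status write here,
        # so an ERROR set in the except block is preserved.
        span.end()
        context_api.detach(token)
```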



Bug: Unhandled Exceptions Cause Resource Leaks

If the wrapped function raises an exception, the initial exception handler records the error and re-raises without ending the span or detaching the context_api token, even though both have already been set up, so both resources leak.

python/instrumentation/openinference-instrumentation-smolagents/src/openinference/instrumentation/smolagents/_wrappers.py#L139-L145

```python
try:
    agent_output = wrapped(*args, **kwargs)
except Exception as e:
    span.record_exception(e)
    span.set_status(trace_api.StatusCode.ERROR)
    raise
```
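
An illustrative remedy with the snippet's names: clean up before re-raising, so the span is ended and the context token detached even when wrapped() fails before any output is produced.

```python
try:
    agent_output = wrapped(*args, **kwargs)
except Exception as e:
    span.record_exception(e)
    span.set_status(trace_api.StatusCode.ERROR)
    span.end()                 # end the span before propagating
    context_api.detach(token)  # detach so the attached context does not leak
    raise
```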



Bug: Error Status Overwritten in Finally Block

In the streaming generator's finally block, the span status is unconditionally set to OK. This overwrites any ERROR status previously set by the except block when an exception occurs during generator iteration, leading to a loss of error information on the span.

python/instrumentation/openinference-instrumentation-smolagents/src/openinference/instrumentation/smolagents/_wrappers.py#L191-L194

```python
span.set_status(trace_api.StatusCode.OK)
span.end()
context_api.detach(token)
```
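
This is the same defect as the first report, seen from inside the generator's finally block. Besides moving the OK status into the try body as sketched earlier, another option is to gate the status write on a flag (illustrative; the errored flag is not in the PR as written):

```python
errored = False
try:
    for chunk in agent_output:
        output_chunks.append(str(chunk))
        yield chunk
except Exception as e:
    errored = True
    span.record_exception(e)
    span.set_status(trace_api.StatusCode.ERROR)
    raise
finally:
    if not errored:
        span.set_status(trace_api.StatusCode.OK)
    span.end()
    context_api.detach(token)
```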




Comment on lines +489 to +498
```python
# Handle streaming (generator) run
report_chunks = manager_code_agent.run("Fake question.", stream=True)
final_result = "".join([chunk for chunk in report_chunks])
assert final_result == "Final report."

# Handle streaming (generator) run
report_chunks = manager_toolcalling_agent.run("Fake question.", stream=True)
final_result = "".join([chunk for chunk in report_chunks])
assert final_result == "Final report."
```

A Contributor commented:

This test looks like it is being xfailed and doesn't seem to test the emitted traces at all. Can we nix it altogether and add a more minimal test for the generator?
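
For instance, a minimal test along these lines might do (a sketch: the code_agent fixture and the in-memory exporter wiring are assumed, not taken from this PR):

```python
from opentelemetry.sdk.trace.export.in_memory_span_exporter import InMemorySpanExporter


def test_streaming_run_emits_single_trace(
    code_agent,  # hypothetical fixture: a CodeAgent backed by a fake model
    in_memory_span_exporter: InMemorySpanExporter,
) -> None:
    chunks = list(code_agent.run("Fake question.", stream=True))
    assert chunks, "the generator should yield at least one chunk"
    spans = in_memory_span_exporter.get_finished_spans()
    assert spans, "instrumentation should emit spans"
    trace_ids = {span.context.trace_id for span in spans}
    assert len(trace_ids) == 1, "all steps should share a single trace"
```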

Labels
size:L This PR changes 100-499 lines, ignoring generated files.
Development

Successfully merging this pull request may close these issues.

Instrumentation splits transactions into steps when streaming=True in codeAgent
2 participants