Make child workflows create OpenTelemetry child spans #317

Closed
wants to merge 2 commits into from

@dbmikus dbmikus commented Apr 25, 2025

Make child workflows create OpenTelemetry child spans so that you can track execution across sub-workflows.

Testing:

  • tested by running against a local workflow that creates child workflows

Issues to fix:

  • workflows enqueued via Queue("...").enqueue_async do not create child spans

dbmikus added 2 commits April 25, 2025 15:12
When asserting that the creation of a new workflow occurs outside a
step, add a human-readable error message to the assertion.
@dbmikus
Author

dbmikus commented Apr 25, 2025

This is not a complete PR. When I create child workflows like so:

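from dbos import DBOS, Queue

myqueue = Queue("myqueue", concurrency=25)

@DBOS.workflow()
async def wf():
    # Enqueue the child workflow on the queue instead of calling it directly.
    await myqueue.enqueue_async(sub_wf)

@DBOS.workflow()
async def sub_wf():
    pass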

the sub_wf workflows show up as new traces.

@qianl15
Member

qianl15 commented Apr 26, 2025

Thanks for the question! Right now, enqueue doesn't create child spans because enqueued tasks are often executed asynchronously on a different executor or machine, sometimes even hours later, long after the parent workflow has finished. In these cases, we can't directly pass in a parent span that lives in memory, and creating child spans could lead to confusing or misleading traces.

That said, we agree it would be useful to track execution across parent and child workflows in some scenarios. Our team has been discussing potential solutions, including persisting the span ID in Postgres and propagating it from there. Definitely something we're actively thinking about.
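
To make the "persist the span ID" idea concrete, here is a minimal, stdlib-only sketch of serializing a parent span reference into a string that could be stored in a database column and decoded later by whichever executor dequeues the workflow. It uses the W3C Trace Context `traceparent` layout (`version-traceid-spanid-flags`). The names `ParentSpanRef`, `encode_traceparent`, and `decode_traceparent` are hypothetical and only for illustration; this is not how DBOS implements (or plans to implement) propagation.

```python
import re
from typing import NamedTuple, Optional

# W3C Trace Context "traceparent" layout: version-traceid-spanid-flags,
# all lowercase hex (2, 32, 16 and 2 digits respectively).
_TRACEPARENT_RE = re.compile(r"00-([0-9a-f]{32})-([0-9a-f]{16})-([0-9a-f]{2})")


class ParentSpanRef(NamedTuple):
    """Hypothetical record of the span that enqueued a workflow."""
    trace_id: int
    span_id: int
    sampled: bool


def encode_traceparent(ref: ParentSpanRef) -> str:
    """Serialize the parent span reference into a string that fits a text column."""
    flags = 0x01 if ref.sampled else 0x00
    return f"00-{ref.trace_id:032x}-{ref.span_id:016x}-{flags:02x}"


def decode_traceparent(value: str) -> Optional[ParentSpanRef]:
    """Parse a stored traceparent string; return None if it is malformed."""
    match = _TRACEPARENT_RE.fullmatch(value)
    if match is None:
        return None
    trace_id, span_id, flags = (int(part, 16) for part in match.groups())
    # All-zero trace or span IDs are invalid under the W3C spec.
    if trace_id == 0 or span_id == 0:
        return None
    return ParentSpanRef(trace_id, span_id, bool(flags & 0x01))
```

Under this assumption, the enqueueing side would store the encoded string alongside the queued workflow, and the dequeuing executor would decode it and decide whether to start its span as a child of that reference or as a new trace that only refers back to it.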

@dbmikus
Author

dbmikus commented Apr 26, 2025

It would be useful to use standard OTel tooling for tracking workflows in DBOS, if possible.

I understand that there are pain points with OTel and very long-running traces. I've previously put traces on Kafka and SQS, but those were consumed relatively quickly. TBH, I'm not sure of the ramifications of having a trace that can exist for days. There might be no problems, or it might break OTel collection. There are ways to link two traces together, which might alleviate long-lived trace problems.

Another, simpler solution is to keep child workflows in the same trace as long as they are not executed on a different executor.

Do steps / child-workflows only execute on a different executor when using queues?

I can close this PR if it's not worth keeping around for the discussion.

@dbmikus
Author

dbmikus commented Apr 26, 2025

For context, we use OpenTelemetry for observability and sometimes for data collection on our LLMs. We record some function inputs/outputs on OTel spans and also record log messages in the spans. Being able to debug the LLM flow across spans is very helpful, and other LLM ops products support ingesting OTel traces.

@qianl15
Member

qianl15 commented Apr 28, 2025

Hi @dbmikus, I copied the discussion here into a new issue thread to keep track of the development: #322

For now, we've made DBOS.tracer available in the public interface (#306), so you may add your own tracing spans in your functions.

@qianl15 qianl15 closed this Apr 28, 2025
@qianl15 qianl15 linked an issue Apr 28, 2025 that may be closed by this pull request