feat: Ensemble async callback execution #429

Merged
yinggeh merged 4 commits into main from yinggeh-DLIS-8163-ensemble-async-callback
Apr 4, 2025

Conversation

yinggeh
Contributor

@yinggeh yinggeh commented Mar 12, 2025

What does the PR do?

Reduce end-to-end latency in ensemble models by executing callbacks asynchronously at the end of each ensemble step.
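
For illustration only, the idea can be sketched as follows: rather than running each step's completion callback inline, the caller hands it to a fixed-size worker pool and returns immediately. `CallbackPool` and `RunStepsWithAsyncCallbacks` are hypothetical names made up for this sketch; this is not the `triton::common::ThreadPool` implementation or the code changed in this PR.

```cpp
#include <atomic>
#include <cassert>
#include <condition_variable>
#include <cstddef>
#include <functional>
#include <mutex>
#include <queue>
#include <thread>
#include <utility>
#include <vector>

// Hypothetical stand-in for a fixed-size callback pool (an assumption for
// this sketch, not Triton's triton::common::ThreadPool).
class CallbackPool {
 public:
  explicit CallbackPool(std::size_t thread_count)
  {
    assert(thread_count > 0);
    for (std::size_t i = 0; i < thread_count; ++i) {
      workers_.emplace_back([this] { WorkerLoop(); });
    }
  }

  // Lets the workers drain every queued callback, then joins them.
  ~CallbackPool()
  {
    {
      std::lock_guard<std::mutex> lk(mu_);
      stop_ = true;
    }
    cv_.notify_all();
    for (auto& worker : workers_) {
      worker.join();
    }
  }

  // Returns immediately; the callback runs later on a worker thread.
  void Enqueue(std::function<void()> cb)
  {
    {
      std::lock_guard<std::mutex> lk(mu_);
      tasks_.push(std::move(cb));
    }
    cv_.notify_one();
  }

 private:
  void WorkerLoop()
  {
    for (;;) {
      std::function<void()> task;
      {
        std::unique_lock<std::mutex> lk(mu_);
        cv_.wait(lk, [this] { return stop_ || !tasks_.empty(); });
        if (tasks_.empty()) {
          return;  // stop_ is set and nothing is left to run
        }
        task = std::move(tasks_.front());
        tasks_.pop();
      }
      task();  // run the callback outside the lock
    }
  }

  std::mutex mu_;
  std::condition_variable cv_;
  std::queue<std::function<void()>> tasks_;
  std::vector<std::thread> workers_;
  bool stop_ = false;
};

// Hypothetical helper: each "step" enqueues its completion callback instead
// of executing it on the calling thread. Returns how many callbacks ran.
int RunStepsWithAsyncCallbacks(int step_count, std::size_t pool_size)
{
  std::atomic<int> completed{0};
  {
    CallbackPool pool(pool_size);
    for (int step = 0; step < step_count; ++step) {
      pool.Enqueue([&completed] { completed.fetch_add(1); });
    }
  }  // the pool destructor waits for every queued callback
  return completed.load();
}
```

In this sketch the enqueueing thread is free to move on to the next step while earlier callbacks are still running, which is the effect the PR description attributes the latency reduction to.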

Checklist

  • PR title reflects the change and is of format <commit_type>: <Title>
  • Changes are described in the pull request.
  • Related issues are referenced.
  • Populated github labels field
  • Added test plan and verified test passes.
  • Verified that the PR passes existing CI.
  • Verified copyright is correct on all changed files.
  • Added succinct git squash message before merging ref.
  • All template sections are filled out.
  • Optional: Additional screenshots for behavior/output changes with before/after.

Commit Type:

Check the conventional commit type box here and add the label to the GitHub PR.

  • feat

Related PRs:

triton-inference-server/common#133

Where should the reviewer start?

Test plan:

L0_simple_ensemble

  • CI Pipeline ID: 25280555

Caveats:

Background

Related Issues: (use one of the action keywords Closes / Fixes / Resolves / Relates to)

  • closes GitHub issue: #7650

@yinggeh yinggeh added the PR: feat A new feature label Mar 12, 2025
@yinggeh yinggeh requested a review from GuanLuo March 12, 2025 14:39
@yinggeh yinggeh self-assigned this Mar 12, 2025
@yinggeh yinggeh force-pushed the yinggeh-DLIS-8163-ensemble-async-callback branch from 60cd9ba to 63135f6 Compare March 12, 2025 14:48
@yinggeh yinggeh requested a review from GuanLuo March 14, 2025 17:09
@yinggeh yinggeh requested a review from ziqif-nv March 31, 2025 21:17
src/server.cc Outdated
#ifdef TRITON_ENABLE_ENSEMBLE
// TODO: Need to scale the thread pool size smarter, e.g. based on the
// instance_group count of composing models.
ensemble_cb_pool_.reset(new triton::common::ThreadPool(16u));

any reason to use 16u as the magic number?

Contributor Author

It was based on experiments. A thread pool of size 8 generally yields the highest throughput. I think we can use 8 for now and increase it if necessary.
Screenshot 2025-04-03 at 12 52 10 PM
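
As a rough sketch of the direction the TODO in the snippet hints at, a pool size could be derived from the instance counts of the composing models and clamped to a fixed range. `SuggestCallbackPoolSize` is a hypothetical helper, and using 8 and 16 as the lower and upper bounds is an assumption borrowed from the numbers mentioned in this thread; nothing here is what the PR actually implements.

```cpp
#include <algorithm>
#include <cassert>
#include <cstddef>
#include <numeric>
#include <vector>

// Hypothetical heuristic (an assumption, not Triton's behavior): sum the
// instance counts of the composing models and clamp the result to
// [min_size, max_size].
std::size_t SuggestCallbackPoolSize(
    const std::vector<std::size_t>& instance_counts,
    std::size_t min_size = 8, std::size_t max_size = 16)
{
  assert(min_size <= max_size);
  const std::size_t total = std::accumulate(
      instance_counts.begin(), instance_counts.end(), std::size_t{0});
  return std::max(min_size, std::min(total, max_size));
}
```

Whether such a heuristic would actually beat a fixed size would have to be checked with the same kind of throughput experiments described above.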

@yinggeh yinggeh requested a review from ziqif-nv April 3, 2025 20:07
@yinggeh yinggeh merged commit 56e97eb into main Apr 4, 2025
2 checks passed
yinggeh added a commit that referenced this pull request Apr 30, 2025
dmitry-tokarev-nv added a commit that referenced this pull request Apr 30, 2025
dmitry-tokarev-nv added a commit that referenced this pull request May 1, 2025
dmitry-tokarev-nv added a commit that referenced this pull request May 1, 2025
yinggeh added a commit that referenced this pull request May 14, 2025
Labels
PR: feat A new feature
3 participants