Max width of ensemble fan out #7256
Unanswered
zachary-mcpher
asked this question in
Q&A
Replies: 0 comments
Hello,
I am working on a project to serve an encoder-based model in a Triton Inference Server ensemble. The pipeline is a preprocessing node that feeds directly into an encoder (which generates an embedding from roberta-base), followed by a fan-out to K lightweight classification heads (think N linear layers).
How far can I reasonably push K? Would the ensemble orchestrator be capable of handling inference with K=100 classifiers?
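
To make the topology concrete, here is a rough sketch (not a tested deployment) of a Python script that generates the `ensemble_scheduling` block of an ensemble `config.pbtxt` for a given K. All model names (`preprocess`, `encoder`, `head_i`) and tensor names (`TEXT`, `INPUT_IDS`, `EMBEDDING`, `LOGITS`, ...) are hypothetical placeholders, and the ensemble's top-level `input`/`output` declarations are omitted.

```python
def build_steps(k: int) -> list:
    """Steps for preprocess -> encoder -> k classification heads.

    In each map, the key is the composing model's own tensor name and the
    value is the ensemble-level tensor name. All names are hypothetical.
    """
    steps = [
        {
            "model_name": "preprocess",
            "input_map": {"TEXT": "TEXT"},
            "output_map": {
                "INPUT_IDS": "input_ids",
                "ATTENTION_MASK": "attention_mask",
            },
        },
        {
            "model_name": "encoder",
            "input_map": {
                "INPUT_IDS": "input_ids",
                "ATTENTION_MASK": "attention_mask",
            },
            "output_map": {"EMBEDDING": "embedding"},
        },
    ]
    # Fan-out: every head consumes the same ensemble tensor "embedding".
    for i in range(k):
        steps.append(
            {
                "model_name": f"head_{i}",
                "input_map": {"EMBEDDING": "embedding"},
                "output_map": {"LOGITS": f"logits_{i}"},
            }
        )
    return steps


def render_map(field: str, mapping: dict) -> str:
    """Render one input_map/output_map entry per line."""
    return "".join(
        f'      {field} {{ key: "{key}" value: "{value}" }}\n'
        for key, value in mapping.items()
    )


def render_ensemble_scheduling(steps: list) -> str:
    """Render the steps as an ensemble_scheduling block in pbtxt form."""
    blocks = []
    for step in steps:
        name = step["model_name"]
        blocks.append(
            "    {\n"
            f'      model_name: "{name}"\n'
            "      model_version: -1\n"
            + render_map("input_map", step["input_map"])
            + render_map("output_map", step["output_map"])
            + "    }"
        )
    return "ensemble_scheduling {\n  step [\n" + ",\n".join(blocks) + "\n  ]\n}\n"


if __name__ == "__main__":
    print(render_ensemble_scheduling(build_steps(3)))
```

With K=100 this produces 102 steps, each head being a separately loaded model with its own scheduler, which is the configuration the question is asking about.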