batch option doesn't optimize latency, it just trades off latency for throughput, doesn't it? #3307
Unanswered
KimSoungRyoul asked this question in Show and tell
When running the BentoML server (--production) under the same load and with the same resources, the server that did not use the batch option was always faster.

Is there any scenario in which latency can actually be improved when the batch option is enabled? (for example, by giving more resources to the runner process (bentoml start-runner-server) than to the api-server process (bentoml start-http-server); a rough config sketch is below)

If not, the batch option doesn't optimize latency; it just trades latency for throughput, doesn't it?
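To make the "more resources to the runner" idea concrete, something like this in bentoml_configuration.yaml is what I have in mind. This is only a sketch of what I mean; the key names are my assumption about the 1.0.x configuration schema and may not be exact:

```yaml
# Sketch only: give the runner (which executes the batched inference) more CPU
# than the http api-server. Key names are assumed from the BentoML 1.0.x
# configuration schema and may not be exact.
api_server:
  workers: 2      # fewer api-server worker processes
runners:
  resources:
    cpu: 8        # more CPU for the runner process started by start-runner-server
```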
Numbers from my test:

- batch: False
  Latency: 330 ~ 480 ms (p95), throughput avg: 141
- batch: True, max_batch_size: 100
  Latency: 3200 ~ 3700 ms (p95), but throughput is improved (140 ~ 174)
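For completeness, the batching settings I'm talking about are roughly these. Again just a sketch: the key names are assumed from the runner batching section of the 1.0.x configuration schema, and the max_latency_ms value is an example, not what I actually set:

```yaml
# Sketch: runner-level adaptive batching settings (key names assumed from the
# 1.0.x configuration schema; max_latency_ms value is an example only).
runners:
  batching:
    enabled: true          # the "batch option"
    max_batch_size: 100    # the value I used in the test above
    max_latency_ms: 10000  # latency budget the adaptive dispatcher works against
```

If most of that 3200+ ms p95 is queueing time while the dispatcher waits for batches to fill, I would expect that lowering max_latency_ms (or max_batch_size) pulls latency back down at the cost of smaller batches and lower throughput, which is exactly the trade-off I'm asking about.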