Micro-batching and time spent from input to user handler #2173
Unanswered
andreea-anghel asked this question in General
Replies: 1 comment
@andreea-anghel If you look in bentoml._internal.frameworks, you can see the different frameworks we support. Within a framework, "load_runner" returns a subclass of "Runner" that implements either run or run_batch (FYI, we're changing this architecture soon so that there's only run). If you log a timestamp when you call the "run" method in your API service, then log another timestamp when the run() method is invoked on the runner, the difference between the two should be the time it takes to dispatch the message between the service and the model runner. Hope this helps!
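To make the suggestion concrete, here is a minimal, framework-agnostic sketch of the two-timestamp pattern described above. Note this does not use BentoML's actual internals: TimingRunner and api_handler are hypothetical stand-ins for a Runner subclass and an API service handler, introduced purely for illustration.

```python
import time

class TimingRunner:
    """Hypothetical stand-in for a BentoML Runner: wraps a model's
    predict callable and records when run() actually executes."""

    def __init__(self, predict):
        self._predict = predict
        self.run_started_at = None

    def run(self, inputs):
        # Runner-side timestamp: taken when run() is invoked on the runner.
        self.run_started_at = time.monotonic()
        return self._predict(inputs)

def api_handler(runner, inputs):
    """Hypothetical stand-in for the API service's handler."""
    # Service-side timestamp: taken when the service calls run().
    dispatched_at = time.monotonic()
    result = runner.run(inputs)
    # The difference approximates the service-to-runner dispatch latency.
    dispatch_latency = runner.run_started_at - dispatched_at
    return result, dispatch_latency
```

For example, `api_handler(TimingRunner(lambda xs: [x * 2 for x in xs]), [1, 2, 3])` returns the prediction along with the measured dispatch latency in seconds. In a real BentoML deployment, the two timestamps would live in different processes, so you would log them (e.g. with the logging module) and correlate the entries afterward rather than subtracting in-process.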
I have created an ML inference service using BentoML as the serving framework, and I am using the adaptive BentoML micro-batching feature. I would like to measure the time an HTTP request spends from the moment it arrives at the BentoML server until it is dispatched to the user-defined handler (for an inference service, this would be the predict function of an ML framework). Could anyone help with how to instrument the BentoML code to measure this time? Any input is appreciated - thank you.