Support top level response caching for ensemble models #338
Nice work, Harshini! 🚀
FYI don't merge this until the testing PR is also ready to merge and pipelines look good.
Really clean code, thanks @lkomali!
🚀
ref slack thread: https://nvidia.slack.com/archives/CAZKCU4UV/p1677717244222069
Currently, caching of the top-level request sent to the ensemble scheduler is not supported. This PR:

- Implements caching of top-level requests for ensemble models.
- On a cache hit, the cached response is returned without executing the composing models.
- On a cache miss, the ensemble pipeline runs as-is (see the sketch below).
- Changes the logic for computing cache miss latency to: cache lookup time + ensemble pipeline time + cache insertion time.
- Moves shared logic from ensemble_scheduler.cc and dynamic_batch_scheduler.cc into scheduler_utils.cc to reduce code redundancy.
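For illustration, here is a minimal, self-contained C++ sketch of the intended top-level flow. It is not the actual Triton code: `Response`, `ResponseCache`, `ExecuteEnsemble`, and the request key are hypothetical stand-ins for the real types in ensemble_scheduler.cc. It only shows the lookup-before-execute behavior and the cache-miss latency accounting (lookup + pipeline + insertion) described above.

```cpp
// Hypothetical sketch only; types and names are illustrative, not Triton's.
#include <chrono>
#include <functional>
#include <iostream>
#include <optional>
#include <string>
#include <unordered_map>

using Clock = std::chrono::steady_clock;

// Stand-in for an inference response.
struct Response { std::string data; };

// Stand-in for the response cache keyed by a hash of the request.
class ResponseCache {
 public:
  std::optional<Response> Lookup(const std::string& key) const {
    auto it = entries_.find(key);
    if (it == entries_.end()) return std::nullopt;
    return it->second;
  }
  void Insert(const std::string& key, const Response& response) {
    entries_[key] = response;
  }
 private:
  std::unordered_map<std::string, Response> entries_;
};

// Top-level ensemble execution: check the cache before running any
// composing model; on a miss, run the pipeline and insert the result.
Response ExecuteEnsemble(
    ResponseCache& cache, const std::string& request_key,
    const std::function<Response()>& run_pipeline, double* miss_latency_ms) {
  const auto to_ms = [](Clock::duration d) {
    return std::chrono::duration<double, std::milli>(d).count();
  };

  const auto lookup_start = Clock::now();
  auto cached = cache.Lookup(request_key);
  const auto lookup_end = Clock::now();

  if (cached.has_value()) {
    // Cache hit: return the cached response without executing the
    // composing models.
    return *cached;
  }

  // Cache miss: run the ensemble pipeline as-is, then insert the response.
  const Response response = run_pipeline();
  const auto pipeline_end = Clock::now();

  cache.Insert(request_key, response);
  const auto insert_end = Clock::now();

  if (miss_latency_ms != nullptr) {
    // Cache miss latency = lookup time + pipeline time + insertion time.
    *miss_latency_ms = to_ms(lookup_end - lookup_start) +
                       to_ms(pipeline_end - lookup_end) +
                       to_ms(insert_end - pipeline_end);
  }
  return response;
}

int main() {
  ResponseCache cache;
  double miss_ms = 0.0;
  auto pipeline = [] { return Response{"ensemble output"}; };

  // First call misses, runs the pipeline, and populates the cache.
  ExecuteEnsemble(cache, "hash(request)", pipeline, &miss_ms);
  std::cout << "miss latency (ms): " << miss_ms << "\n";

  // An identical request then hits the cache; composing models are skipped.
  const auto hit = ExecuteEnsemble(cache, "hash(request)", pipeline, nullptr);
  std::cout << "cached response: " << hit.data << "\n";
  return 0;
}
```

In this sketch the miss latency is the sum of the three measured phases, which matches the formula above; only the miss path pays the insertion cost, and the hit path returns before any composing model would be scheduled.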