How does the Python Backend create multiple instances with `instance_kind: count` setting? #8256

IMG-PRCSNG · 2025-06-17T18:36:40Z

Does the Python backend load the model once and then fork it into multiple instances? Can it be configured to use the equivalent of the multiprocessing `spawn` start method?
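For reference, this is what "spawn mode" means in plain Python, outside Triton. It is only a sketch of the standard-library behaviour I have in mind; `square` and `run_with_start_method` are hypothetical names, and nothing here is Triton API. I don't know whether the Python backend exposes a comparable setting.

```python
import multiprocessing as mp


def square(x):
    # Trivial stand-in for per-instance work (hypothetical).
    return x * x


def run_with_start_method(method, values):
    """Run `square` over `values` in a pool created with the given start method.

    "fork" copies the parent's memory, so modules imported in the parent are
    inherited by each child. "spawn" starts a fresh interpreter, so each child
    re-imports what it needs.
    """
    ctx = mp.get_context(method)  # a context leaves the global default untouched
    with ctx.Pool(processes=2) as pool:
        return pool.map(square, values)
```

Calling `run_with_start_method("spawn", [1, 2, 3])` from under an `if __name__ == "__main__":` guard in a script runs the work in freshly started interpreters instead of forked copies of the parent.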

I am using a custom backend to run some old Caffe models, but when I test with perf_analyzer the throughput stays the same as I increase the instance count. Neither the CPU nor the GPU reaches 100% utilisation.

I suspect that Caffe is imported in the parent process and then somehow shared with the child processes, which would allow only sequential access.
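One way I could check this hypothesis is to record the process ID at import time and compare it with the process ID when a request is handled. This is a minimal sketch, assuming it sits at the top of the module that imports Caffe; `_IMPORT_PID`, `imported_in_this_process` and `describe_process` are hypothetical helper names, not part of any Triton or Caffe API.

```python
import os

# PID of the process that executed this module's import (hypothetical helper state).
_IMPORT_PID = os.getpid()


def imported_in_this_process():
    """Return True if this module was imported by the process now running.

    After a fork, the child keeps the parent's module state, so the recorded
    PID no longer matches os.getpid() and this returns False.
    """
    return _IMPORT_PID == os.getpid()


def describe_process():
    # One line that can be logged from each instance for comparison.
    return "pid={} import_pid={} same_process={}".format(
        os.getpid(), _IMPORT_PID, imported_in_this_process()
    )
```

If every instance logs a different `pid` with `same_process=True`, the import happened separately in each process and the shared-import theory would not explain the flat throughput. The same `import_pid` across instances with `same_process=False` would point to a fork after import.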
