Dynamically Allocated Engines #3714
narendasan started this conversation in RFCs
TL;DR
Some engines require significantly more GPU memory than their original PyTorch modules. This means that workflows which work in PyTorch can fail in TensorRT even when the engines compile successfully.
If users could provide the memory needed to run an engine dynamically and then reclaim it for other modules afterwards, this issue would be addressed.
Goal(s)
Do not reserve GPU memory until execution: as part of the engine execution process, the runtime will allocate the memory required by the engine and release it once execution has completed.
Usecases
Running a diffusers pipeline that contains multiple sub-modules: if a TRT engine is larger than the original PyTorch module, these workflows will fail unless engine memory is released dynamically between module executions.
Proposed APIs / UX
Similar to other runtime settings, there will be two sets of APIs.
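A minimal sketch of what the two surfaces might look like, assuming a hypothetical compile-time setting `dynamically_allocate_resources` and a hypothetical runtime context manager (names are illustrative, not a finalized API):

```python
import torch
import torch_tensorrt

model = torch.nn.Sequential(torch.nn.Conv2d(3, 16, 3), torch.nn.ReLU()).cuda().eval()
x = torch.randn(1, 3, 224, 224, device="cuda")

# 1. Compile-time setting (hypothetical name): the compiled module will not
#    hold its activation memory between calls.
trt_module = torch_tensorrt.compile(
    model,
    ir="dynamo",
    inputs=[x],
    dynamically_allocate_resources=True,  # assumed setting name
)

# 2. Runtime API (hypothetical): toggle the mode on an already compiled module,
#    scoped with a context manager, mirroring other runtime settings.
with torch_tensorrt.runtime.enable_dynamic_allocation(trt_module):  # assumed API
    out = trt_module(x)
```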
Example Workflow
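A hedged example of a multi-module workflow using the assumed compile setting above: each sub-module is compiled separately, and with dynamic allocation only the currently executing engine holds its activation memory, so the engines never occupy GPU memory at the same time.

```python
import torch
import torch_tensorrt

# Two sub-modules of a larger pipeline (stand-ins for e.g. a diffusers
# text encoder / UNet / VAE), compiled to TensorRT separately.
encoder = torch.nn.Sequential(torch.nn.Linear(1024, 4096), torch.nn.GELU()).cuda().eval()
decoder = torch.nn.Sequential(torch.nn.Linear(4096, 1024), torch.nn.GELU()).cuda().eval()
x = torch.randn(8, 1024, device="cuda")

encoder_trt = torch_tensorrt.compile(
    encoder, ir="dynamo", inputs=[x],
    dynamically_allocate_resources=True,  # assumed setting name
)
decoder_trt = torch_tensorrt.compile(
    decoder, ir="dynamo", inputs=[encoder(x)],
    dynamically_allocate_resources=True,  # assumed setting name
)

# The encoder engine's memory is allocated for its call and released before
# the decoder engine runs, keeping peak GPU usage close to one engine at a time.
h = encoder_trt(x)
y = decoder_trt(h)
```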
Limitations
This does not reduce the memory utilization of an individual engine; it only addresses multi-module pipelines by allowing a TRT engine to vacate GPU memory before the next engine or PyTorch module runs.
Internal Implementation
Design
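One way the runtime could realize this, sketched against existing TensorRT Python APIs (`create_execution_context_without_device_memory`, `device_memory_size`, `device_memory`): defer the activation-memory allocation to execution time and release it when the call returns. This is an illustrative sketch, not the actual runtime implementation.

```python
import tensorrt as trt
import torch

def execute_with_dynamic_memory(engine: trt.ICudaEngine, run_fn):
    # Create an execution context that does not reserve the engine's
    # activation memory up front.
    context = engine.create_execution_context_without_device_memory()

    # Allocate the activation/scratch memory only for the duration of this
    # call; drawing it from the PyTorch caching allocator lets subsequent
    # engines or PyTorch modules reuse it afterwards.
    scratch = torch.empty(engine.device_memory_size, dtype=torch.uint8, device="cuda")
    context.device_memory = scratch.data_ptr()

    try:
        # run_fn is expected to bind I/O tensors and launch the engine.
        return run_fn(context)
    finally:
        # Dropping the context and the scratch tensor returns the memory to
        # the allocator once outstanding work has completed.
        del context
        del scratch
```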
Extensions Required to Core API implementations
We need an additional runtime mode added to both the C++ and Python runtimes.
Data Structures
New enum to describe the runtime mode.
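A sketch of what such an enum might look like (the name `ResourceAllocationStrategy` and its values are assumptions); an equivalent enum would be mirrored in the C++ runtime.

```python
from enum import Enum, auto

class ResourceAllocationStrategy(Enum):  # hypothetical name
    STATIC = auto()   # current behavior: engine memory held for the lifetime of the module
    DYNAMIC = auto()  # memory allocated at execution time and released after each call
```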
Implementation Phases
Prototype - S
Support in C++ runtime
MVP (2.9.0) - S
All of the above supported in C++ and Python