When we use PyTorch tensors with slangpy, the CPU overhead seems quite high. Profiling narrowed it down to slangpy_native_call being the top contributor to the per-call cost.
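For reference, one way to confirm where the CPU time goes is to run the submit loop under cProfile. This is just a sketch, not the exact profiling setup I used; func, a, b and res refer to the benchmark below.

import cProfile
import pstats

# Profile 1000 submits and show the 10 most expensive entries by
# cumulative time; slangpy_native_call shows up at the top here.
with cProfile.Profile() as pr:
    for _ in range(1000):
        func(a, b, _result=res)
pstats.Stats(pr).sort_stats("cumulative").print_stats(10)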
With a local test run, I see an average submit time of 0.867732 ms. If the number of parameters is increased to, say, 5, the average becomes 1.277054 ms (a 5-parameter variant is sketched after the benchmark).
Some overhead is expected when slangpy wraps torch tensors, but the goal of this bug is to figure out how much of it can be reduced. Below is a simple benchmark that reproduces the issue.
import pytest
from time import time

import slangpy as spy
from slangpy.testing import helpers

ADD_FLOATS = """
float add_floats(float a, float b) {
    return a + b;
}
"""


def benchmark_torch(device_type: spy.DeviceType):
    device = helpers.get_torch_device(device_type)
    func = helpers.create_function_from_module(device, "add_floats", ADD_FLOATS)

    BUFFER_SIZE = 64 * 1024 * 1024

    import torch

    a = torch.randn((BUFFER_SIZE,), dtype=torch.float32, device="cuda")
    b = torch.randn((BUFFER_SIZE,), dtype=torch.float32, device="cuda")
    res = torch.randn((BUFFER_SIZE,), dtype=torch.float32, device="cuda")

    # Warmup + wait for cuda
    func(a, b, _result=res)
    device.wait_for_idle()

    input("Press Enter to start torch benchmark...")

    test_start = time()
    for _ in range(1000):
        func(a, b, _result=res)
    submit_end = time()  # all CPU-side submits done; the GPU may still be busy
    device.wait_for_idle()
    test_end = time()

    # (submit_end - test_start) is the total CPU-side submit time in seconds
    # for 1000 calls; x1000 converts to ms, /1000 averages per call.
    print(
        f"Torch benchmark completed in {test_end - test_start:.3f} seconds, "
        f"average submit time {1000 * (submit_end - test_start) / 1000:.6f}ms"
    )


if __name__ == "__main__":
    # pytest.main([__file__, "-v", "-s"])
    benchmark_torch(spy.DeviceType.cuda)
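For the 5-parameter number quoted above, a minimal variant of the same benchmark can be used. This is a sketch under the same assumptions; add_floats5 is a hypothetical kernel that simply extends the pattern, and the timing logic mirrors benchmark_torch.

ADD_FLOATS5 = """
float add_floats5(float a, float b, float c, float d, float e) {
    return a + b + c + d + e;
}
"""


def benchmark_torch_5_params(device_type: spy.DeviceType):
    # Same structure as benchmark_torch, but with five tensor arguments so
    # the per-parameter marshalling cost in slangpy_native_call adds up.
    import torch

    device = helpers.get_torch_device(device_type)
    func = helpers.create_function_from_module(device, "add_floats5", ADD_FLOATS5)
    BUFFER_SIZE = 64 * 1024 * 1024
    args = [
        torch.randn((BUFFER_SIZE,), dtype=torch.float32, device="cuda")
        for _ in range(5)
    ]
    res = torch.randn((BUFFER_SIZE,), dtype=torch.float32, device="cuda")

    func(*args, _result=res)  # warmup
    device.wait_for_idle()

    test_start = time()
    for _ in range(1000):
        func(*args, _result=res)
    submit_end = time()
    device.wait_for_idle()
    print(f"average submit time {1000 * (submit_end - test_start) / 1000:.6f}ms")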