Slangpy CPU overhead when using pytorch tensors #556

@mkeshavaNV

Description

When PyTorch tensors are passed to slangpy functions, the CPU overhead is quite high. Profiling narrowed the cost down to slangpy_native_call, which dominates the profile. Below is a simple benchmark that reproduces the issue.
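To confirm where the CPU time goes on your own machine, a stdlib profiler run around the dispatch loop will surface slangpy_native_call near the top of the cumulative-time report. This is a minimal sketch with a hypothetical helper and a trivial stand-in workload; in the real benchmark you would pass `lambda: func(a, b, _result=res)`:

```python
import cProfile
import pstats
import io

def profile_calls(fn, n=1000):
    """Profile n back-to-back calls to fn and return the top of the
    cumulative-time report as text. fn is a stand-in for the real
    dispatch, e.g. lambda: func(a, b, _result=res)."""
    profiler = cProfile.Profile()
    profiler.enable()
    for _ in range(n):
        fn()
    profiler.disable()
    buf = io.StringIO()
    stats = pstats.Stats(profiler, stream=buf)
    stats.sort_stats("cumulative").print_stats(10)  # top 10 entries by cumulative time
    return buf.getvalue()

# Trivial stand-in workload; replace with the slangpy call to reproduce the profile.
report = profile_calls(lambda: sum(range(100)))
print(report)
```

With the slangpy call substituted in, the line attributed to slangpy_native_call in this report is what this issue is about.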

With a local test run, I see an average submit time of 0.867732 ms. If the number of parameters is increased to, say, 5, the average rises to 1.277054 ms.

Some overhead is expected when slangpy consumes torch tensors, but the goal of this bug is to figure out how much of it can be reduced.

import pytest
from time import time

import slangpy as spy
from slangpy.testing import helpers

ADD_FLOATS = """
float add_floats(float a, float b) {
    return a + b;
}
"""

def benchmark_torch(
    device_type: spy.DeviceType
):

    device = helpers.get_torch_device(device_type)
    func = helpers.create_function_from_module(device, "add_floats", ADD_FLOATS)

    BUFFER_SIZE = 64*1024*1024
    import torch

    a = torch.randn((BUFFER_SIZE,), dtype=torch.float32, device="cuda")
    b = torch.randn((BUFFER_SIZE,), dtype=torch.float32, device="cuda")
    res = torch.randn((BUFFER_SIZE,), dtype=torch.float32, device="cuda")

    # Warmup + wait for cuda
    func(a, b, _result=res)
    device.wait_for_idle()

    input("Press Enter to start torch benchmark...")

    N = 1000
    test_start = time()
    for _ in range(N):
        func(a, b, _result=res)
    submit_end = time()
    device.wait_for_idle()
    test_end = time()

    avg_submit_ms = 1000.0 * (submit_end - test_start) / N
    print(f"Torch benchmark completed in {test_end - test_start:.3f} seconds, average submit time {avg_submit_ms:.6f}ms")


if __name__ == "__main__":
    #pytest.main([__file__, "-v", "-s"])
    benchmark_torch(spy.DeviceType.cuda)
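To compare the slangpy dispatch against a baseline, the timing loop above can be factored into a small stdlib-only helper that works with any callable. The helper below is a hypothetical sketch, not part of slangpy; you would pass it `lambda: func(a, b, _result=res)` for the slangpy path and, say, `lambda: torch.add(a, b, out=res)` for a pure-torch reference, keeping the `device.wait_for_idle()` / synchronization outside the timed region just as the benchmark does:

```python
from time import perf_counter

def average_submit_ms(fn, iterations=1000):
    """Time `iterations` back-to-back calls to fn and return the average
    wall-clock cost per call in milliseconds. This measures CPU submit
    cost only; GPU completion must be awaited separately."""
    start = perf_counter()
    for _ in range(iterations):
        fn()
    elapsed = perf_counter() - start
    return 1000.0 * elapsed / iterations

# Trivial stand-in; substitute the slangpy or torch call to compare paths.
avg = average_submit_ms(lambda: None, iterations=100)
print(f"average submit time {avg:.6f}ms")
```

The difference between the two averages is the per-call CPU overhead slangpy adds on top of the kernel launch itself.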
