feat: LPC CUDA kernel #24

Merged · 4 commits into main from feat/native-lpc-cuda · May 29, 2025

Conversation

@yoyolicoris (Member) commented May 29, 2025

import torch
import torchlpc  # importing registers the torch.ops.torchlpc extension ops
from torchlpc.core import lpc_cuda  # Numba-based CUDA implementation
from numba import cuda  # noqa: F401 (kept from the original script; unused directly)

from timeit import timeit
from time import sleep

batch_size = 64
samples = 2**14
order = 2

# Complex-valued test data: all-zero filter coefficients, random initial
# conditions, and a random input signal.
lpc_A = torch.zeros(batch_size, samples, order).cuda() + 0j
lpc_zi = torch.randn(batch_size, order).cuda() + 0j
lpc_x = torch.randn(batch_size, samples).cuda() + 0j

# Warm-up call so Numba's JIT compilation cost is excluded from the timing.
lpc_cuda(lpc_x, lpc_A, lpc_zi)
t_numba_lpc = timeit(
    "lpc_cuda(lpc_x, lpc_A, lpc_zi)",
    globals=globals(),
    number=100,
)
print(f"Numba LPC time: {t_numba_lpc:.4f} seconds")

sleep(1)  # Let queued GPU work drain before timing the next implementation.

t_torch_lpc = timeit(
    "torch.ops.torchlpc.lpc(lpc_x, lpc_A, lpc_zi)",
    globals=globals(),
    number=100,
)
print(f"Torch LPC time: {t_torch_lpc:.4f} seconds")
print(f"Torch LPC is {t_numba_lpc / t_torch_lpc:.2f}x faster than Numba")

The results on a 5060 Ti GPU running Linux:

Numba LPC time: 0.8524 seconds
Torch LPC time: 0.0152 seconds
Torch LPC is 56.07x faster than Numba

Don't know why, but the .cu version is significantly faster.
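
One caveat worth flagging (my observation, not part of the original benchmark): both implementations launch CUDA kernels asynchronously, and the script never calls torch.cuda.synchronize(), so timeit mostly measures how long it takes to queue the work. With 100 back-to-back calls the launch queue is likely saturated, so the ranking probably holds either way, but a synchronized measurement makes the absolute numbers trustworthy. A minimal sketch, reusing the tensors defined above (gpu_time is a hypothetical helper, not part of torchlpc):

import time

import torch

def gpu_time(fn, number=100):
    # Synchronize before and after the loop so the measured interval
    # covers all queued CUDA kernels, not just their launch overhead.
    torch.cuda.synchronize()
    start = time.perf_counter()
    for _ in range(number):
        fn()
    torch.cuda.synchronize()
    return time.perf_counter() - start

t_torch_sync = gpu_time(lambda: torch.ops.torchlpc.lpc(lpc_x, lpc_A, lpc_zi))
print(f"Torch LPC (synchronized): {t_torch_sync:.4f} seconds")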

@yoyolicoris mentioned this pull request May 5, 2025
@yoyolicoris requested a review from Copilot May 29, 2025 10:11

Copilot AI left a comment

Pull Request Overview

This PR introduces a new LPC CUDA kernel implementation and updates related recurrence functions to leverage both CUDA and CPU runners. Key changes include refactoring the recurrence functions to use lambdas for kernel dispatching, adding CUDA kernel implementations in C++ under torchlpc/csrc/cuda/lpc.cu, and extending test coverage to include additional sample sizes and device options.
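
For readers unfamiliar with the pattern, the lambda-based dispatch might look roughly like this (a sketch only; cpu_runner and cuda_runner are placeholder names, and the actual code in torchlpc/recurrence.py may differ):

import torch

def cpu_runner(x, A, zi):
    # Placeholder for the CPU recurrence implementation.
    raise NotImplementedError

def cuda_runner(x, A, zi):
    # Placeholder for the CUDA recurrence implementation.
    raise NotImplementedError

def make_runner(x: torch.Tensor):
    # Pick the kernel based on the input's device; the lambdas give both
    # paths a uniform (x, A, zi) -> y calling convention.
    if x.is_cuda:
        return lambda x, A, zi: cuda_runner(x, A, zi)
    return lambda x, A, zi: cpu_runner(x, A, zi)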

Reviewed Changes

Copilot reviewed 4 out of 4 changed files in this pull request and generated 2 comments.

torchlpc/recurrence.py: Updated recurrence functions to choose between CUDA and CPU runners via lambdas.
torchlpc/csrc/cuda/lpc.cu: Added new CUDA kernels for LPC computation, including support for complex types.
torchlpc/core.py: Adjusted LPC forward logic to conditionally dispatch based on EXTENSION_LOADED.
tests/test_extension.py: Expanded test parameters for sample sizes and devices (CPU/CUDA) for LPC equivalence.
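
An equivalence test along these lines might, for example, compare the new compiled op against the existing Numba path (a sketch under assumptions: the parametrized sizes are illustrative, and I am assuming lpc_cuda returns the filtered signal and accepts float64 inputs; the repo's actual tests may differ):

import pytest
import torch
import torchlpc  # registers torch.ops.torchlpc
from torchlpc.core import lpc_cuda  # existing Numba path, used as reference

@pytest.mark.parametrize("samples", [1024, 2**14])  # illustrative sizes
def test_cuda_lpc_matches_numba(samples):
    if not torch.cuda.is_available():
        pytest.skip("CUDA not available")
    batch, order = 4, 2
    x = torch.randn(batch, samples, device="cuda", dtype=torch.double)
    # Small coefficients keep the all-pole filter stable for the comparison.
    A = 0.1 * torch.randn(batch, samples, order, device="cuda", dtype=torch.double)
    zi = torch.randn(batch, order, device="cuda", dtype=torch.double)
    torch.testing.assert_close(torch.ops.torchlpc.lpc(x, A, zi), lpc_cuda(x, A, zi))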

@yoyolicoris merged commit 37e8115 into main May 29, 2025
6 of 8 checks passed
@yoyolicoris deleted the feat/native-lpc-cuda branch May 29, 2025 13:06