feat: LPC CUDA kernel #24

Merged · 4 commits into main from feat/native-lpc-cuda · May 29, 2025

Conversation

@yoyolicoris (Member) commented May 29, 2025

import torch
import torchlpc  # importing registers the torch.ops.torchlpc extension ops
from torchlpc.core import lpc_cuda  # Numba-based CUDA implementation
from numba import cuda  # noqa: F401 (kept from the original script; unused directly)

from timeit import timeit
from time import sleep

batch_size = 64
samples = 2**14
order = 2

# Complex-valued test data: all-zero filter coefficients, random initial
# conditions, and a random input signal.
lpc_A = torch.zeros(batch_size, samples, order).cuda() + 0j
lpc_zi = torch.randn(batch_size, order).cuda() + 0j
lpc_x = torch.randn(batch_size, samples).cuda() + 0j

# Warm-up call so Numba's JIT compilation cost is excluded from the timing.
lpc_cuda(lpc_x, lpc_A, lpc_zi)
t_numba_lpc = timeit(
    "lpc_cuda(lpc_x, lpc_A, lpc_zi)",
    globals=globals(),
    number=100,
)
print(f"Numba LPC time: {t_numba_lpc:.4f} seconds")

sleep(1)  # Let queued GPU work drain before timing the next implementation.

t_torch_lpc = timeit(
    "torch.ops.torchlpc.lpc(lpc_x, lpc_A, lpc_zi)",
    globals=globals(),
    number=100,
)
print(f"Torch LPC time: {t_torch_lpc:.4f} seconds")
print(f"Torch LPC is {t_numba_lpc / t_torch_lpc:.2f}x faster than Numba")

The results on a 5060 Ti GPU running Linux:

Numba LPC time: 0.8524 seconds
Torch LPC time: 0.0152 seconds
Torch LPC is 56.07x faster than Numba

Don't know why, but the .cu version is significantly faster.
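
One caveat worth flagging (my observation, not part of the original benchmark): both implementations launch CUDA kernels asynchronously, and the script never calls torch.cuda.synchronize(), so timeit mostly measures how long it takes to queue the work. With 100 back-to-back calls the launch queue is likely saturated, so the ranking probably holds either way, but a synchronized measurement makes the absolute numbers trustworthy. A minimal sketch, reusing the tensors defined above (gpu_time is a hypothetical helper, not part of torchlpc):

import time

import torch

def gpu_time(fn, number=100):
    # Synchronize before and after the loop so the measured interval
    # covers all queued CUDA kernels, not just their launch overhead.
    torch.cuda.synchronize()
    start = time.perf_counter()
    for _ in range(number):
        fn()
    torch.cuda.synchronize()
    return time.perf_counter() - start

t_torch_sync = gpu_time(lambda: torch.ops.torchlpc.lpc(lpc_x, lpc_A, lpc_zi))
print(f"Torch LPC (synchronized): {t_torch_sync:.4f} seconds")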

@yoyolicoris mentioned this pull request May 5, 2025
@yoyolicoris requested a review from Copilot May 29, 2025 10:11

Copilot AI left a comment

Pull Request Overview

This PR introduces a new LPC CUDA kernel implementation and updates related recurrence functions to leverage both CUDA and CPU runners. Key changes include refactoring the recurrence functions to use lambdas for kernel dispatching, adding CUDA kernel implementations in C++ under torchlpc/csrc/cuda/lpc.cu, and extending test coverage to include additional sample sizes and device options.
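
For readers unfamiliar with the pattern, the lambda-based dispatch might look roughly like this (a sketch only; cpu_runner and cuda_runner are placeholder names, and the actual code in torchlpc/recurrence.py may differ):

import torch

def cpu_runner(x, A, zi):
    # Placeholder for the CPU recurrence implementation.
    raise NotImplementedError

def cuda_runner(x, A, zi):
    # Placeholder for the CUDA recurrence implementation.
    raise NotImplementedError

def make_runner(x: torch.Tensor):
    # Pick the kernel based on the input's device; the lambdas give both
    # paths a uniform (x, A, zi) -> y calling convention.
    if x.is_cuda:
        return lambda x, A, zi: cuda_runner(x, A, zi)
    return lambda x, A, zi: cpu_runner(x, A, zi)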

Reviewed Changes

Copilot reviewed 4 out of 4 changed files in this pull request and generated 2 comments.

torchlpc/recurrence.py: Updated recurrence functions to choose between CUDA and CPU runners via lambdas.
torchlpc/csrc/cuda/lpc.cu: Added new CUDA kernels for LPC computation, including support for complex types.
torchlpc/core.py: Adjusted LPC forward logic to conditionally dispatch based on EXTENSION_LOADED.
tests/test_extension.py: Expanded test parameters for sample sizes and devices (CPU/CUDA) for LPC equivalence.
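
An equivalence test along these lines might, for example, compare the new compiled op against the existing Numba path (a sketch under assumptions: the parametrized sizes are illustrative, and I am assuming lpc_cuda returns the filtered signal and accepts float64 inputs; the repo's actual tests may differ):

import pytest
import torch
import torchlpc  # registers torch.ops.torchlpc
from torchlpc.core import lpc_cuda  # existing Numba path, used as reference

@pytest.mark.parametrize("samples", [1024, 2**14])  # illustrative sizes
def test_cuda_lpc_matches_numba(samples):
    if not torch.cuda.is_available():
        pytest.skip("CUDA not available")
    batch, order = 4, 2
    x = torch.randn(batch, samples, device="cuda", dtype=torch.double)
    # Small coefficients keep the all-pole filter stable for the comparison.
    A = 0.1 * torch.randn(batch, samples, order, device="cuda", dtype=torch.double)
    zi = torch.randn(batch, order, device="cuda", dtype=torch.double)
    torch.testing.assert_close(torch.ops.torchlpc.lpc(x, A, zi), lpc_cuda(x, A, zi))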

@yoyolicoris merged commit 37e8115 into main May 29, 2025
6 of 8 checks passed
@yoyolicoris deleted the feat/native-lpc-cuda branch May 29, 2025 13:06