CUDA: set_rows + cpy.cu refactor #14712

am17an · 2025-07-16T05:26:36Z

This PR adds support for the rest of the quantized data types in set-rows.

It moves the cpy functions to a common header so they can be reused in set-rows. I will do get-rows next if this refactor seems like the right direction. I feel we can still make the interface a little cleaner.
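As a rough host-side illustration of the refactor idea (not the actual code from this PR), the sketch below keeps a per-block quantization routine in one place and calls it from both a copy path and a set-rows path. The struct `block_q8_sketch` and the functions `quantize_block_q8`, `cpy_f32_to_q8` and `set_rows_f32_to_q8` are hypothetical names. The block layout is simplified: it uses a `float` scale and is not claimed to match any ggml block type.

```cpp
#include <cassert>
#include <cmath>
#include <cstddef>
#include <cstdint>
#include <vector>

constexpr int QK = 32; // values per quantized block (assumed for this sketch)

// Hypothetical, simplified 8-bit block: one scale plus QK quantized values.
struct block_q8_sketch {
    float  d;      // scale
    int8_t qs[QK]; // quantized values
};

// Shared per-block routine: the piece that would live in a common header.
inline void quantize_block_q8(const float * x, block_q8_sketch & out) {
    float amax = 0.0f;
    for (int i = 0; i < QK; ++i) {
        amax = std::fmax(amax, std::fabs(x[i]));
    }
    const float d  = amax / 127.0f;
    const float id = d != 0.0f ? 1.0f / d : 0.0f;
    out.d = d;
    for (int i = 0; i < QK; ++i) {
        out.qs[i] = (int8_t) std::lround(x[i] * id);
    }
}

// Copy path: quantize n contiguous floats into n / QK blocks.
inline void cpy_f32_to_q8(const float * src, block_q8_sketch * dst, size_t n) {
    assert(n % QK == 0);
    for (size_t b = 0; b < n / QK; ++b) {
        quantize_block_q8(src + b * QK, dst[b]);
    }
}

// Set-rows path: source row r is written to destination row row_idx[r],
// reusing the same per-block routine through the copy path.
inline void set_rows_f32_to_q8(const float * src, const int64_t * row_idx,
                               block_q8_sketch * dst, size_t n_rows, size_t row_len) {
    assert(row_len % QK == 0);
    const size_t blocks_per_row = row_len / QK;
    for (size_t r = 0; r < n_rows; ++r) {
        cpy_f32_to_q8(src + r * row_len,
                      dst + (size_t) row_idx[r] * blocks_per_row, row_len);
    }
}
```

With this shape, supporting another quantized type in set-rows would mostly mean adding another per-block routine, which is the kind of reuse the description is aiming for.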

JohannesGaessler

Long-term we should rewrite the quantization code to make use of coalesced memory accesses. For an example, see quantize_q8_1 in quantize.cu: that kernel loads 32 contiguous floats and writes 32 contiguous bytes for the quantized values.
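To make the coalescing point concrete, here is a small host-side model (plain C++, not a CUDA kernel and not code from the repository) that counts how many 128-byte memory segments a group of 32 threads touches in one load step under two thread-to-data mappings. The 128-byte segment size, the aligned base address and all function names are assumptions made for this sketch.

```cpp
#include <algorithm>
#include <cstddef>
#include <vector>

constexpr int    WARP    = 32;  // threads that issue a load together
constexpr int    QK      = 32;  // floats per quantization block (assumed)
constexpr size_t SEGMENT = 128; // bytes per memory segment (modeling assumption)

// Mapping A: each thread owns a whole block and walks it element by element.
// At a given step, lane l reads element `step` of block l.
std::vector<size_t> offsets_thread_per_block(int step) {
    std::vector<size_t> off(WARP);
    for (int l = 0; l < WARP; ++l) {
        off[l] = ((size_t) l * QK + step) * sizeof(float);
    }
    return off;
}

// Mapping B: the 32 threads share one block; lane l reads element l of it.
std::vector<size_t> offsets_thread_per_element(int block) {
    std::vector<size_t> off(WARP);
    for (int l = 0; l < WARP; ++l) {
        off[l] = ((size_t) block * QK + l) * sizeof(float);
    }
    return off;
}

// Count the distinct segments covered by one set of byte offsets.
size_t segments_touched(const std::vector<size_t> & off) {
    std::vector<size_t> seg;
    for (size_t o : off) {
        seg.push_back(o / SEGMENT);
    }
    std::sort(seg.begin(), seg.end());
    seg.erase(std::unique(seg.begin(), seg.end()), seg.end());
    return seg.size();
}
```

In this model, mapping A spreads the 32 reads over 32 different segments because consecutive lanes are 128 bytes apart, while mapping B covers them with a single segment. Mapping B corresponds to the "32 contiguous floats" access pattern described in the comment above.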

CUDA: set_rows + cpy.cu refactor

5afb942

am17an requested a review from JohannesGaessler July 16, 2025 05:26

github-actions bot added the labels "Nvidia GPU" (Issues specific to Nvidia GPUs) and "ggml" (changes relating to the ggml tensor library for machine learning) on Jul 16, 2025

ggerganov mentioned this pull request Jul 17, 2025

llama : add high-throughput mode #14363

Merged

23 tasks

JohannesGaessler approved these changes Jul 17, 2025

Nexesenex added a commit to Nexesenex/croco.cpp that referenced this pull request Jul 17, 2025

CUDA: set_rows + cpy.cu refactor (ggml-org#14712)

5c75115

Updated for IKL IQ4_NL and Q6_0. Original Author : Aman Gupta
