feat: vulkan optimizations. #248

b4rtaz · 2025-08-14T22:35:20Z

1x RTX 3060 Ti 8 GB

🌋 Device: NVIDIA GeForce RTX 3060 Ti
🌋 DeviceApiVersion: 1.3.242
🌋 MaxComputeSharedMemory: 48 kB
🌋 NonCoherentAtomSize: 64 bytes
🌋 Heap[0]: 8192 MB
🌋 Heap[2]: 246 MB

| Model | Tokens/s (version 0.15.0) | Tokens/s (version 0.15.2) | Tokens/s (this PR) |
| --- | --- | --- | --- |
| `llama3_1_8b_instruct_q40` | 13.46 | 15.96 | 17.91 |

4x RTX 3090 24 GB

🌋 Device: NVIDIA GeForce RTX 3090
🌋 DeviceApiVersion: 1.4.303
🌋 MaxComputeSharedMemory: 48 kB
🌋 NonCoherentAtomSize: 64 bytes
🌋 Heap[0]: 24576 MB
🌋 Heap[2]: 246 MB

| Model | Tokens/s (version 0.15.0) | Tokens/s (version 0.15.2) | Tokens/s (this PR) |
| --- | --- | --- | --- |
| `llama3_3_70b_instruct_q40` | - | 4.22 | 4.35 |
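
For a quick read of the numbers above, the relative gain per setup can be computed with a small script. This is only a sketch: the helper name `speedup_pct` and the `results` layout are made up for illustration and are not part of the repository; the tokens/s values are copied from the two tables.

```python
from typing import Optional


def speedup_pct(baseline_tps: float, new_tps: float) -> float:
    """Relative throughput gain of new_tps over baseline_tps, in percent."""
    return (new_tps / baseline_tps - 1.0) * 100.0


# (model, tokens/s 0.15.0, tokens/s 0.15.2, tokens/s this PR), from the tables.
# None marks the missing 0.15.0 measurement for the 70B model.
results: list[tuple[str, Optional[float], float, float]] = [
    ("llama3_1_8b_instruct_q40", 13.46, 15.96, 17.91),
    ("llama3_3_70b_instruct_q40", None, 4.22, 4.35),
]

for model, v0150, v0152, this_pr in results:
    vs_0152 = speedup_pct(v0152, this_pr)
    line = f"{model}: {vs_0152:+.1f}% vs 0.15.2"
    if v0150 is not None:
        line += f", {speedup_pct(v0150, this_pr):+.1f}% vs 0.15.0"
    print(line)
```

With these inputs the single-GPU 8B run gains roughly 12% over 0.15.2 and roughly 33% over 0.15.0, while the 4x RTX 3090 70B run gains roughly 3% over 0.15.2.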

b4rtaz added 4 commits August 14, 2025 23:05

feat: vulkan optimizations.

6537d3a

feat: optimized shaders.

2f1cdc2

feat: cast-forward-f32-f32.

1f9c5f9

feat: tweaks.

7cf3ee9

b4rtaz merged commit e35a2f3 into main Aug 16, 2025
3 checks passed

b4rtaz deleted the feat/vulkan-optimization-2 branch August 18, 2025 11:53
