Nomic Vulkan backend #4456
Merged
Commits (changes shown from 131 of 155 commits)
ba15dfd (niansa) Nomic vulkan backend licensed under the Software for Open Models Lice…
48a45ea (manyoso) Remove warning which fails on windows.
8563fa0 (apage43) remove dynamic deps from kompute build
45c8778 (manyoso) Switch to a dynamic dispatch table instead of linking hard against li…
b7e2e69 (manyoso) Completely revamp how we do object management with the vulkan backend…
beee572 (apage43) Make kompute actually include external SDK headers when requested
68cf1df (manyoso) Throw an exception when allocation fails for vulkan.
8bea719 (apage43) vulkan: disambiguate gpus with the same name
bd5f639 (manyoso) Don't try and install kompute artifacts.
4ed25b2 (manyoso) Sync from device back to host at begin of new prompt.
68aca6b (manyoso) Only use vulkan with known quant that work.
addac25 (manyoso) Set the singleton to nullptr here.
2c24d67 (manyoso) Don't crash on available devices if we can't even create an instance.
1b1416d (manyoso) Support for gguf.
6b6c73a (cebtenzzre) kompute : don't fail build because of -Warray-bounds
9e4f8b4 (manyoso) Upload immediately to device.
77135a3 (manyoso) Add a common boilerplate code via include and elim copy pasta
93306f1 (manyoso) Consolidate code for mat x vec kernels and use subgroups more extensi…
601905e (manyoso) Move the subgroups and printf into common.
5509f74 (manyoso) Minor cleanup.
4b223ec (manyoso) Refactor getrows to use common code and get ready for q6_k.
f1c9bc1 (manyoso) Add q6_k getrows and mul*vec kernel.
06d4b21 (manyoso) Fix offset into the qh and now we have working vulkan accelerated for…
32289aa (manyoso) Fixes for norm.
6ac3975 (manyoso) Fixup the upstream CMakelists.txt so we can build just llama.cpp with…
de589ce (manyoso) Change this back to be in agreement with metal and our previous softm…
bc4b5ed (manyoso) Fixes for subgroup size to bring AMD and NVIDIA inline with eachother…
24a4a59 (cebtenzzre) kompute : only try to use Vulkan for LLaMA itself
3d850db (cebtenzzre) kompute : remove Q6_K from list of supported quant types
9db90cb (apage43) f16 mv broadcasting fix (gqa fix)
ff4212d (apage43) q8 mat*vec
020b174 (apage43) vulkan: implement neox mode for rope
8564f79 (apage43) falcon h2d + reenable vulkan
09d83f0 (manyoso) Delete TODO now that we have q8_0.
f0cd38b (apage43) add mat*mat ops
46385ee (apage43) misc vulkan cleanup
3327d84 (apage43) perf: use bigger threadgroups in mm
d5741c0 (apage43) use op param epsilon for norms
b78a94b (apage43) q6k mm works
4809890 (apage43) rm commented dbg print
cd0257e (apage43) q4_1 mat*mat
8dc79ac (apage43) clean up vulkan/cpu switch
9bc52eb (apage43) attempted speedups
c1fd645 (apage43) attempted speedups 2
cc05a60 (apage43) use mat*vec shaders for mat*mat
21841d3 (cebtenzzre) kompute : enable kp_logger and make it static (#8)
cbc0d1a (cebtenzzre) kompute : make scripts executable
8400015 (manyoso) Don't try an allocation on a heap that is smaller than the size we re…
752f7eb (manyoso) Remove unused push constant that was giving validation errors.
8d9efbf (manyoso) Lower the workgroup count for some shaders by providing a loop that p…
74ddf0f (manyoso) Fix synchronization problem for AMD Radeon with amdvlk driver or windows
1c17010 (cebtenzzre) vulkan : fix missing break in matmul selection (#9)
89b7127 (cebtenzzre) llama : decide to disable Vulkan before loading tensors (#7)
e006d37 (manyoso) Scale the workgroup count down to allow correct generation for falcon…
a5eb001 (manyoso) Revert the prompt processing on gpu for now.
ffd0624 (manyoso) Remove this debug code.
f88b198 (cebtenzzre) llama : fix Vulkan whitelist (#11)
a8cac53 (cebtenzzre) kompute : fix issues with debug layers
c438c16 (cebtenzzre) fix build with external fmtlib (v10)
af00cca (cebtenzzre) Merge commit 'ec893798b7a2a803466cc8f063051499ec3d96f7' into HEAD
71565eb (cebtenzzre) vulkan : replace ggml_diag_mask_inf with ggml_add (custom -inf mask)
84f7fc4 (cebtenzzre) vulkan : rope n_past is now KQ_pos, f16 rope kernel
39abedd (cebtenzzre) vulkan : optimize workgroup sizes
f194e1b (cebtenzzre) Merge commit 'fcca0a700487999d52a525c96d6661e9f6a8703a' into nomic-vu…
a934b2c (cebtenzzre) vulkan : assert various kernel requirements
2a41ba7 (cebtenzzre) Merge commit '469c9addef75893e6be12edda852d12e840bf064' into nomic-vu…
6474fc8 (cebtenzzre) vulkan : handle ggml_scale for n%8 != 0
fe26e6a (cebtenzzre) Merge commit 'e16b9fa4baa8a09c6619b116159830e898050942' into nomic-vu…
9c4dfd0 (cebtenzzre) mention skipped change
02c3309 (cebtenzzre) merge fixup (e16b9fa4baa8a09c6619b116159830e898050942)
1829f1d (cebtenzzre) Merge commit '4760e7cc0b68570d58f55e8dda469805d1759d0d~' into nomic-v…
208cd52 (cebtenzzre) vulkan : implement YaRN RoPE scaling (#2268)
23f6d51 (cebtenzzre) Merge commit '4760e7cc0b68570d58f55e8dda469805d1759d0d' into nomic-vu…
a4bb9c5 (cebtenzzre) vulkan : sync with "migrate to dynamic graphs"
9ae88ba (cebtenzzre) Merge remote-tracking branch 'upstream/master' into nomic-vulkan-redo
56430c3 (cebtenzzre) relicense Vulkan backend as MIT
3e09e12 (cebtenzzre) rename ggml-vulkan -> ggml-kompute
27631db (cebtenzzre) separate shaders from kompute itself
747e1ea (cebtenzzre) Merge commit '81bc9214a389362010f7a57f4cbc30e5f83a2d28' into nomic-vu…
b906e12 (cebtenzzre) kompute : fix compile warnings
9af7f58 (cebtenzzre) move kompute to a submodule
f7cb0a6 (cebtenzzre) remove script with unclear purpose
c8fd4ba (cebtenzzre) ggml : restore 'static' specifiers
f58f581 (cebtenzzre) refactor llama.cpp modifications
2d2c76a (cebtenzzre) vulkan : fix free of stack addr in llama_buffer
8072706 (cebtenzzre) kompute : always destroy Manager via the destructor
44b1a97 (cebtenzzre) kompute : fix -Wunused-private-field warnings from clang
8b65f4c (cebtenzzre) Merge commit 'bcc0eb4591bec5ec02fad3f2bdcb1b265052ea56' into ceb/nomi…
3959283 (cebtenzzre) Merge commit '31f27758faf4a4bd08101a57c7ec3a473f771f86' into ceb/nomi…
904c563 (cebtenzzre) sync xxd commands with GPT4All llama.cpp.cmake
ae6d682 (cebtenzzre) Merge commit 'd232aca5a73b290e218a2e48b91023d5e994203f' into ceb/nomi…
3773e1a (cebtenzzre) Merge branch 'master' of https://github.com/ggerganov/llama.cpp into …
7c527eb (cebtenzzre) Merge commit 'e7e4df031b9e29d4b55a4e0b0295187f6b213db1' into HEAD
298d6ee (cebtenzzre) kompute : initial attempt at ggml-backend v2 support
5f660da (cebtenzzre) fix assertion failure
070919d (cebtenzzre) attempt to get test-backend-ops working
cad72e1 (cebtenzzre) add sanity check and fix kompute teardown order
76474a7 (cebtenzzre) kompute : ignore exceptions in ggml_vk_available_devices (#12)
d6bd471 (ggerganov) kompute : fix rope_f32 and scale ops (#5008)
9431026 (cebtenzzre) clean up old backend code
e9d5223 (cebtenzzre) actually fix this assertion
729e1a4 (cebtenzzre) sync op_rope_f16 with recent op_rope_f32 changes
0753073 (cebtenzzre) never try to evaluate an empty command buffer
2f6a279 (cebtenzzre) fix supported ops for kompute backend
33e8d6a (ggerganov) kompute : fix ggml_add kernel (#5027)
cb9ceff (cebtenzzre) minor cleanup
0899adf (cebtenzzre) kompute : fix get_rows dispatch -> 4 less failures
08e23fd (cebtenzzre) kompute : fix op_mul kernel -> 13 less test failures
2755ae3 (cebtenzzre) kompute : fix more dispatch ambiguity -> 12 less failures
0787b80 (cebtenzzre) kompute : remove broken mulrow kernel -> 1 less test failure
1a14099 (cebtenzzre) fix q4_0/q4_1 mmv, 65 -> 49 failures
2b0f642 (cebtenzzre) fix f16 mmv, 49 -> 41 failures
2852902 (cebtenzzre) test-backend-ops : add llama test
1450966 (cebtenzzre) test-backend-ops : test scale parameter of ggml_soft_max_ext
308f279 (cebtenzzre) kompute : support scale parameter of softmax
8bd38fe (cebtenzzre) test-backend-ops : test mask parameter of ggml_soft_max_ext
df687b1 (cebtenzzre) kompute : support mask parameter of softmax
ebb5f7e (cebtenzzre) test-backend-ops : test llama with different batch sizes
ec68a96 (cebtenzzre) test-backend-ops : increase max_nmse_err so Llama passes
987335e (cebtenzzre) kompute : fix algorithm names
f5ac635 (cebtenzzre) kompute : fix q8_0 mmv, 41 -> 28 failures
1849b85 (cebtenzzre) test-backend-ops : add Falcon test
6fc99a6 (cebtenzzre) test-backend-ops : test larger GELU range
38d1f0c (cebtenzzre) kompute : fix op_gelu -> Falcon is working on AMDVLK
11b3050 (cebtenzzre) test-backend-ops : restore softmax tests
de9fba0 (cebtenzzre) kompute : fix basic f16 get_rows, 28 -> 26 failures
445a373 (cebtenzzre) kompute : fix basic Q6_K get_rows, 26 -> 24 failures
3fbf052 (cebtenzzre) kompute : mark last few failing ops as unsupported
3915194 (cebtenzzre) test-backend-ops : make Falcon test faster with a smaller model
bc28704 (cebtenzzre) kompute : remove unused immintrin.h #include
91654ff (cebtenzzre) kompute : fix a -Wstrict-aliasing warning
61a5cf8 (cebtenzzre) kompute : remove unnecessary use_mmap=false
e6ce5f2 (cebtenzzre) llama : revert unintended whitespace change
aea8498 (cebtenzzre) Merge branch 'master' of https://github.com/ggerganov/llama.cpp into …
2512799 (cebtenzzre) test-backend-ops : comment out Llama and Falcon tests
8ca33de (slaren) test-backend-ops : check all the ops in the test for support in the b…
6af02b1 (cebtenzzre) kompute : init device automatically and remove an unnecessary free
2ff2d16 (cebtenzzre) ggml-kompute.h : remove anything that doesn't need to be public
cdab404 (cebtenzzre) kompute : fix #includes
454baeb (cebtenzzre) op_mul_mat_mat_f32.comp : fix missing final newline
297fde5 (cebtenzzre) editorconfig-checker : exclude .gitmodules
9132485 (cebtenzzre) ci : initial attempt at testing Kompute backend
57cecad (cebtenzzre) main : remove ggml-kompute.h #include
4b0c96a (cebtenzzre) kompute : adapt ggml-kompute API to be compatible with C
e6edd44 (cebtenzzre) ci : attempt to fix Vulkan installer path
050d450 (cebtenzzre) ci : do not run tests for Kompute (no GPU)
5304625 (cebtenzzre) kompute : use llama_backend_init/llama_backend_free to manage device
be7c055 (cebtenzzre) kompute : better device management
da1dc66 (cebtenzzre) Merge branch 'master' of https://github.com/ggerganov/llama.cpp into …
dc08e51 (cebtenzzre) kompute : fix merge issues
7e11fe0 (cebtenzzre) kompute : remove llama_load_model_from_file_internal
b932cd7 (cebtenzzre) vulkan : correctly fix use-after-free in ggml_vk_current_device
48db724 (cebtenzzre) minor fixup
1f98dff (cebtenzzre) fix trailing whitespace
2998211 (cebtenzzre) fix incorrect memcpy
.gitmodules (new file)
@@ -0,0 +1,3 @@
+[submodule "kompute"]
+	path = kompute
+	url = https://github.com/nomic-ai/kompute.git
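Because this PR moves Kompute into a git submodule (commit 9af7f58), a fresh checkout needs that submodule fetched before the Kompute backend can build. A minimal sketch using standard git commands, with the `kompute` path taken from the .gitmodules entry above:

    git clone --recurse-submodules https://github.com/ggerganov/llama.cpp
    # or, inside an existing checkout:
    git submodule update --init kompute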