
Commit 8a8218c

Squashed commit of the following:
commit 8432e9d Author: YellowRoseCx <80486540+YellowRoseCx@users.noreply.github.com> Date: Sun Jul 9 16:55:30 2023 -0500 Update Makefile
commit b58c189 Author: YellowRoseCx <80486540+YellowRoseCx@users.noreply.github.com> Date: Sun Jul 9 16:20:00 2023 -0500 Add multi-gpu CuBLAS support to new GUI
commit 0c1c71b Author: YellowRoseCx <80486540+YellowRoseCx@users.noreply.github.com> Date: Sat Jul 8 07:56:57 2023 -0500 Update Makefile
commit f864f60 Author: Johannes Gäßler <johannesg@5d6.de> Date: Sat Jul 8 00:25:15 2023 +0200 CUDA: add __restrict__ to mul mat vec kernels (ggml-org#2140)
commit 4539bc2 Author: YellowRoseCx <80486540+YellowRoseCx@users.noreply.github.com> Date: Sat Jul 8 01:36:14 2023 -0500 update makefile for changes
commit 912e31e Merge: 74e2703 ddaa4f2 Author: YellowRoseCx <80486540+YellowRoseCx@users.noreply.github.com> Date: Fri Jul 7 23:15:37 2023 -0500 Merge remote-tracking branch 'upstream/concedo'
commit ddaa4f2 Author: Concedo <39025047+LostRuins@users.noreply.github.com> Date: Fri Jul 7 22:14:14 2023 +0800 fix cuda garbage results and gpu selection issues
commit 95eca51 Author: Concedo <39025047+LostRuins@users.noreply.github.com> Date: Fri Jul 7 18:39:47 2023 +0800 add gpu choice for GUI for cuda
commit a689a66 Author: Concedo <39025047+LostRuins@users.noreply.github.com> Date: Fri Jul 7 17:52:34 2023 +0800 make it work with pyinstaller
commit 9ee9a77 Author: Concedo <39025047+LostRuins@users.noreply.github.com> Date: Fri Jul 7 16:25:37 2023 +0800 warn outdated GUI (+1 squashed commits) Squashed commits: [15aec3d] spelling error
commit 32102c2 Merge: 8424a35 481f793 Author: Concedo <39025047+LostRuins@users.noreply.github.com> Date: Fri Jul 7 14:15:39 2023 +0800 Merge branch 'master' into concedo_experimental # Conflicts: # README.md
commit 481f793 Author: Howard Su <howard0su@gmail.com> Date: Fri Jul 7 11:34:18 2023 +0800 Fix opencl by wrap #if-else-endif with \n (ggml-org#2086)
commit dfd9fce Author: Georgi Gerganov <ggerganov@gmail.com> Date: Thu Jul 6 19:41:31 2023 +0300 ggml : fix restrict usage
commit 36680f6 Author: Judd <foldl@users.noreply.github.com> Date: Fri Jul 7 00:23:49 2023 +0800 convert : update for baichuan (ggml-org#2081) 1. guess n_layers; 2. relax warnings on context size; 3. add a note that its derivations are also supported. Co-authored-by: Judd <foldl@boxvest.com>
commit a17a268 Author: tslmy <tslmy@users.noreply.github.com> Date: Thu Jul 6 09:17:50 2023 -0700 alpaca.sh : update model file name (ggml-org#2074) The original file name, `ggml-alpaca-7b-q4.bin`, implied the first-generation GGML. After the breaking changes (mentioned in ggml-org#382), `llama.cpp` requires GGML V3 now. Those model files are named `*ggmlv3*.bin`. We should change the example to an actually working model file, so that this thing is more likely to run out-of-the-box for more people, and fewer people would waste time downloading the old Alpaca model.
commit 8424a35 Author: Concedo <39025047+LostRuins@users.noreply.github.com> Date: Thu Jul 6 23:24:21 2023 +0800 added the ability to ban any substring tokens
commit 27a0907 Author: Concedo <39025047+LostRuins@users.noreply.github.com> Date: Thu Jul 6 22:33:46 2023 +0800 backport MM256_SET_M128I to ggml_v2, updated lite, added support for selecting the GPU for cublas
commit 220aa70 Merge: 4d1700b 31cfbb1 Author: Concedo <39025047+LostRuins@users.noreply.github.com> Date: Thu Jul 6 15:40:40 2023 +0800 Merge branch 'master' into concedo_experimental # Conflicts: # .github/workflows/build.yml # CMakeLists.txt # Makefile # README.md # pocs/vdot/q8dot.cpp # pocs/vdot/vdot.cpp # scripts/sync-ggml.sh # tests/test-grad0.c # tests/test-quantize-fns.cpp # tests/test-quantize-perf.cpp
commit 4d1700b Author: Concedo <39025047+LostRuins@users.noreply.github.com> Date: Thu Jul 6 15:17:47 2023 +0800 adjust some ui sizing
commit 1c80002 Author: Vali-98 <137794480+Vali-98@users.noreply.github.com> Date: Thu Jul 6 15:00:57 2023 +0800 New UI using customtkinter (LostRuins#284) * Initial conversion to customtkinter. * Initial conversion to customtkinter. * Additions to UI, still non-functional * UI now functional, untested * UI now functional, untested * Added saving configs * Saving and loading now functional * Fixed sliders not loading * Cleaned up duplicate arrays * Cleaned up duplicate arrays * Fixed loading bugs * wip fixing all the broken parameters. PLEASE test before you commit * further cleaning * bugfix completed for gui. now evaluating save and load * cleanup prepare to merge --------- Co-authored-by: Concedo <39025047+LostRuins@users.noreply.github.com>
commit 31cfbb1 Author: Tobias Lütke <tobi@shopify.com> Date: Wed Jul 5 16:51:13 2023 -0400 Expose generation timings from server & update completions.js (ggml-org#2116) * use javascript generators as much cleaner API Also add ways to access completion as promise and EventSource * export llama_timings as struct and expose them in server * update readme, update baked includes * llama : uniform variable names + struct init --------- Co-authored-by: Georgi Gerganov <ggerganov@gmail.com>
commit 74e2703 Merge: cf65429 f9108ba Author: YellowRoseCx <80486540+YellowRoseCx@users.noreply.github.com> Date: Wed Jul 5 15:16:49 2023 -0500 Merge branch 'LostRuins:concedo' into main
commit 983b555 Author: Jesse Jojo Johnson <williamsaintgeorge@gmail.com> Date: Wed Jul 5 18:03:19 2023 +0000 Update Server Instructions (ggml-org#2113) * Update server instructions for web front end * Update server README * Remove duplicate OAI instructions * Fix duplicate text --------- Co-authored-by: Jesse Johnson <thatguy@jessejojojohnson.com>
commit ec326d3 Author: Georgi Gerganov <ggerganov@gmail.com> Date: Wed Jul 5 20:44:11 2023 +0300 ggml : fix bug introduced in LostRuins#1237
commit 1b6efea Author: Georgi Gerganov <ggerganov@gmail.com> Date: Wed Jul 5 20:20:05 2023 +0300 tests : fix test-grad0
commit 1b107b8 Author: Stephan Walter <stephan@walter.name> Date: Wed Jul 5 16:13:06 2023 +0000 ggml : generalize `quantize_fns` for simpler FP16 handling (LostRuins#1237) * Generalize quantize_fns for simpler FP16 handling * Remove call to ggml_cuda_mul_mat_get_wsize * ci : disable FMA for mac os actions --------- Co-authored-by: Georgi Gerganov <ggerganov@gmail.com>
commit 8567c76 Author: Jesse Jojo Johnson <williamsaintgeorge@gmail.com> Date: Wed Jul 5 15:13:35 2023 +0000 Update server instructions for web front end (ggml-org#2103) Co-authored-by: Jesse Johnson <thatguy@jessejojojohnson.com>
commit 924dd22 Author: Johannes Gäßler <johannesg@5d6.de> Date: Wed Jul 5 14:19:42 2023 +0200 Quantized dot products for CUDA mul mat vec (ggml-org#2067)
commit 051c70d Author: Howard Su <howard0su@gmail.com> Date: Wed Jul 5 18:31:23 2023 +0800 llama: Don't double count the sampling time (ggml-org#2107)
commit ea79e54 Author: Concedo <39025047+LostRuins@users.noreply.github.com> Date: Wed Jul 5 17:29:35 2023 +0800 fixed refusing to quantize some models
commit 9e4475f Author: Johannes Gäßler <johannesg@5d6.de> Date: Wed Jul 5 08:58:05 2023 +0200 Fixed OpenCL offloading prints (ggml-org#2082)
commit 7f0e9a7 Author: Nigel Bosch <pnigelb@gmail.com> Date: Tue Jul 4 18:33:33 2023 -0500 embd-input: Fix input embedding example unsigned int seed (ggml-org#2105)
commit b472f3f Author: Georgi Gerganov <ggerganov@gmail.com> Date: Tue Jul 4 22:25:22 2023 +0300 readme : add link web chat PR
commit ed9a54e Author: Georgi Gerganov <ggerganov@gmail.com> Date: Tue Jul 4 21:54:11 2023 +0300 ggml : sync latest (new ops, macros, refactoring) (ggml-org#2106) - add ggml_argmax() - add ggml_tanh() - add ggml_elu() - refactor ggml_conv_1d() and variants - refactor ggml_conv_2d() and variants - add helper macros to reduce code duplication in ggml.c
commit f257fd2 Author: jwj7140 <32943891+jwj7140@users.noreply.github.com> Date: Wed Jul 5 03:06:12 2023 +0900 Add an API example using server.cpp similar to OAI. (ggml-org#2009) * add api_like_OAI.py * add evaluated token count to server * add /v1/ endpoints binding
commit 7ee76e4 Author: Tobias Lütke <tobi@shopify.com> Date: Tue Jul 4 10:05:27 2023 -0400 Simple webchat for server (ggml-org#1998) * expose simple web interface on root domain * embed index and add --path for choosing static dir * allow server to multithread because web browsers send a lot of garbage requests we want the server to multithread when serving 404s for favicon's etc. To avoid blowing up llama we just take a mutex when it's invoked. * let's try this with the xxd tool instead and see if msvc is happier with that * enable server in Makefiles * add /completion.js file to make it easy to use the server from js * slightly nicer css * rework state management into session, expose historyTemplate to settings --------- Co-authored-by: Georgi Gerganov <ggerganov@gmail.com>
commit acc111c Author: Henri Vasserman <henv@hot.ee> Date: Tue Jul 4 15:38:04 2023 +0300 Allow old Make to build server. (ggml-org#2098) Also make server build by default. Tested with Make 3.82
commit 23c7c6f Author: ZhouYuChen <zhouyuchen@naver.com> Date: Tue Jul 4 20:15:16 2023 +0800 Update Makefile: clean simple (ggml-org#2097)
commit 69add28 Merge: 00e35d0 698efad Author: Concedo <39025047+LostRuins@users.noreply.github.com> Date: Tue Jul 4 18:51:42 2023 +0800 Merge branch 'master' into concedo_experimental # Conflicts: # .github/workflows/build.yml
commit 00e35d0 Merge: fff705d f9108ba Author: Concedo <39025047+LostRuins@users.noreply.github.com> Date: Tue Jul 4 18:46:40 2023 +0800 Merge branch 'concedo' into concedo_experimental
commit f9108ba Author: Michael Moon <triffid.hunter@gmail.com> Date: Tue Jul 4 18:46:08 2023 +0800 Make koboldcpp.py executable on Linux (LostRuins#293)
commit fff705d Merge: 784628a c6c0afd Author: Concedo <39025047+LostRuins@users.noreply.github.com> Date: Tue Jul 4 18:42:02 2023 +0800 Merge remote-tracking branch 'ycros/improve-sampler-api-access' into concedo_experimental
commit c6c0afd Author: Concedo <39025047+LostRuins@users.noreply.github.com> Date: Tue Jul 4 18:35:03 2023 +0800 refactor to avoid code duplication
commit 784628a Merge: ca9a116 309534d Author: Concedo <39025047+LostRuins@users.noreply.github.com> Date: Tue Jul 4 16:38:32 2023 +0800 Merge remote-tracking branch 'ycros/improve-sampler-api-access' into concedo_experimental
commit 698efad Author: Erik Scholz <Green-Sky@users.noreply.github.com> Date: Tue Jul 4 01:50:12 2023 +0200 CI: make the brew update temporarily optional. (ggml-org#2092) until they decide to fix the brew installation in the macos runners. see the open issues. eg actions/runner-images#7710
commit 14a2cc7 Author: Govlzkoy <gotope@users.noreply.github.com> Date: Tue Jul 4 07:50:00 2023 +0800 [ggml] fix index for ne03 value in ggml_cl_mul_f32 (ggml-org#2088)
commit cf65429 Author: YellowRoseCx <80486540+YellowRoseCx@users.noreply.github.com> Date: Mon Jul 3 16:56:40 2023 -0500 print cuda or opencl based on what's used
commit 72c16d2 Author: YellowRoseCx <80486540+YellowRoseCx@users.noreply.github.com> Date: Mon Jul 3 16:45:39 2023 -0500 Revert "fix my mistake that broke other arches" This reverts commit 777aed5.
commit 1cf14cc Author: Henri Vasserman <henv@hot.ee> Date: Tue Jul 4 00:05:23 2023 +0300 fix server crashes (ggml-org#2076)
commit 777aed5 Author: YellowRoseCx <80486540+YellowRoseCx@users.noreply.github.com> Date: Mon Jul 3 15:53:32 2023 -0500 fix my mistake that broke other arches
commit cc45a7f Author: Howard Su <howard0su@gmail.com> Date: Tue Jul 4 02:43:55 2023 +0800 Fix crash of test-tokenizer-0 under Debug build (ggml-org#2064) * Fix crash of test-tokenizer-0 under Debug build * Change per comment
commit ca9a116 Author: Concedo <39025047+LostRuins@users.noreply.github.com> Date: Tue Jul 4 00:35:02 2023 +0800 possibly slower, but cannot use larger batches without modifying ggml library.
commit bfeb347 Author: Concedo <39025047+LostRuins@users.noreply.github.com> Date: Mon Jul 3 21:36:42 2023 +0800 fix typos
commit 55dbb91 Author: Howard Su <howard0su@gmail.com> Date: Mon Jul 3 19:58:58 2023 +0800 [llama] No need to check file version when loading vocab score (ggml-org#2079)
commit d7d2e6a Author: WangHaoranRobin <56047610+WangHaoranRobin@users.noreply.github.com> Date: Mon Jul 3 05:38:44 2023 +0800 server: add option to output probabilities for completion (ggml-org#1962) * server: add option to output probabilities for completion * server: fix issue when handling probability output for incomplete tokens for multibyte character generation * server: fix llama_sample_top_k order * examples/common.h: put all bool variables in gpt_params together
commit 27780a9 Author: YellowRoseCx <80486540+YellowRoseCx@users.noreply.github.com> Date: Sun Jul 2 16:03:27 2023 -0500 rocm fixes
commit f52c7d4 Author: YellowRoseCx <80486540+YellowRoseCx@users.noreply.github.com> Date: Sun Jul 2 16:02:58 2023 -0500 Revert "rocm fixes" This reverts commit 2fe9927.
commit 2fe9927 Author: YellowRoseCx <80486540+YellowRoseCx@users.noreply.github.com> Date: Sun Jul 2 15:58:21 2023 -0500 rocm fixes
commit efe7560 Author: YellowRoseCx <80486540+YellowRoseCx@users.noreply.github.com> Date: Sun Jul 2 15:55:43 2023 -0500 Revert "move HIPBLAS definitions into ggml-cuda.h" This reverts commit bf49a93.
commit 4fc0181 Author: YellowRoseCx <80486540+YellowRoseCx@users.noreply.github.com> Date: Sun Jul 2 15:55:36 2023 -0500 Revert "move hipblas definitions to header files" This reverts commit 2741ffb.
commit 89eb576 Merge: 2741ffb 3d2907d Author: YellowRoseCx <80486540+YellowRoseCx@users.noreply.github.com> Date: Sun Jul 2 14:44:13 2023 -0500 Merge branch 'LostRuins:concedo' into main
commit 309534d Author: Ycros <18012+ycros@users.noreply.github.com> Date: Sun Jul 2 18:15:34 2023 +0000 implement sampler order, expose sampler order and mirostat in api
commit 3d2907d Author: Concedo <39025047+LostRuins@users.noreply.github.com> Date: Sun Jul 2 18:28:09 2023 +0800 make gptneox and gptj work with extended context too
commit d6b47e6 Merge: e17c849 46088f7 Author: Concedo <39025047+LostRuins@users.noreply.github.com> Date: Sun Jul 2 17:26:39 2023 +0800 Merge branch 'master' into concedo_experimental
commit e17c849 Author: Concedo <39025047+LostRuins@users.noreply.github.com> Date: Sun Jul 2 17:25:08 2023 +0800 switched to NTK aware scaling
commit e19483c Author: Concedo <39025047+LostRuins@users.noreply.github.com> Date: Sun Jul 2 14:55:08 2023 +0800 increase scratch for above 4096
commit 46088f7 Author: Georgi Gerganov <ggerganov@gmail.com> Date: Sun Jul 2 09:46:46 2023 +0300 ggml : fix build with OpenBLAS (close ggml-org#2066)
commit b85ea58 Merge: ef3b8dc 0bc2cdf Author: Concedo <39025047+LostRuins@users.noreply.github.com> Date: Sun Jul 2 14:45:25 2023 +0800 Merge branch 'master' into concedo_experimental # Conflicts: # README.md
commit 2741ffb Author: YellowRoseCx <80486540+YellowRoseCx@users.noreply.github.com> Date: Sat Jul 1 17:07:42 2023 -0500 move hipblas definitions to header files
commit bf49a93 Author: YellowRoseCx <80486540+YellowRoseCx@users.noreply.github.com> Date: Sat Jul 1 16:38:50 2023 -0500 move HIPBLAS definitions into ggml-cuda.h
commit 540f4e0 Merge: 2c3b46f eda663f Author: YellowRoseCx <80486540+YellowRoseCx@users.noreply.github.com> Date: Sat Jul 1 14:58:32 2023 -0500 Merge remote-tracking branch 'upstream/concedo'
commit 0bc2cdf Author: Johannes Gäßler <johannesg@5d6.de> Date: Sat Jul 1 21:49:44 2023 +0200 Better CUDA synchronization logic (ggml-org#2057)
commit befb3a3 Author: Johannes Gäßler <johannesg@5d6.de> Date: Sat Jul 1 21:47:26 2023 +0200 Test-based VRAM scratch size + context adjustment (ggml-org#2056)
commit b213227 Author: Daniel Drake <drake@endlessos.org> Date: Sat Jul 1 20:31:44 2023 +0200 cmake : don't force -mcpu=native on aarch64 (ggml-org#2063) It's currently not possible to cross-compile llama.cpp for aarch64 because CMakeLists.txt forces -mcpu=native for that target. -mcpu=native doesn't make sense if your build host is not the target architecture, and clang rejects it for that reason, aborting the build. This can be easily reproduced using the current Android NDK to build for aarch64 on an x86_64 host. If there is not a specific CPU-tuning target for aarch64 then -mcpu should be omitted completely. I think that makes sense, there is not enough variance in the aarch64 instruction set to warrant a fixed -mcpu optimization at this point. And if someone is building natively and wishes to enable any possible optimizations for the host device, then there is already the LLAMA_NATIVE option available. Fixes LostRuins#495.
commit 2f8cd97 Author: Aaron Miller <apage43@ninjawhale.com> Date: Sat Jul 1 11:14:59 2023 -0700 metal : release buffers when freeing metal context (ggml-org#2062)
commit 471aab6 Author: Judd <foldl@users.noreply.github.com> Date: Sun Jul 2 01:00:25 2023 +0800 convert : add support of baichuan-7b (ggml-org#2055) Co-authored-by: Judd <foldl@boxvest.com>
commit 463f2f4 Author: Georgi Gerganov <ggerganov@gmail.com> Date: Sat Jul 1 19:05:09 2023 +0300 llama : fix return value of llama_load_session_file_internal (ggml-org#2022)
commit cb44dbc Author: Rand Xie <randxiexyy29@gmail.com> Date: Sun Jul 2 00:02:58 2023 +0800 llama : catch llama_load_session_file_internal exceptions (ggml-org#2022) * convert checks in llama_load_session_file to throw and handle them * make llama_load_session_file_internal static * address feedbacks to avoid using exceptions
commit 79f634a Author: Georgi Gerganov <ggerganov@gmail.com> Date: Sat Jul 1 18:46:00 2023 +0300 embd-input : fix returning ptr to temporary
commit 04606a1 Author: Georgi Gerganov <ggerganov@gmail.com> Date: Sat Jul 1 18:45:44 2023 +0300 train : fix compile warning
commit b1ca8f3 Author: Qingyou Meng <meng.qingyou@gmail.com> Date: Sat Jul 1 23:42:43 2023 +0800 ggml : disable GGML_TASK_INIT and GGML_TASK_FINALIZE by default (ggml-org#1995) Will not be scheduled unless explicitly enabled.
commit 2c3b46f Author: YellowRoseCx <80486540+YellowRoseCx@users.noreply.github.com> Date: Thu Jun 29 18:43:43 2023 -0500 changes to fix build
commit c9e1103 Author: YellowRoseCx <80486540+YellowRoseCx@users.noreply.github.com> Date: Thu Jun 29 18:20:07 2023 -0500 Update ggml_v2-cuda-legacy.cu for ROCM
commit b858fc5 Author: YellowRoseCx <80486540+YellowRoseCx@users.noreply.github.com> Date: Thu Jun 29 17:49:39 2023 -0500 changes to work with upstream
commit 69a0c25 Merge: 096f0b0 1347d3a Author: YellowRoseCx <80486540+YellowRoseCx@users.noreply.github.com> Date: Thu Jun 29 16:59:06 2023 -0500 Merge remote-tracking branch 'upstream/concedo'
commit 096f0b0 Author: YellowRoseCx <80486540+YellowRoseCx@users.noreply.github.com> Date: Wed Jun 28 15:27:02 2023 -0500 revert unnecessary hipblas conditionals
commit d81e81a Author: YellowRoseCx <80486540+YellowRoseCx@users.noreply.github.com> Date: Wed Jun 28 14:48:23 2023 -0500 Update Makefile hipblas nvcc correction
commit 2579ecf Merge: abed427 d2034ce Author: YellowRoseCx <80486540+YellowRoseCx@users.noreply.github.com> Date: Sun Jun 25 17:50:04 2023 -0500 Merge branch 'LostRuins:concedo' into main
commit abed427 Author: YellowRoseCx <80486540+YellowRoseCx@users.noreply.github.com> Date: Sat Jun 24 19:16:30 2023 -0500 reorganize If statements to include proper headers
commit 06c3bf0 Merge: ea6d320 8342fe8 Author: YellowRoseCx <80486540+YellowRoseCx@users.noreply.github.com> Date: Sat Jun 24 16:57:20 2023 -0500 Merge branch 'LostRuins:concedo' into main
commit ea6d320 Author: YellowRoseCx <80486540+YellowRoseCx@users.noreply.github.com> Date: Fri Jun 23 01:53:28 2023 -0500 Update README.md
commit 4d56ad8 Author: YellowRoseCx <80486540+YellowRoseCx@users.noreply.github.com> Date: Thu Jun 22 16:19:43 2023 -0500 Update README.md
commit 21f9308 Author: YellowRoseCx <80486540+YellowRoseCx@users.noreply.github.com> Date: Thu Jun 22 15:42:05 2023 -0500 kquants_iter for hipblas and add gfx803
commit b6ff890 Merge: eb094f0 e6ddb15 Author: YellowRoseCx <80486540+YellowRoseCx@users.noreply.github.com> Date: Thu Jun 22 12:42:09 2023 -0500 Merge branch 'LostRuins:concedo' into main
commit eb094f0 Author: YellowRoseCx <80486540+YellowRoseCx@users.noreply.github.com> Date: Wed Jun 21 23:59:18 2023 -0500 lowvram parameter description
commit 3a5dfeb Merge: 665cc11 b1f00fa Author: YellowRoseCx <80486540+YellowRoseCx@users.noreply.github.com> Date: Wed Jun 21 16:53:03 2023 -0500 Merge branch 'LostRuins:concedo' into koboldcpp-rocm
commit 665cc11 Author: YellowRoseCx <80486540+YellowRoseCx@users.noreply.github.com> Date: Wed Jun 21 01:13:19 2023 -0500 add lowvram parameter
commit 222cbbb Author: YellowRoseCx <80486540+YellowRoseCx@users.noreply.github.com> Date: Tue Jun 20 19:03:28 2023 -0500 add additional hipblas conditions for cublas
commit e1f9581 Author: YellowRoseCx <80486540+YellowRoseCx@users.noreply.github.com> Date: Tue Jun 20 16:51:59 2023 -0500 Add hip def for cuda v2
commit 3bff5c0 Merge: a7e74b3 266d47a Author: YellowRoseCx <80486540+YellowRoseCx@users.noreply.github.com> Date: Tue Jun 20 13:38:06 2023 -0500 Merge branch 'LostRuins:concedo' into koboldcpp-rocm
commit a7e74b3 Author: YellowRoseCx <80486540+YellowRoseCx@users.noreply.github.com> Date: Mon Jun 19 22:04:18 2023 -0500 Update README.md
commit 5e99b3c Author: YellowRoseCx <80486540+YellowRoseCx@users.noreply.github.com> Date: Mon Jun 19 22:03:42 2023 -0500 Update Makefile
commit 9190b17 Author: YellowRoseCx <80486540+YellowRoseCx@users.noreply.github.com> Date: Mon Jun 19 21:47:10 2023 -0500 Update README.md
commit 2780ea2 Author: YellowRoseCx <80486540+YellowRoseCx@users.noreply.github.com> Date: Sun Jun 18 15:48:00 2023 -0500 Update Makefile
commit 04a3e64 Author: YellowRoseCx <80486540+YellowRoseCx@users.noreply.github.com> Date: Sun Jun 18 14:33:39 2023 -0500 remove extra line
commit cccbca9 Author: YellowRoseCx <80486540+YellowRoseCx@users.noreply.github.com> Date: Sun Jun 18 14:31:17 2023 -0500 attempt adding ROCM hipblas
commit a44a1d4 Author: YellowRoseCx <80486540+YellowRoseCx@users.noreply.github.com> Date: Sun Jun 18 14:31:01 2023 -0500 attempt adding ROCM hipblas
commit b088184 Author: YellowRoseCx <80486540+YellowRoseCx@users.noreply.github.com> Date: Sun Jun 18 14:30:54 2023 -0500 attempt adding ROCM hipblas
1 parent 631b115 commit 8a8218c

Some content is hidden: large commits have some of their diffs collapsed by default, so only a subset of the changed files is shown below.

47 files changed: +7043 -2233 lines

CMakeLists.txt

Lines changed: 14 additions & 6 deletions
```diff
@@ -41,9 +41,10 @@ if (NOT MSVC)
 endif()
 
 # 3rd party libs
-option(LLAMA_CUBLAS "llama: use cuBLAS" ON)
+option(LLAMA_CUBLAS "llama: use cuBLAS" OFF)
 set(LLAMA_CUDA_DMMV_X "32" CACHE STRING "llama: x stride for dmmv CUDA kernels")
 set(LLAMA_CUDA_DMMV_Y "1" CACHE STRING "llama: y block size for dmmv CUDA kernels")
+set(LLAMA_CUDA_MMV_Y "1" CACHE STRING "llama: y block size for mmv CUDA kernels")
 option(LLAMA_CUDA_DMMV_F16 "llama: use 16 bit floats for dmmv CUDA kernels" OFF)
 set(LLAMA_CUDA_KQUANTS_ITER "2" CACHE STRING "llama: iters./thread per block for Q2_K/Q6_K")
 option(LLAMA_HIPBLAS "llama: use hipBLAS" OFF)
@@ -77,8 +78,11 @@ if (LLAMA_CUBLAS)
         set(GGML_V2_LEGACY_CUDA_SOURCES otherarch/ggml_v2-cuda-legacy.cu otherarch/ggml_v2-cuda-legacy.h)
 
         add_compile_definitions(GGML_USE_CUBLAS)
+        add_compile_definitions(GGML_CUDA_FORCE_DMMV) #non dmmv broken for me
+
         add_compile_definitions(GGML_CUDA_DMMV_X=${LLAMA_CUDA_DMMV_X})
         add_compile_definitions(GGML_CUDA_DMMV_Y=${LLAMA_CUDA_DMMV_Y})
+        add_compile_definitions(GGML_CUDA_MMV_Y=${LLAMA_CUDA_MMV_Y})
         if (LLAMA_CUDA_DMMV_F16)
             add_compile_definitions(GGML_CUDA_DMMV_F16)
         endif()
@@ -90,6 +94,15 @@ if (LLAMA_CUBLAS)
             set(LLAMA_EXTRA_LIBS ${LLAMA_EXTRA_LIBS} CUDA::cudart CUDA::cublas CUDA::cublasLt)
         endif()
 
+        if (NOT DEFINED CMAKE_CUDA_ARCHITECTURES)
+            if (LLAMA_CUDA_DMMV_F16)
+                set(CMAKE_CUDA_ARCHITECTURES "61") # needed for f16 CUDA intrinsics
+            else()
+                set(CMAKE_CUDA_ARCHITECTURES "52;61") # lowest CUDA 12 standard + lowest for integer intrinsics
+            endif()
+        endif()
+        message(STATUS "Using CUDA architectures: ${CMAKE_CUDA_ARCHITECTURES}")
+
     else()
         message(WARNING "cuBLAS not found")
     endif()
@@ -200,11 +213,6 @@ if (${CMAKE_SYSTEM_PROCESSOR} MATCHES "arm" OR ${CMAKE_SYSTEM_PROCESSOR} MATCHES
     if (MSVC)
         # TODO: arm msvc?
     else()
-        if (${CMAKE_SYSTEM_PROCESSOR} MATCHES "aarch64")
-            # Apple M1, M2, etc.
-            # Raspberry Pi 3, 4, Zero 2 (64-bit)
-            add_compile_options(-mcpu=native)
-        endif()
         if (${CMAKE_SYSTEM_PROCESSOR} MATCHES "armv6")
             # Raspberry Pi 1, Zero
             add_compile_options(-mfpu=neon-fp-armv8 -mfp16-format=ieee -mno-unaligned-access)
```

Makefile

Lines changed: 30 additions & 11 deletions
```diff
@@ -144,16 +144,18 @@ ifdef LLAMA_CUBLAS
 CUBLASLD_FLAGS = -lcublas -lculibos -lcudart -lcublasLt -lpthread -ldl -lrt -L/usr/local/cuda/lib64 -L/opt/cuda/lib64 -L$(CUDA_PATH)/targets/x86_64-linux/lib
 CUBLAS_OBJS = ggml-cuda.o ggml_v2-cuda.o ggml_v2-cuda-legacy.o
 NVCC = nvcc
-NVCCFLAGS = --forward-unknown-to-host-compiler -arch=native
+NVCCFLAGS = --forward-unknown-to-host-compiler -arch=native -DGGML_CUDA_FORCE_DMMV
 ifdef LLAMA_CUDA_DMMV_X
 NVCCFLAGS += -DGGML_CUDA_DMMV_X=$(LLAMA_CUDA_DMMV_X)
 else
 NVCCFLAGS += -DGGML_CUDA_DMMV_X=32
 endif # LLAMA_CUDA_DMMV_X
 ifdef LLAMA_CUDA_DMMV_Y
+NVCCFLAGS += -DGGML_CUDA_MMV_Y=$(LLAMA_CUDA_MMV_Y)
 NVCCFLAGS += -DGGML_CUDA_DMMV_Y=$(LLAMA_CUDA_DMMV_Y)
 else
 NVCCFLAGS += -DGGML_CUDA_DMMV_Y=1
+NVCCFLAGS += -DGGML_CUDA_MMV_Y=1
 endif # LLAMA_CUDA_DMMV_Y
 ifdef LLAMA_CUDA_DMMV_F16
 NVCCFLAGS += -DGGML_CUDA_DMMV_F16
@@ -175,23 +177,40 @@ ifdef LLAMA_HIPBLAS
 ROCM_PATH ?= /opt/rocm
 CC := $(ROCM_PATH)/llvm/bin/clang
 CXX := $(ROCM_PATH)/llvm/bin/clang++
-GPU_TARGETS = gfx803 gfx900 gfx906 gfx908 gfx90a gfx1030
+GPU_TARGETS = gfx803 gfx900 gfx906 gfx908 gfx90a gfx1030 gfx1100
 LLAMA_CUDA_DMMV_X ?= 64
-LLAMA_CUDA_DMMV_Y ?= 2
+LLAMA_CUDA_MMV_Y ?= 2
+LLAMA_CUDA_FORCE_DMMV = true
 CFLAGS += -DGGML_USE_HIPBLAS -DGGML_USE_CUBLAS $(shell $(ROCM_PATH)/bin/hipconfig -C)
 CXXFLAGS += -DGGML_USE_HIPBLAS -DGGML_USE_CUBLAS $(shell $(ROCM_PATH)/bin/hipconfig -C)
 LDFLAGS += -L/opt/rocm/lib -Wl,-rpath=$(ROCM_PATH)/lib -lhipblas -lamdhip64
 OBJS += ggml-cuda.o ggml_v2-cuda.o ggml_v2-cuda-legacy.o
 
+ifdef LLAMA_CUDA_DMMV_X
+CXXFLAGS += -DGGML_CUDA_DMMV_X=$(LLAMA_CUDA_DMMV_X)
+else
+CXXFLAGS += -DGGML_CUDA_DMMV_X=32
+endif
+ifeq ($(LLAMA_CUDA_FORCE_DMMV), true)
+CXXFLAGS += -DGGML_CUDA_FORCE_DMMV
+endif
+ifdef LLAMA_CUDA_MMV_Y
+CXXFLAGS += -DGGML_CUDA_MMV_Y=$(LLAMA_CUDA_MMV_Y)
+else ifdef LLAMA_CUDA_DMMV_Y
+CXXFLAGS += -DGGML_CUDA_MMV_Y=$(LLAMA_CUDA_DMMV_Y) # for backwards compatibility
+else
+CXXFLAGS += -DGGML_CUDA_MMV_Y=1
+endif
+
 ifdef LLAMA_CUDA_KQUANTS_ITER
 CXXFLAGS += -DK_QUANTS_PER_ITERATION=$(LLAMA_CUDA_KQUANTS_ITER)
 else
 CXXFLAGS += -DK_QUANTS_PER_ITERATION=2
 endif
 
-ggml-cuda.o: CXXFLAGS += $(addprefix --offload-arch=,$(GPU_TARGETS)) \
-        -DGGML_CUDA_DMMV_X=$(LLAMA_CUDA_DMMV_X) \
-        -DGGML_CUDA_DMMV_Y=$(LLAMA_CUDA_DMMV_Y)
+ggml-cuda.o: CXXFLAGS += $(addprefix --offload-arch=,$(GPU_TARGETS))
+
+
 # DGGML_CUDA_DMMV_F16 does not currently work with AMD.
 ggml-cuda.o: ggml-cuda.cu ggml-cuda.h
         $(CXX) $(CXXFLAGS) -x hip -c -o $@ $<
@@ -259,11 +278,11 @@ else
 OPENBLAS_NOAVX2_BUILD = $(CXX) $(CXXFLAGS) $^ $(ARCH_ADD) -lopenblas -shared -o $@.so $(LDFLAGS)
 endif
 ifdef LLAMA_CLBLAST
-ifeq ($(UNAME_S),Darwin)
-CLBLAST_BUILD = $(CXX) $(CXXFLAGS) $^ -lclblast -framework OpenCL $(ARCH_ADD) -lopenblas -shared -o $@.so $(LDFLAGS)
-else
-CLBLAST_BUILD = $(CXX) $(CXXFLAGS) $^ -lclblast -lOpenCL $(ARCH_ADD) -lopenblas -shared -o $@.so $(LDFLAGS)
-endif
+ifeq ($(UNAME_S),Darwin)
+CLBLAST_BUILD = $(CXX) $(CXXFLAGS) $^ -lclblast -framework OpenCL $(ARCH_ADD) -lopenblas -shared -o $@.so $(LDFLAGS)
+else
+CLBLAST_BUILD = $(CXX) $(CXXFLAGS) $^ -lclblast -lOpenCL $(ARCH_ADD) -lopenblas -shared -o $@.so $(LDFLAGS)
+endif
 endif
 
 ifdef LLAMA_CUBLAS
```

README.md

Lines changed: 18 additions & 1 deletion
````diff
@@ -1,5 +1,22 @@
-# koboldcpp
+# koboldcpp-ROCM
 
+To install, run
+```make LLAMA_HIPBLAS=1```
+To use ROCM, set GPU layers with --gpulayers when starting koboldcpp
+Original [llama.cpp rocm port](https://github.com/ggerganov/llama.cpp/pull/1087) by SlyEcho, ported to koboldcpp by yellowrosecx
+
+Comparison with OpenCL using 6800xt
+| Model | Offloading Method | Time Taken - Processing 593 tokens | Time Taken - Generating 200 tokens | Total Time | Perf. Diff. |
+|-----------------|----------------------------|--------------------|--------------------|------------|---|
+| Robin 7b q6_K | CLBLAST 6-t, All Layers on GPU | 6.8s (11ms/T) | 12.0s (60ms/T) | 18.7s (10.7T/s) | 1x |
+| Robin 7b q6_K | ROCM 1-t, All Layers on GPU | 1.4s (2ms/T) | 5.5s (28ms/T) | 6.9s (29.1T/s) | **2.71x** |
+| Robin 13b q5_K_M | CLBLAST 6-t, All Layers on GPU | 10.9s (18ms/T) | 16.7s (83ms/T) | 27.6s (7.3T/s) | 1x |
+| Robin 13b q5_K_M | ROCM 1-t, All Layers on GPU | 2.4s (4ms/T) | 7.8s (39ms/T) | 10.2s (19.6T/s) | **2.63x** |
+| Robin 33b q4_K_S | CLBLAST 6-t, 46/63 Layers on GPU | 23.2s (39ms/T) | 48.6s (243ms/T) | 71.9s (2.8T/s) | 1x |
+| Robin 33b q4_K_S | CLBLAST 6-t, 50/63 Layers on GPU | 25.5s (43ms/T) | 44.6s (223ms/T) | 70.0s (2.9T/s) | 1x |
+| Robin 33b q4_K_S | ROCM 6-t, 46/63 Layers on GPU | 14.6s (25ms/T) | 44.1s (221ms/T) | 58.7s (3.4T/s) | **1.19x** |
+
+--------
 A self contained distributable from Concedo that exposes llama.cpp function bindings, allowing it to be used via a simulated Kobold API endpoint.
 
 What does it mean? You get llama.cpp with a fancy UI, persistent stories, editing tools, save formats, memory, world info, author's note, characters, scenarios and everything Kobold and Kobold Lite have to offer. In a tiny package around 20 MB in size, excluding model weights.
````
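For readers who script their setup, a minimal sketch follows that only restates the two README instructions above (build with hipBLAS, then launch with --gpulayers). The model filename, layer count, and make job count are illustrative placeholders, not values taken from this commit.

```python
# Minimal sketch: build the ROCm/hipBLAS backend, then start koboldcpp with GPU offload.
# Assumes it is run from the repository root; model path and layer count are placeholders.
import subprocess

subprocess.run(["make", "LLAMA_HIPBLAS=1", "-j4"], check=True)
subprocess.run([
    "python", "koboldcpp.py",
    "models/robin-7b.ggmlv3.q6_K.bin",  # placeholder model file
    "--gpulayers", "32",                # number of layers to offload to the GPU
], check=True)
```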

convert.py

Lines changed: 42 additions & 5 deletions
```diff
@@ -136,7 +136,7 @@ def find_n_mult(n_ff: int, n_embd: int) -> int:
         calc_ff = (((8*n_embd) // 3 + n_mult - 1) // n_mult)*n_mult
         if calc_ff == n_ff:
             return n_mult
-    return 1
+    raise Exception(f"failed to find n_mult for (n_ff={n_ff}, n_embd={n_embd}).")
 
 @dataclass
 class Params:
```
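The change above turns the silent `return 1` fallback into a hard error. As a rough standalone illustration of the relation `find_n_mult` searches for (n_ff is n_embd scaled by 8/3 and rounded up to a multiple of n_mult), here is a sketch; the candidate list is made up for the example and is not the range convert.py actually scans.

```python
# Illustrative sketch of the n_mult search: find the multiple that reproduces n_ff
# from n_embd. The candidate list below is an assumption for the example only.
def find_n_mult_sketch(n_ff: int, n_embd: int) -> int:
    for n_mult in (256, 128, 64, 32, 16, 8, 4, 2, 1):
        calc_ff = (((8 * n_embd) // 3 + n_mult - 1) // n_mult) * n_mult
        if calc_ff == n_ff:
            return n_mult
    # after this commit, an unrecognized geometry raises instead of silently returning 1
    raise Exception(f"failed to find n_mult for (n_ff={n_ff}, n_embd={n_embd}).")

print(find_n_mult_sketch(11008, 4096))  # LLaMA-7B geometry -> 256
```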
```diff
@@ -154,9 +154,15 @@ def guessed(model: 'LazyModel') -> 'Params':
         # try transformer naming first
         if "model.layers.0.self_attn.q_proj.weight" in model:
             n_layer=next(i for i in itertools.count() if f"model.layers.{i}.self_attn.q_proj.weight" not in model)
+        elif "model.layers.0.self_attn.W_pack.weight" in model: # next: try baichuan naming
+            n_layer=next(i for i in itertools.count() if f"model.layers.{i}.self_attn.W_pack.weight" not in model)
         else:
             n_layer=next(i for i in itertools.count() if f"layers.{i}.attention.wq.weight" not in model)
 
+        if n_layer < 1:
+            raise Exception("failed to guess 'n_layer'. This model is unknown or unsupported.\n"
+                            "Suggestion: provide 'config.json' of the model in the same directory containing model files.")
+
         n_head=n_embd // 128 # guessed
 
         return Params(
```
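The guessing logic above simply counts consecutive per-layer weight keys until the first missing index, and the new `elif` extends the same idiom to Baichuan's `W_pack` naming; `n_embd // 128` then guesses the head count from the usual 128-wide heads. A toy illustration with a fabricated state dict:

```python
# Toy illustration of the layer-count guess: count consecutive per-layer keys
# until one is missing. The fake state dict below is fabricated for the example.
import itertools

fake_model = {f"model.layers.{i}.self_attn.W_pack.weight": None for i in range(32)}

n_layer = next(i for i in itertools.count()
               if f"model.layers.{i}.self_attn.W_pack.weight" not in fake_model)
n_head = 4096 // 128  # guessed head count for a 4096-wide model

print(n_layer, n_head)  # 32 32
```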
```diff
@@ -321,6 +327,10 @@ def astype(self, data_type: DataType) -> 'Tensor': ...
     @abstractmethod
     def permute(self, n_head: int) -> 'Tensor': ...
     @abstractmethod
+    def permute_part(self, n_part: int, n_head: int) -> 'UnquantizedTensor': ...
+    @abstractmethod
+    def part(self, n_part: int) -> 'UnquantizedTensor': ...
+    @abstractmethod
     def to_ggml(self) -> 'GGMLCompatibleTensor': ...
 
 
@@ -345,6 +355,14 @@ def astype(self, data_type: DataType) -> Tensor:
     def to_ggml(self) -> 'UnquantizedTensor':
         return self
 
+    def permute_part(self, n_part: int, n_head: int) -> 'UnquantizedTensor':
+        r = self.ndarray.shape[0] // 3
+        return UnquantizedTensor(permute(self.ndarray[r * n_part : r * n_part + r, ...], n_head))
+
+    def part(self, n_part: int) -> 'UnquantizedTensor':
+        r = self.ndarray.shape[0] // 3
+        return UnquantizedTensor(self.ndarray[r * n_part : r * n_part + r, ...])
+
     def permute(self, n_head: int) -> 'UnquantizedTensor':
         return UnquantizedTensor(permute(self.ndarray, n_head))
 
@@ -642,6 +660,19 @@ def load() -> Tensor:
         return lazy_tensor.load().permute(n_head)
     return LazyTensor(load, lazy_tensor.shape, lazy_tensor.data_type, f'permute({n_head}) ' + lazy_tensor.description)
 
+def permute_part_lazy(lazy_tensor: LazyTensor, n_part: int, n_head: int) -> LazyTensor:
+    def load() -> Tensor:
+        return lazy_tensor.load().permute_part(n_part, n_head)
+    s = lazy_tensor.shape.copy()
+    s[0] = s[0] // 3
+    return LazyTensor(load, s, lazy_tensor.data_type, f'permute({n_head}) ' + lazy_tensor.description)
+
+def part_lazy(lazy_tensor: LazyTensor, n_part: int) -> LazyTensor:
+    def load() -> Tensor:
+        return lazy_tensor.load().part(n_part)
+    s = lazy_tensor.shape.copy()
+    s[0] = s[0] // 3
+    return LazyTensor(load, s, lazy_tensor.data_type, 'part ' + lazy_tensor.description)
 
 def convert_transformers_to_orig(model: LazyModel, params: Params) -> LazyModel:
     out: LazyModel = {}
```
```diff
@@ -650,11 +681,17 @@ def convert_transformers_to_orig(model: LazyModel, params: Params) -> LazyModel:
     out["output.weight"] = model["lm_head.weight"]
 
     for i in itertools.count():
-        if f"model.layers.{i}.self_attn.q_proj.weight" not in model:
+        if f"model.layers.{i}.self_attn.q_proj.weight" in model:
+            out[f"layers.{i}.attention.wq.weight"] = permute_lazy(model[f"model.layers.{i}.self_attn.q_proj.weight"], params.n_head)
+            out[f"layers.{i}.attention.wk.weight"] = permute_lazy(model[f"model.layers.{i}.self_attn.k_proj.weight"], params.n_head)
+            out[f"layers.{i}.attention.wv.weight"] = model[f"model.layers.{i}.self_attn.v_proj.weight"]
+        elif f"model.layers.{i}.self_attn.W_pack.weight" in model:
+            out[f"layers.{i}.attention.wq.weight"] = permute_part_lazy(model[f"model.layers.{i}.self_attn.W_pack.weight"], 0, params.n_head)
+            out[f"layers.{i}.attention.wk.weight"] = permute_part_lazy(model[f"model.layers.{i}.self_attn.W_pack.weight"], 1, params.n_head)
+            out[f"layers.{i}.attention.wv.weight"] = part_lazy(model[f"model.layers.{i}.self_attn.W_pack.weight"], 2)
+        else:
             break
-        out[f"layers.{i}.attention.wq.weight"] = permute_lazy(model[f"model.layers.{i}.self_attn.q_proj.weight"], params.n_head)
-        out[f"layers.{i}.attention.wk.weight"] = permute_lazy(model[f"model.layers.{i}.self_attn.k_proj.weight"], params.n_head)
-        out[f"layers.{i}.attention.wv.weight"] = model[f"model.layers.{i}.self_attn.v_proj.weight"]
+
         out[f"layers.{i}.attention.wo.weight"] = model[f"model.layers.{i}.self_attn.o_proj.weight"]
 
         out[f"layers.{i}.feed_forward.w1.weight"] = model[f"model.layers.{i}.mlp.gate_proj.weight"]
```
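The new `permute_part`/`part` helpers cut Baichuan's fused `W_pack` projection into three equal slices along the first dimension (Q, K, V), which is why the lazy wrappers divide `shape[0]` by 3. A small numpy sketch of that slicing, with an illustrative shape:

```python
# Sketch of the W_pack split used above: the fused QKV weight is cut into three
# equal slices along dim 0. The shape is illustrative, not taken from a real model.
import numpy as np

n_embd = 4096
w_pack = np.zeros((3 * n_embd, n_embd), dtype=np.float32)  # fused [Q; K; V] projection

def part(w: np.ndarray, n_part: int) -> np.ndarray:
    r = w.shape[0] // 3
    return w[r * n_part : r * n_part + r, ...]

wq, wk, wv = part(w_pack, 0), part(w_pack, 1), part(w_pack, 2)
assert wq.shape == wk.shape == wv.shape == (n_embd, n_embd)
# convert.py additionally applies the rotary "permute" to the Q and K slices
# (permute_part_lazy with n_head) before writing them out.
```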

examples/alpaca.sh

Lines changed: 1 addition & 1 deletion
```diff
@@ -7,7 +7,7 @@
 cd `dirname $0`
 cd ..
 
-./main -m ./models/ggml-alpaca-7b-q4.bin \
+./main -m ./models/alpaca.13b.ggmlv3.q8_0.bin \
        --color \
        -f ./prompts/alpaca.txt \
        --ctx_size 2048 \
```

examples/common.h

Lines changed: 2 additions & 1 deletion
```diff
@@ -31,7 +31,7 @@ struct gpt_params {
     int32_t n_gpu_layers = 0; // number of layers to store in VRAM
     int32_t main_gpu = 0; // the GPU that is used for scratch and small tensors
     float tensor_split[LLAMA_MAX_DEVICES] = {0}; // how split tensors should be distributed across GPUs
-    bool low_vram = 0; // if true, reduce VRAM usage at the cost of performance
+    int32_t n_probs = 0; // if greater than 0, output the probabilities of top n_probs tokens.
 
     // sampling parameters
     std::unordered_map<llama_token, float> logit_bias; // logit bias for specific tokens
@@ -59,6 +59,7 @@ struct gpt_params {
     std::string lora_adapter = ""; // lora adapter path
     std::string lora_base = ""; // base model path for the lora adapter
 
+    bool low_vram = false; // if true, reduce VRAM usage at the cost of performance
     bool memory_f16 = true; // use f16 instead of f32 for memory kv
     bool random_prompt = false; // do not randomize prompt if none provided
     bool use_color = false; // use color to distinguish generations and inputs
```
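The `n_probs` field above backs the server option added in commit d7d2e6a ("server: add option to output probabilities for completion"). A hedged client-side sketch follows; the endpoint path and JSON field names are assumptions based on that commit's description and the example server, not verified against this tree.

```python
# Hedged sketch: request a completion from the example server and ask for the
# top-token probabilities. The "/completion" path and "n_probs" field are assumptions.
import json
import urllib.request

payload = {
    "prompt": "Building a website can be done in 10 simple steps:",
    "n_predict": 16,
    "n_probs": 5,  # ask for probabilities of the top 5 tokens at each position
}
req = urllib.request.Request(
    "http://localhost:8080/completion",
    data=json.dumps(payload).encode("utf-8"),
    headers={"Content-Type": "application/json"},
)
with urllib.request.urlopen(req) as resp:
    print(json.load(resp))
```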

examples/embd-input/embd-input-lib.cpp

Lines changed: 7 additions & 4 deletions
```diff
@@ -29,7 +29,7 @@ struct MyModel* create_mymodel(int argc, char ** argv) {
 
     fprintf(stderr, "%s: build = %d (%s)\n", __func__, BUILD_NUMBER, BUILD_COMMIT);
 
-    if (params.seed < 0) {
+    if (params.seed == LLAMA_DEFAULT_SEED) {
         params.seed = time(NULL);
     }
     fprintf(stderr, "%s: seed = %d\n", __func__, params.seed);
@@ -210,9 +210,12 @@ llama_token sampling_id(struct MyModel* mymodel) {
 const char * sampling(struct MyModel * mymodel) {
     llama_context * ctx = mymodel->ctx;
     int id = sampling_id(mymodel);
-    std::string ret;
-    if (id == llama_token_eos()) ret = "</s>";
-    else ret = llama_token_to_str(ctx, id);
+    static std::string ret;
+    if (id == llama_token_eos()) {
+        ret = "</s>";
+    } else {
+        ret = llama_token_to_str(ctx, id);
+    }
     eval_id(mymodel, id);
     return ret.c_str();
 }
```

examples/embd-input/embd-input.h

Lines changed: 1 addition & 3 deletions
```diff
@@ -5,7 +5,6 @@
 #include "llama.h"
 #include "build-info.h"
 
-
 extern "C" {
 
 typedef struct MyModel {
@@ -14,14 +13,13 @@ typedef struct MyModel {
     int n_past = 0;
 } MyModel;
 
-
 struct MyModel* create_mymodel(int argc, char ** argv);
 
 bool eval_float(void* model, float* input, int N);
 bool eval_tokens(void* model, std::vector<llama_token> tokens);
 bool eval_id(struct MyModel* mymodel, int id);
 bool eval_string(struct MyModel* mymodel, const char* str);
-const char* sampling(struct MyModel* mymodel);
+const char * sampling(struct MyModel* mymodel);
 llama_token sampling_id(struct MyModel* mymodel);
 void free_mymodel(struct MyModel* mymodel);
 
```

examples/embedding/embedding.cpp

Lines changed: 1 addition & 1 deletion
```diff
@@ -18,7 +18,7 @@ int main(int argc, char ** argv) {
     params.embedding = true;
 
     if (params.n_ctx > 2048) {
-        fprintf(stderr, "%s: warning: model does not support context sizes greater than 2048 tokens (%d specified);"
+        fprintf(stderr, "%s: warning: model might not support context sizes greater than 2048 tokens (%d specified);"
                 "expect poor results\n", __func__, params.n_ctx);
     }
 
```
examples/main/main.cpp

Lines changed: 1 addition & 1 deletion
```diff
@@ -85,7 +85,7 @@ int main(int argc, char ** argv) {
     }
 
     if (params.n_ctx > 2048) {
-        fprintf(stderr, "%s: warning: model does not support context sizes greater than 2048 tokens (%d specified);"
+        fprintf(stderr, "%s: warning: model might not support context sizes greater than 2048 tokens (%d specified);"
                 "expect poor results\n", __func__, params.n_ctx);
     } else if (params.n_ctx < 8) {
         fprintf(stderr, "%s: warning: minimum context size is 8, using minimum size.\n", __func__);
```
