Releases · tangledgroup/llama-cpp-cffi

16 Aug 12:31

mtasic85

v0.1.12

ba5a3a9

v0.1.12

Added:
- Build vulkan_1_x for general GPU.
- Build cuda 12.4.1 as default.

Changed:
- Renamed examples for TinyLlama (chat, tool calling) and OpenAI.
- Updated demo models definitions.
- Updated examples (chat, tool calling).
- get_special_tokens now supports the parameter force_standard_special_tokens: bool=False, which bypasses the tokenizer's special tokens in favor of standard/common ones.
- Build cuda 12.5.1 as additional build target but packaged on PyPI.
- Build cuda 12.6 as additional build target but packaged on PyPI.
- Build openblas as additional build target but packaged on PyPI.

Fixed:
- Handle Options.no_display_prompt on Python side.

Assets 20

12 Aug 07:31

mtasic85

v0.1.11

b9f8d45

v0.1.11

Changed:
- openai: allow import of routes and v1_chat_completions handler.
- Updated examples/demo_0.py, the tool calling example.

Assets 20

30 Jul 12:35

mtasic85

v0.1.10

bce8bfb

v0.1.10

Added:
- In openai, support for prompt and extra_body. Reference: https://github.com/openai/openai-python/blob/195c05a64d39c87b2dfdf1eca2d339597f1fce03/src/openai/resources/completions.py#L41
- Pass llama-cli options to openai.
- util module with is_cuda_available function.
- openai supports both prompt and messages. Reference: https://github.com/openai/openai-python/blob/195c05a64d39c87b2dfdf1eca2d339597f1fce03/src/openai/resources/completions.py#L45
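
As a rough illustration of the prompt/messages and extra_body items above: the openai-python client sends the entries of extra_body as additional JSON properties of the request. The sketch below mimics that merge with a hypothetical helper, build_request_body. The helper, the model name, and the option name in extra_body are placeholders for illustration, not this package's confirmed API.

```python
from typing import Any, Optional


def build_request_body(
    model: str,
    prompt: Optional[str] = None,
    messages: Optional[list[dict[str, str]]] = None,
    extra_body: Optional[dict[str, Any]] = None,
) -> dict[str, Any]:
    """Hypothetical helper: assemble a request body from prompt and/or messages."""
    if prompt is None and messages is None:
        raise ValueError("pass prompt, messages, or both")

    body: dict[str, Any] = {"model": model}
    if prompt is not None:
        body["prompt"] = prompt
    if messages is not None:
        body["messages"] = messages

    # Extra properties are merged on top of the standard fields.
    body.update(extra_body or {})
    return body


body = build_request_body(
    model="demo-model",             # placeholder model name
    prompt="Hello",
    extra_body={"ctx_size": 2048},  # placeholder option name
)
```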

Assets 16

30 Jul 06:47

mtasic85

v0.1.9

ec2aef5

v0.1.9

Added:
- Support for default CPU tinyBLAS (llamafile, sgemm) builds.
- Support for CPU OpenBLAS (GGML_OPENBLAS) builds.

Changed:
- Build scripts now have a separate step/function, cuda_12_5_1_setup, which sets up the CUDA 12.5.1 environment at build time.

Fixed:
- Stop thread in llama_generate on GeneratorExit.
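
The GeneratorExit fix above follows a general Python pattern, sketched here under assumptions: a generator backed by a worker thread catches GeneratorExit (raised at the yield when the consumer calls close() or drops the generator) and signals the thread to stop. The names generate, worker and _DONE are illustrative only, and this is not the actual llama_generate implementation.

```python
import queue
import threading
from typing import Iterable, Iterator

_DONE = object()  # sentinel marking the end of the stream


def generate(tokens: Iterable[str]) -> Iterator[str]:
    """Yield tokens produced by a worker thread; stop the thread if the consumer quits."""
    out: queue.Queue = queue.Queue(maxsize=1)
    stop = threading.Event()

    def worker() -> None:
        for item in [*tokens, _DONE]:
            # Retry with a timeout so a stop request is noticed even when the queue is full.
            while not stop.is_set():
                try:
                    out.put(item, timeout=0.05)
                    break
                except queue.Full:
                    pass
            if stop.is_set():
                return

    thread = threading.Thread(target=worker, name="generate-worker", daemon=True)
    thread.start()
    try:
        while (item := out.get()) is not _DONE:
            yield item
    except GeneratorExit:
        # The consumer stopped early: ask the worker to finish.
        stop.set()
        raise
    finally:
        thread.join(timeout=1.0)


gen = generate(["a", "b", "c", "d"])
first = next(gen)  # consume a single token
gen.close()        # raises GeneratorExit inside generate()
```

Without the stop signal, the worker in this sketch would stay blocked on the full queue after the consumer goes away.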

Removed:
- callback parameter in llama_generate and dependent functions.

Assets 16

27 Jul 15:46

mtasic85

v0.1.8

15a8ef7

v0.1.8

Added:
- Optional Model.tokenizer_hf_repo, for cases where Model.creator_hf_repo cannot be used to tokenize or format the prompt/messages.
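
One way to read the item above, as a minimal sketch: the tokenizer repository falls back to creator_hf_repo unless tokenizer_hf_repo is set. ModelSketch and tokenizer_repo are hypothetical stand-ins, the repository names are placeholders, and the precedence shown is an assumption rather than the package's confirmed behavior.

```python
from dataclasses import dataclass
from typing import Optional


@dataclass
class ModelSketch:
    """Hypothetical stand-in for the package's Model; only two fields are shown."""
    creator_hf_repo: str
    tokenizer_hf_repo: Optional[str] = None


def tokenizer_repo(model: ModelSketch) -> str:
    # Assumption: an explicit tokenizer repo takes precedence over the creator repo.
    return model.tokenizer_hf_repo or model.creator_hf_repo


default_repo = tokenizer_repo(ModelSketch(creator_hf_repo="org/base-model"))
override_repo = tokenizer_repo(
    ModelSketch(creator_hf_repo="org/base-model", tokenizer_hf_repo="org/tokenizer")
)
```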

Assets 16

26 Jul 14:28

mtasic85

v0.1.7

c8465f5

v0.1.7

Added:
- Support for stop tokens/words.
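
A minimal sketch of how stop words can be applied to streamed text, assuming plain string matching: output ends just before the first stop word, and a short tail is held back so a stop word split across chunks is still caught. The function stream_until_stop is illustrative and not part of this package; the actual implementation may match on tokens instead.

```python
from typing import Iterable, Iterator


def stream_until_stop(chunks: Iterable[str], stop: list[str]) -> Iterator[str]:
    """Yield text from chunks, ending just before the first stop word."""
    stop = [s for s in stop if s]  # ignore empty stop strings
    # Hold back enough trailing characters to catch a stop word split across chunks.
    hold = max((len(s) for s in stop), default=1) - 1
    buffer = ""
    for chunk in chunks:
        buffer += chunk
        hits = [i for i in (buffer.find(s) for s in stop) if i != -1]
        if hits:
            cut = min(hits)  # earliest stop word wins
            if cut > 0:
                yield buffer[:cut]
            return
        if len(buffer) > hold:
            emit_to = len(buffer) - hold
            yield buffer[:emit_to]
            buffer = buffer[emit_to:]
    if buffer:
        yield buffer  # no stop word seen: flush the held-back tail


text = "".join(stream_until_stop(["Hello", " wor", "ld</", "s> ignored"], ["</s>"]))
```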

Changed:
- llama/llama_cli.py: unified the CPU and CUDA 12.5 modules into a single module.

Removed:
- Separate examples for CPU and CUDA 12.5 modules.

Assets 16

25 Jul 07:26

mtasic85

v0.1.6

9dd2083

v0.1.6

Changed:
- Updated huggingface-hub.

Fixed:
- llama.__init__ now correctly imports submodules and handles CPU and CUDA backends.
- OpenAI: ctx_size: int = config.max_position_embeddings if max_tokens is None else max_tokens.
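
The ctx_size expression in the fix above can be exercised in isolation. In this sketch, Config is a stand-in for a model config object exposing max_position_embeddings, and resolve_ctx_size is a hypothetical wrapper around the expression.

```python
from dataclasses import dataclass
from typing import Optional


@dataclass
class Config:
    # Stand-in for a model config exposing max_position_embeddings.
    max_position_embeddings: int


def resolve_ctx_size(config: Config, max_tokens: Optional[int]) -> int:
    # Same expression as in the release note: fall back to the model limit
    # only when max_tokens is None.
    ctx_size: int = config.max_position_embeddings if max_tokens is None else max_tokens
    return ctx_size


config = Config(max_position_embeddings=2048)
default_ctx = resolve_ctx_size(config, None)
explicit_ctx = resolve_ctx_size(config, 512)
```

Because the check is `is None` rather than a truthiness test, an explicit max_tokens of 0 is kept as given and not replaced by the model limit.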

Assets 16

24 Jul 06:38

mtasic85

v0.1.5

be08b75

v0.1.5

Fixed:
- Build for Linux: UPX uses its best compression option, and 7z uses more aggressive compression.
- Do not use UPX for shared/dynamic library compression.

Assets 16
