
Commit 6c089cd

Merge remote-tracking branch 'ggerganov/master' into fix_decoding
* ggerganov/master: (40 commits)
  revert : cmake : set MSVC to use UTF-8 on source files (ggml-org#2346)
  sync : ggml
  ggml: fix ggml_graph_cpy undefined behavior (ggml/943)
  cann : fix doxy (ggml/0)
  vulkan : fix build (llama/0)
  cuda : mark BF16 CONT as unsupported
  ggml : fix cont with transposed tensors when one dimension is 1 (ggml/934)
  cmake : set MSVC to use UTF-8 on source files (ggml-org#2346)
  readme : remove invalid flag from Python example (ggml-org#2396)
  readme : fix link (ggml-org#2394)
  go : add beamsize/entropythold/maxcontext to context interface (ggml-org#2350)
  talk-llama : sync llama.cpp
  whisper : update FA call
  sync : ggml
  sync : vulkan (skip) (llama/0)
  ggml : do not crash when quantizing q4_x_x with an imatrix (llama/9192)
  metal : separate scale and mask from QKT in FA kernel (llama/9189)
  ggml : add SSM Metal kernels (llama/8546)
  metal : gemma2 flash attention support (llama/9159)
  CPU/CUDA: Gemma 2 FlashAttention support (llama/8542)
  ...
2 parents: b2f5a0a + 5236f02


68 files changed: +4777 additions, -2447 deletions

Makefile

Lines changed: 2 additions & 1 deletion
@@ -971,7 +971,8 @@ $(LIB_WHISPER): \
 	$(CXX) $(CXXFLAGS) -shared -fPIC -o $@ $^ $(LDFLAGS)

 $(LIB_WHISPER_S): \
-	$(OBJ_WHISPER)
+	$(OBJ_WHISPER) \
+	$(OBJ_GGML)
 	ar rcs $(LIB_WHISPER_S) $^

 # common

README.md

Lines changed: 6 additions & 5 deletions
@@ -21,7 +21,7 @@ High-performance inference of [OpenAI's Whisper](https://github.com/openai/whisp
 - Support for CPU-only inference
 - [Efficient GPU support for NVIDIA](https://github.com/ggerganov/whisper.cpp#nvidia-gpu-support-via-cublas)
 - [OpenVINO Support](https://github.com/ggerganov/whisper.cpp#openvino-support)
-- [C-style API](https://github.com/ggerganov/whisper.cpp/blob/master/whisper.h)
+- [C-style API](https://github.com/ggerganov/whisper.cpp/blob/master/include/whisper.h)

 Supported platforms:

@@ -33,7 +33,7 @@ Supported platforms:
 - [x] [WebAssembly](examples/whisper.wasm)
 - [x] Windows ([MSVC](https://github.com/ggerganov/whisper.cpp/blob/master/.github/workflows/build.yml#L117-L144) and [MinGW](https://github.com/ggerganov/whisper.cpp/issues/168)]
 - [x] [Raspberry Pi](https://github.com/ggerganov/whisper.cpp/discussions/166)
-- [x] [docker](https://github.com/ggerganov/whisper.cpp/pkgs/container/whisper.cpp)
+- [x] [Docker](https://github.com/ggerganov/whisper.cpp/pkgs/container/whisper.cpp)

 The entire high-level implementation of the model is contained in [whisper.h](include/whisper.h) and [whisper.cpp](src/whisper.cpp).
 The rest of the code is part of the [`ggml`](https://github.com/ggerganov/ggml) machine learning library.

@@ -55,8 +55,8 @@ Or you can even run it straight in the browser: [talk.wasm](examples/talk.wasm)

 ## Implementation details

-- The core tensor operations are implemented in C ([ggml.h](ggml.h) / [ggml.c](ggml.c))
-- The transformer model and the high-level C-style API are implemented in C++ ([whisper.h](whisper.h) / [whisper.cpp](whisper.cpp))
+- The core tensor operations are implemented in C ([ggml.h](ggml/include/ggml.h) / [ggml.c](ggml/src/ggml.c))
+- The transformer model and the high-level C-style API are implemented in C++ ([whisper.h](include/whisper.h) / [whisper.cpp](src/whisper.cpp))
 - Sample usage is demonstrated in [main.cpp](examples/main)
 - Sample real-time audio transcription from the microphone is demonstrated in [stream.cpp](examples/stream)
 - Various other examples are available in the [examples](examples) folder

@@ -751,7 +751,7 @@ took to execute it. The results are summarized in the following Github issue:

 [Benchmark results](https://github.com/ggerganov/whisper.cpp/issues/89)

-Additionally a script to run whisper.cpp with different models and audio files is provided [bench.py](bench.py).
+Additionally a script to run whisper.cpp with different models and audio files is provided [bench.py](scripts/bench.py).

 You can run it with the following command, by default it will run against any standard model in the models folder.

@@ -798,6 +798,7 @@ For more details, see the conversion script [models/convert-pt-to-ggml.py](model
 - [stlukey/whispercpp.py](https://github.com/stlukey/whispercpp.py) (Cython)
 - [AIWintermuteAI/whispercpp](https://github.com/AIWintermuteAI/whispercpp) (Updated fork of aarnphm/whispercpp)
 - [aarnphm/whispercpp](https://github.com/aarnphm/whispercpp) (Pybind11)
+- [abdeladim-s/pywhispercpp](https://github.com/abdeladim-s/pywhispercpp) (Pybind11)
 - [x] R: [bnosac/audio.whisper](https://github.com/bnosac/audio.whisper)
 - [x] Unity: [macoron/whisper.unity](https://github.com/Macoron/whisper.unity)

bindings/go/Makefile

Lines changed: 1 addition & 1 deletion
@@ -14,7 +14,7 @@ GGML_METAL_PATH_RESOURCES := $(abspath ../..)
 BUILD_DIR := build
 MODELS_DIR := models
 EXAMPLES_DIR := $(wildcard examples/*)
-INCLUDE_PATH := $(abspath ../..)
+INCLUDE_PATH := $(abspath ../../include):$(abspath ../../ggml/include)
 LIBRARY_PATH := $(abspath ../..)

 ifeq ($(UNAME_S),Darwin)

bindings/go/params.go

Lines changed: 14 additions & 0 deletions
@@ -115,6 +115,18 @@ func (p *Params) SetAudioCtx(n int) {
 	p.audio_ctx = C.int(n)
 }

+func (p *Params) SetMaxContext(n int) {
+	p.n_max_text_ctx = C.int(n)
+}
+
+func (p *Params) SetBeamSize(n int) {
+	p.beam_search.beam_size = C.int(n)
+}
+
+func (p *Params) SetEntropyThold(t float32) {
+	p.entropy_thold = C.float(t)
+}
+
 // Set initial prompt
 func (p *Params) SetInitialPrompt(prompt string) {
 	p.initial_prompt = C.CString(prompt)

@@ -145,6 +157,8 @@ func (p *Params) String() string {
 	str += fmt.Sprintf(" duration_ms=%d", p.duration_ms)
 	str += fmt.Sprintf(" audio_ctx=%d", p.audio_ctx)
 	str += fmt.Sprintf(" initial_prompt=%s", C.GoString(p.initial_prompt))
+	str += fmt.Sprintf(" entropy_thold=%f", p.entropy_thold)
+	str += fmt.Sprintf(" beam_size=%d", p.beam_search.beam_size)
 	if p.translate {
 		str += " translate"
 	}

bindings/go/pkg/whisper/context.go

Lines changed: 15 additions & 0 deletions
@@ -125,6 +125,21 @@ func (context *context) SetAudioCtx(n uint) {
 	context.params.SetAudioCtx(int(n))
 }

+// Set maximum number of text context tokens to store
+func (context *context) SetMaxContext(n int) {
+	context.params.SetMaxContext(n)
+}
+
+// Set Beam Size
+func (context *context) SetBeamSize(n int) {
+	context.params.SetBeamSize(n)
+}
+
+// Set Entropy threshold
+func (context *context) SetEntropyThold(t float32) {
+	context.params.SetEntropyThold(t)
+}
+
 // Set initial prompt
 func (context *context) SetInitialPrompt(prompt string) {
 	context.params.SetInitialPrompt(prompt)

bindings/go/pkg/whisper/interface.go

Lines changed: 3 additions & 0 deletions
@@ -48,6 +48,9 @@ type Context interface {
 	SetTokenTimestamps(bool)     // Set token timestamps flag
 	SetMaxTokensPerSegment(uint) // Set max tokens per segment (0 = no limit)
 	SetAudioCtx(uint)            // Set audio encoder context
+	SetMaxContext(n int)         // Set maximum number of text context tokens to store
+	SetBeamSize(n int)           // Set Beam Size
+	SetEntropyThold(t float32)   // Set Entropy threshold
 	SetInitialPrompt(prompt string) // Set initial prompt

 	// Process mono audio data and return any errors.
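Taken together, the three new methods let a Go caller tune whisper.cpp's decoding without dropping to the C API. A minimal usage sketch, assuming the existing whisper.New/NewContext entry points of this package and the three-argument Process signature at this revision; the model path, parameter values, and audio buffer are placeholders:

```go
package main

import (
	"log"

	"github.com/ggerganov/whisper.cpp/bindings/go/pkg/whisper"
)

func main() {
	model, err := whisper.New("models/ggml-base.en.bin") // placeholder model path
	if err != nil {
		log.Fatal(err)
	}
	defer model.Close()

	ctx, err := model.NewContext()
	if err != nil {
		log.Fatal(err)
	}

	// The setters added in this merge; values here are illustrative.
	ctx.SetBeamSize(5)       // wider beam = better quality, slower decode
	ctx.SetEntropyThold(2.4) // segment entropy above this triggers decoder fallback
	ctx.SetMaxContext(64)    // cap text-context tokens carried between segments

	var samples []float32 // 16 kHz mono PCM, filled elsewhere
	if err := ctx.Process(samples, nil, nil); err != nil {
		log.Fatal(err)
	}
	// Results can then be read back with ctx.NextSegment().
}
```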

bindings/go/whisper.go

Lines changed: 1 addition & 1 deletion
@@ -9,7 +9,7 @@ import (
 // CGO

 /*
-#cgo LDFLAGS: -lwhisper -lm -lstdc++
+#cgo LDFLAGS: -lwhisper -lm -lstdc++ -fopenmp
 #cgo darwin LDFLAGS: -framework Accelerate -framework Metal -framework Foundation -framework CoreGraphics
 #include <whisper.h>
 #include <stdlib.h>

examples/python/whisper_processor.py

Lines changed: 1 addition & 1 deletion
@@ -21,7 +21,7 @@ def process_audio(wav_file, model_name="base.en"):
     if not os.path.exists(wav_file):
         raise FileNotFoundError(f"WAV file not found: {wav_file}")

-    full_command = f"./main -m {model} -f {wav_file} -np -nt"
+    full_command = f"./main -m {model} -f {wav_file} -nt"

     # Execute the command
     process = subprocess.Popen(full_command, shell=True, stdout=subprocess.PIPE, stderr=subprocess.PIPE)

examples/talk-llama/llama-impl.h

Lines changed: 21 additions & 0 deletions
@@ -24,3 +24,24 @@ void llama_log_callback_default(ggml_log_level level, const char * text, void *
 #define LLAMA_LOG_INFO(...)  llama_log_internal(GGML_LOG_LEVEL_INFO , __VA_ARGS__)
 #define LLAMA_LOG_WARN(...)  llama_log_internal(GGML_LOG_LEVEL_WARN , __VA_ARGS__)
 #define LLAMA_LOG_ERROR(...) llama_log_internal(GGML_LOG_LEVEL_ERROR, __VA_ARGS__)
+
+//
+// helpers
+//
+
+static void replace_all(std::string & s, const std::string & search, const std::string & replace) {
+    if (search.empty()) {
+        return;
+    }
+    std::string builder;
+    builder.reserve(s.length());
+    size_t pos = 0;
+    size_t last_pos = 0;
+    while ((pos = s.find(search, last_pos)) != std::string::npos) {
+        builder.append(s, last_pos, pos - last_pos);
+        builder.append(replace);
+        last_pos = pos + search.length();
+    }
+    builder.append(s, last_pos, std::string::npos);
+    s = std::move(builder);
+}
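For reference, replace_all makes one forward scan over s, copying the untouched span before each match and resuming after it, so replacement text is never rescanned. A rough Go sketch of the same semantics (the function name is ours, and strings.Builder stands in for the local builder string; Go's stdlib strings.ReplaceAll already behaves this way):

```go
package main

import (
	"fmt"
	"strings"
)

// replaceAll mirrors the single-pass semantics of the C++ helper:
// each match of search is replaced once, and the scan resumes after it.
func replaceAll(s, search, replace string) string {
	if search == "" {
		return s // empty pattern: leave the input unchanged
	}
	var b strings.Builder
	b.Grow(len(s))
	last := 0
	for {
		pos := strings.Index(s[last:], search)
		if pos < 0 {
			break
		}
		pos += last
		b.WriteString(s[last:pos]) // untouched span before the match
		b.WriteString(replace)
		last = pos + len(search) // resume after the match
	}
	b.WriteString(s[last:]) // trailing remainder
	return b.String()
}

func main() {
	fmt.Println(replaceAll("a_b_c", "_", "-")) // a-b-c
}
```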

examples/talk-llama/llama-sampling.cpp

Lines changed: 2 additions & 2 deletions
@@ -85,14 +85,14 @@ void llama_sample_top_k_impl(struct llama_sampling * smpl, llama_token_data_arra
     constexpr float bucket_low   = -10.0f;
     constexpr float bucket_high  =  10.0f;
     constexpr float bucket_scale = nbuckets/(bucket_high - bucket_low);
-    constexpr float bucker_inter = -bucket_low * bucket_scale;
+    constexpr float bucket_inter = -bucket_low * bucket_scale;

     std::vector<int> bucket_idx(candidates->size);
     std::vector<int> histo(nbuckets, 0);

     for (int i = 0; i < (int)candidates->size; ++i) {
         const float val = candidates->data[i].logit;
-        int ib = int(bucket_scale * val + bucker_inter); //nbuckets * (val - bucket_low) / (bucket_high - bucket_low);
+        int ib = int(bucket_scale * val + bucket_inter); //nbuckets * (val - bucket_low) / (bucket_high - bucket_low);
         ib = std::max(0, std::min(nbuckets-1, ib));
         bucket_idx[i] = ib;
         ++histo[ib];
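Both changes are the same typo fix (bucker_inter → bucket_inter): the constant is the offset of the affine map that turns a logit in [bucket_low, bucket_high] into a histogram bucket index, clamped at both ends. A small Go sketch of that mapping; nbuckets is defined outside this hunk, so 128 here is an assumed illustrative value:

```go
package main

import "fmt"

const (
	nbuckets    = 128 // assumed; not shown in this hunk
	bucketLow   = -10.0
	bucketHigh  = 10.0
	bucketScale = nbuckets / (bucketHigh - bucketLow)
	bucketInter = -bucketLow * bucketScale // maps bucketLow to index 0
)

// bucketIndex maps a logit to its histogram bucket, clamping out-of-range values.
func bucketIndex(val float32) int {
	ib := int(bucketScale*float64(val) + bucketInter)
	if ib < 0 {
		ib = 0
	} else if ib > nbuckets-1 {
		ib = nbuckets - 1
	}
	return ib
}

func main() {
	histo := make([]int, nbuckets)
	for _, logit := range []float32{-12.5, -3.1, 0.0, 4.2, 11.0} {
		histo[bucketIndex(logit)]++
	}
	// Out-of-range logits land in the end buckets: prints "1 1".
	fmt.Println(histo[0], histo[nbuckets-1])
}
```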
