Speech-to-Text in golang. This is an early development version.

* `cmd` contains an OpenAI-API compatible server
- * `pkg` contains the `whisper` service and http gateway
+ * `pkg` contains the `whisper` service and client
* `sys` contains the `whisper` bindings to the `whisper.cpp` library
* `third_party` is a submodule for the whisper.cpp source

## Running

+ (Note: Docker images are not created yet - this is some forward planning!)
+

There are docker images for arm64 and amd64 (Intel). The arm64 image is built for
Jetson GPU support specifically, but it will also run on Raspberry Pis.

To use an NVIDIA GPU, you'll need to install the
[NVIDIA Container Toolkit](https://docs.nvidia.com/datacenter/cloud-native/container-toolkit/latest/install-guide.html) first.

A docker volume called "whisper" should be created for storing the Whisper language
- models. You can see which models are available to download locally [here](https://huggingface.co/ggerganov/whisper.cpp). The following command will run the server on port 8080:
+ models. You can see which models are available to download locally [here](https://huggingface.co/ggerganov/whisper.cpp).
+ The following command will run the server on port 8080:

```bash
docker run \
@@ -47,33 +50,49 @@ curl -X GET localhost:8080/v1/models
To delete a model, you can use the following command:

```bash
- curl -X DELETE localhost:8080/v1/models/ggml-tiny.en-q8_0.bin
+ curl -X DELETE localhost:8080/v1/models/ggml-tiny.en-q8_0
```

- And to transcribe an audio file, you can use the following command:
+ To transcribe a media file into its original language, you can use the following command:

```bash
- curl -F "model=ggml-tiny.en-q8_0.bin" -F "file=@samples/jfk.wav" -F "language=en" localhost:8080/v1/audio/transcriptions
+ curl -F "model=ggml-tiny.en-q8_0" -F "file=@samples/jfk.wav" localhost:8080/v1/audio/transcriptions
+ ```
+
+ To translate a media file into a different language, you can use the following command:
+
+ ```bash
+ curl -F "model=ggml-tiny.en-q8_0" -F "file=@samples/de-podcast.wav" -F "language=en" localhost:8080/v1/audio/transcriptions
```
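
The two requests above differ only in whether a `language` field is sent. The wrapper below is a minimal sketch of that pattern, assuming a POSIX shell with `curl` installed. The function name `whisper_transcribe` and the `WHISPER_URL` variable are hypothetical and not part of go-whisper; only the endpoint and form fields come from the commands above.

```bash
# Base URL of the server; localhost:8080 matches the examples above.
WHISPER_URL="${WHISPER_URL:-localhost:8080}"

# Usage: whisper_transcribe MODEL FILE [LANGUAGE]
# Hypothetical helper (not part of go-whisper). Posts FILE to the
# transcription endpoint and adds the language field only when given.
whisper_transcribe() {
  model="$1"
  file="$2"
  language="${3:-}"
  # Rebuild the positional parameters as the curl argument list.
  set -- -F "model=${model}" -F "file=@${file}"
  if [ -n "${language}" ]; then
    set -- "$@" -F "language=${language}"
  fi
  curl "$@" "${WHISPER_URL}/v1/audio/transcriptions"
}
```

With this in place, `whisper_transcribe ggml-tiny.en-q8_0 samples/jfk.wav` would send the first request, and `whisper_transcribe ggml-tiny.en-q8_0 samples/de-podcast.wav en` the second.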

- Right now there's a limitation on the files: they must be mono WAV files at 16K sample rate.
There's more information on the API [here](doc/API.md).

## Building

+ The build dependencies are:
+
+ * Go 1.22
+ * C++ compiler
+ * FFmpeg 6.1 libraries (see [here](doc/build.md) for more information)
+ * For CUDA, you'll need the CUDA toolkit including the `nvcc` compiler
+
If you want to build the server yourself for your specific combination of hardware,
- you can use the `Makefile` in the root directory. You'll need go 1.22, `make` and
+ you can use the `Makefile` in the root directory. You'll need go 1.22, `make` and
a C++ compiler to build this project. The following `Makefile` targets can be used:

- * `make server` - creates the server binary, and places it in the `build` directory
- * `DOCKER_REGISTRY=docker.io/user make docker` - builds a docker container with the server binary
+ * `make server` - creates the server binary, and places it in the `build` directory. Should
+   link to Metal on macOS
+ * `GGML_CUDA=1 make server` - creates the server binary linked to CUDA, and places it
+   in the `build` directory. Should work for amd64 and arm64 (Jetson) platforms
+ * `DOCKER_REGISTRY=docker.io/user make docker` - builds a docker container with the
+   server binary, tagged to a specific registry

See all the other targets in the `Makefile` for more information.
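
Choosing between the default and CUDA server targets listed above could be scripted. The sketch below assumes that finding `nvcc` on `PATH` is a good enough sign that the CUDA toolkit is installed, which may not hold on every system. The function name `build_server_cmd` is hypothetical and not part of the go-whisper `Makefile`; it only prints the command so that it can be reviewed before running.

```bash
# Hypothetical helper (not part of the go-whisper Makefile).
# Prints the make invocation to use: the CUDA variant when the nvcc
# compiler is found on PATH, otherwise the default target.
build_server_cmd() {
  if command -v nvcc >/dev/null 2>&1; then
    echo "GGML_CUDA=1 make server"
  else
    echo "make server"
  fi
}
```

Running `build_server_cmd` prints the chosen command; `eval "$(build_server_cmd)"` would run it from the repository root.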

## Status

- Still in development. It only accepts mono WAV files at 16K sample rate, for example. It also
- occasionally crashes, and the API is not fully implemented.
+ Still in development. See this [issue](https://github.com/mutablelogic/go-whisper/issues/1) for
+ remaining tasks to be completed.

## Contributing & Distribution

@@ -84,7 +103,7 @@ The license is Apache 2 so feel free to redistribute. Redistributions in either
code or binary form must reproduce the copyright notice, and please link back to this
repository for more information:

- > __go-media__ \
+ > __go-whisper__ \
> [https://github.com/mutablelogic/go-whisper/](https://github.com/mutablelogic/go-whisper/) \
> Copyright (c) 2023-2024 David Thorpe, All rights reserved.
>