Commit 0803dcb

Merge pull request #40 from mutablelogic/v1
Updated documentation
2 parents d59695d + 07d04df commit 0803dcb

2 files changed, +40 -23 lines changed

README.md

Lines changed: 31 additions & 12 deletions
@@ -3,20 +3,23 @@
 Speech-to-Text in golang. This is an early development version.

 * `cmd` contains an OpenAI-API compatible server
-* `pkg` contains the `whisper` service and http gateway
+* `pkg` contains the `whisper` service and client
 * `sys` contains the `whisper` bindings to the `whisper.cpp` library
 * `third_party` is a submodule for the whisper.cpp source

 ## Running

+(Note: Docker images are not created yet - this is some forward planning!)
+
 There are docker images for arm64 and amd64 (Intel). The arm64 image is built for
 Jetson GPU support specifically, but it will also run on Raspberry Pis.

 In order to utilize an NVIDIA GPU, you'll need to install the
 [NVIDIA Container Toolkit](https://docs.nvidia.com/datacenter/cloud-native/container-toolkit/latest/install-guide.html) first.

 A docker volume called "whisper" should be created for storing the Whisper language
-models. You can see which models are available to download locally [here](https://huggingface.co/ggerganov/whisper.cpp). The following command will run the server on port 8080:
+models. You can see which models are available to download locally [here](https://huggingface.co/ggerganov/whisper.cpp).
+The following command will run the server on port 8080:

 ```bash
 docker run \
@@ -47,33 +50,49 @@ curl -X GET localhost:8080/v1/models
 To delete a model, you can use the following command:

 ```bash
-curl -X DELETE localhost:8080/v1/models/ggml-tiny.en-q8_0.bin
+curl -X DELETE localhost:8080/v1/models/ggml-tiny.en-q8_0
 ```

-And to transcribe an audio file, you can use the following command:
+To transcribe a media file into its original language, you can use the following command:

 ```bash
-curl -F "model=ggml-tiny.en-q8_0.bin" -F "file=@samples/jfk.wav" -F "language=en" localhost:8080/v1/audio/transcriptions
+curl -F "model=ggml-tiny.en-q8_0" -F "file=@samples/jfk.wav" localhost:8080/v1/audio/transcriptions
+```
+
+To translate a media file into a different language, you can use the following command:
+
+```bash
+curl -F "model=ggml-tiny.en-q8_0" -F "file=@samples/de-podcast.wav" -F "language=en" localhost:8080/v1/audio/transcriptions
 ```

-Right now there's a limitation on the files: they must be mono WAV files at 16K sample rate.
 There's more information on the API [here](doc/API.md).

 ## Building

+The build dependencies are:
+
+* Go 1.22
+* C++ compiler
+* FFmpeg 6.1 libraries (see [here](doc/build.md) for more information)
+* For CUDA, you'll need the CUDA toolkit including the `nvcc` compiler
+
 If you want to build the server yourself for your specific combination of hardware,
-you can use the `Makefile` in the root directory. You'll need go 1.22, `make` and
+you can use the `Makefile` in the root directory. You'll need go 1.22, `make` and
 a C++ compiler to build this project. The following `Makefile` targets can be used:

-* `make server` - creates the server binary, and places it in the `build` directory
-* `DOCKER_REGISTRY=docker.io/user make docker` - builds a docker container with the server binary
+* `make server` - creates the server binary, and places it in the `build` directory. Should
+link to Metal on macOS
+* `GGML_CUDA=1 make server` - creates the server binary linked to CUDA, and places it
+in the `build` directory. Should work for amd64 and arm64 (Jetson) platforms
+* `DOCKER_REGISTRY=docker.io/user make docker` - builds a docker container with the
+server binary, tagged to a specific registry

 See all the other targets in the `Makefile` for more information.

 ## Status

-Still in development. It only accepts mono WAV files at 16K sample rate, for example. It also
-occasionally crashes, and the API is not fully implemented.
+Still in development. See this [issue](https://github.com/mutablelogic/go-whisper/issues/1) for
+remaining tasks to be completed.

 ## Contributing & Distribution

@@ -84,7 +103,7 @@ The license is Apache 2 so feel free to redistribute. Redistributions in either
 code or binary form must reproduce the copyright notice, and please link back to this
 repository for more information:

-> __go-media__\
+> __go-whisper__\
 > [https://github.com/mutablelogic/go-whisper/](https://github.com/mutablelogic/go-whisper/)\
 > Copyright (c) 2023-2024 David Thorpe, All rights reserved.
 >
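
Note on the updated `curl` examples: the transcription and translation calls send the same multipart form (`model`, `file`, and optionally `language`) to `/v1/audio/transcriptions`. The following Go program is a minimal sketch of that request, not the project's own client. The function name `newTranscriptionRequest`, the base URL `http://localhost:8080`, and the choice to print the raw response body are assumptions made for illustration; the response format is not described in this diff.

```go
package main

import (
	"bytes"
	"fmt"
	"io"
	"mime/multipart"
	"net/http"
	"os"
	"path/filepath"
)

// newTranscriptionRequest (hypothetical helper) builds the same multipart form
// that the curl examples send: "model", "file" and, optionally, "language".
func newTranscriptionRequest(baseURL, model, filename string, audio []byte, language string) (*http.Request, error) {
	var body bytes.Buffer
	w := multipart.NewWriter(&body)
	if err := w.WriteField("model", model); err != nil {
		return nil, err
	}
	// The README passes "language" only in the translation example.
	if language != "" {
		if err := w.WriteField("language", language); err != nil {
			return nil, err
		}
	}
	part, err := w.CreateFormFile("file", filepath.Base(filename))
	if err != nil {
		return nil, err
	}
	if _, err := part.Write(audio); err != nil {
		return nil, err
	}
	// Close writes the trailing multipart boundary.
	if err := w.Close(); err != nil {
		return nil, err
	}
	req, err := http.NewRequest(http.MethodPost, baseURL+"/v1/audio/transcriptions", &body)
	if err != nil {
		return nil, err
	}
	req.Header.Set("Content-Type", w.FormDataContentType())
	return req, nil
}

func main() {
	if len(os.Args) < 2 {
		fmt.Println("usage: transcribe <media-file> [language]")
		return
	}
	audio, err := os.ReadFile(os.Args[1])
	if err != nil {
		fmt.Fprintln(os.Stderr, err)
		os.Exit(1)
	}
	language := ""
	if len(os.Args) > 2 {
		language = os.Args[2]
	}
	// Assumed server address and model name, taken from the README examples.
	req, err := newTranscriptionRequest("http://localhost:8080", "ggml-tiny.en-q8_0", os.Args[1], audio, language)
	if err != nil {
		fmt.Fprintln(os.Stderr, err)
		os.Exit(1)
	}
	resp, err := http.DefaultClient.Do(req)
	if err != nil {
		fmt.Fprintln(os.Stderr, err)
		os.Exit(1)
	}
	defer resp.Body.Close()
	// Print the raw response; its structure is not specified here.
	if _, err := io.Copy(os.Stdout, resp.Body); err != nil {
		fmt.Fprintln(os.Stderr, err)
		os.Exit(1)
	}
}
```

Running it as `go run . samples/jfk.wav` would mirror the transcription example, and `go run . samples/de-podcast.wav en` the translation example, assuming a server is listening on port 8080.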

doc/build.md

Lines changed: 9 additions & 11 deletions
@@ -1,5 +1,6 @@
+# Notes on building

-# Package Config
+## Package Config

 libwhisper.pc

@@ -13,16 +14,6 @@ Cflags: -I${prefix}/third_party/whisper.cpp/include -I${prefix}/third_party/whis
 Libs: -L${prefix}/third_party/whisper.cpp -lwhisper -lggml -lm -lstdc++
 ```

-libwhisper-linux.pc
-
-```pkg-config
-prefix=/Users/djt/Projects/go-whisper/
-
-Name: libwhisper-linux
-Description: Whisper is a C/C++ library for speech transcription, translation and diarization.
-Version: 0.0.0
-```
-
 libwhisper-darwin.pc

 ```pkg-config
@@ -36,3 +27,10 @@ Libs: -framework Accelerate -framework Metal -framework Foundation -framework Co

 I don't know what the windows one should be as I don't have a windows machine.

+## Ubuntu 22.04
+
+```bash
+sudo add-apt-repository -y ppa:ubuntuhandbook1/ffmpeg6
+sudo apt-get update
+sudo apt-get install -y libavcodec-dev libavdevice-dev libavfilter-dev libavutil-dev libswscale-dev libswresample-dev
+```
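
Note on the new Ubuntu section: after installing the FFmpeg development packages, it can help to confirm that `pkg-config` resolves them before building. The following Go program is a rough sketch of such a check and is not part of the repository. The helper names (`moduleVersions`, `pkgConfig`, `runner`) are hypothetical, and the module list simply mirrors the `-dev` packages in the `apt-get install` line; it relies only on the standard `pkg-config --modversion` query.

```go
package main

import (
	"fmt"
	"os/exec"
	"strings"
)

// ffmpegModules lists the pkg-config module names that correspond to the
// -dev packages installed in the Ubuntu 22.04 instructions.
var ffmpegModules = []string{
	"libavcodec", "libavdevice", "libavfilter",
	"libavutil", "libswscale", "libswresample",
}

// runner abstracts how pkg-config is invoked, so the lookup logic can be
// exercised without the tool being installed.
type runner func(args ...string) (string, error)

// pkgConfig runs the pkg-config binary found on PATH and returns its
// trimmed standard output.
func pkgConfig(args ...string) (string, error) {
	out, err := exec.Command("pkg-config", args...).Output()
	return strings.TrimSpace(string(out)), err
}

// moduleVersions queries the version of every module. It returns the
// versions that were found and the names of modules that did not resolve.
func moduleVersions(run runner, modules []string) (map[string]string, []string) {
	found := make(map[string]string)
	var missing []string
	for _, m := range modules {
		v, err := run("--modversion", m)
		if err != nil || v == "" {
			missing = append(missing, m)
			continue
		}
		found[m] = v
	}
	return found, missing
}

func main() {
	found, missing := moduleVersions(pkgConfig, ffmpegModules)
	for _, m := range ffmpegModules {
		if v, ok := found[m]; ok {
			fmt.Printf("%-16s %s\n", m, v)
		}
	}
	if len(missing) > 0 {
		fmt.Println("not found:", strings.Join(missing, ", "))
	}
}
```

Any module reported as not found suggests the matching `-dev` package is missing or that `PKG_CONFIG_PATH` does not include its `.pc` file.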
