Commit 0aa5997

Updated body
1 parent 001871d commit 0aa5997

1 file changed: +90 -32 lines changed

doc/API.md

Lines changed: 90 additions & 32 deletions
@@ -1,54 +1,112 @@
1-
# Draft API
1+
# Whisper server API
22

3-
(From OpenAPI docs)
3+
Based on OpenAPI docs
44

5-
## Create transcription - upload
5+
## Ping
66

77
```html
8-
POST /v1/audio/transcriptions
8+
GET /v1/ping
99
```
1010

11-
Transcribes audio into the input language.
11+
Returns an OK status to indicate that the API is up and running.
12+
13+
## Models
14+
15+
### List Models
16+
17+
```html
18+
GET /v1/models
19+
```
1220

13-
**body - Required**
14-
The audio file object (not file name) to transcribe, in one of these formats: flac, mp3, mp4, mpeg, mpga, m4a, ogg, wav, or webm.
21+
Returns a list of available models. Example response:
1522

16-
**model - string - Required**
17-
ID of the model to use.
23+
```json
24+
{
25+
"object": "list",
26+
"models": [
27+
{
28+
"id": "ggml-large-v3",
29+
"object": "model",
30+
"path": "ggml-large-v3.bin",
31+
"created": 1722090121
32+
},
33+
{
34+
"id": "ggml-medium-q5_0",
35+
"object": "model",
36+
"path": "ggml-medium-q5_0.bin",
37+
"created": 1722081999
38+
}
39+
]
40+
}
41+
```
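
As a rough illustration of consuming this response, the sketch below parses the example body shown above and extracts the model IDs. The helper name `model_ids` is hypothetical and not part of the server; only the field names (`object`, `models`, `id`) come from the example response.

```python
import json

# Example body copied from the documented response of GET /v1/models.
EXAMPLE_RESPONSE = """
{
  "object": "list",
  "models": [
    {"id": "ggml-large-v3", "object": "model",
     "path": "ggml-large-v3.bin", "created": 1722090121},
    {"id": "ggml-medium-q5_0", "object": "model",
     "path": "ggml-medium-q5_0.bin", "created": 1722081999}
  ]
}
"""


def model_ids(body: str) -> list:
    """Return the IDs of all models in a list response (hypothetical helper)."""
    payload = json.loads(body)
    if payload.get("object") != "list":
        raise ValueError("expected a response with object == 'list'")
    return [model["id"] for model in payload.get("models", [])]


if __name__ == "__main__":
    print(model_ids(EXAMPLE_RESPONSE))
```

The returned IDs are the values that the other endpoints accept as a model ID.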
1842

19-
**language - string - Optional**
20-
The language of the input audio. Supplying the input language in ISO-639-1 format will improve accuracy and latency.
43+
### Download Model
2144

22-
**prompt - string - Optional**
23-
An optional text to guide the model's style or continue a previous audio segment. The prompt should match the audio language.
45+
```html
46+
POST /v1/models
47+
POST /v1/models?stream={bool}
48+
```
2449

25-
**response_format - string - Optional**
26-
Defaults to json
27-
The format of the transcript output, in one of these options: json, text, srt, verbose_json, or vtt.
50+
The request should be an `application/json`, `multipart/form-data` or `application/x-www-form-urlencoded` request with the following fields:
2851

29-
**temperature - number - Optional**
30-
Defaults to 0
31-
The sampling temperature, between 0 and 1. Higher values like 0.8 will make the output more random, while lower values like 0.2 will make it more focused and deterministic. If set to 0, the model will use log probability to automatically increase the temperature until certain thresholds are hit.
52+
```json
53+
{
54+
"path": "ggml-large-v3.bin"
55+
}
56+
```
3257

33-
**timestamp_granularities[] - array - Optional**
34-
Defaults to segment
35-
The timestamp granularities to populate for this transcription. response_format must be set verbose_json to use timestamp granularities. Either or both of these options are supported: word, or segment. Note: There is no additional latency for segment timestamps, but generating word timestamps incurs additional latency.
58+
Downloads a model from a remote Hugging Face repository. If the optional `stream` argument is true,
59+
the progress is streamed back to the client as a series of [text/event-stream](https://html.spec.whatwg.org/multipage/server-sent-events.html) events.
3660

37-
**Returns**
38-
The transcription object or a verbose transcription object.
61+
If the model was already downloaded, a 200 OK status is returned. If the model was newly downloaded by this request, a 201 Created status is returned.
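
A minimal sketch of building the download request and interpreting the two documented status codes follows. The base URL `http://localhost:8080`, the literal `stream=true` query value, and the helper names are assumptions made for illustration; the request is only constructed here, not sent.

```python
import json
import urllib.request

BASE_URL = "http://localhost:8080"  # assumption: adjust to where the server listens


def build_download_request(path, stream=False, base_url=BASE_URL):
    """Build (but do not send) a POST /v1/models request with a JSON body."""
    url = f"{base_url}/v1/models"
    if stream:
        url += "?stream=true"  # assumption: 'true' is an accepted boolean spelling
    body = json.dumps({"path": path}).encode("utf-8")
    return urllib.request.Request(
        url,
        data=body,
        method="POST",
        headers={"Content-Type": "application/json"},
    )


def describe_status(status):
    """Map the documented status codes to a short description."""
    if status == 200:
        return "model was already downloaded"
    if status == 201:
        return "model was downloaded by this request"
    return f"unexpected status {status}"
```

The request can then be sent with `urllib.request.urlopen(request)`, and the `status` attribute of the response passed to `describe_status`.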
3962

40-
### Example request
63+
### Delete Model
4164

42-
```bash
43-
curl https://localhost/v1/audio/transcriptions \
44-
-H "Authorization: Bearer $OPENAI_API_KEY" \
45-
-H "Content-Type: multipart/form-data" \
46-
-F file="@/path/to/file/audio.mp3" \
47-
-F model="whisper-1"
65+
```html
66+
DELETE /v1/models/{model-id}
4867
```
4968

69+
Deletes a model by its ID. If the model is deleted, a 200 OK status is returned.
70+
71+
## Transcription and translation with file upload
72+
73+
### Transcription
74+
75+
This endpoint transcribes media files into text, in the language of the media file.
76+
77+
```html
78+
POST /v1/audio/transcriptions
79+
POST /v1/audio/transcriptions?stream={bool}
80+
```
81+
82+
The request should be a multipart/form-data request with the following fields:
83+
5084
```json
5185
{
52-
"text": "Imagine the wildest idea that you've ever had, and you're curious about how it might scale to something that's a 100, a 1,000 times bigger. This is a place where you can get to do that."
86+
"model": "<model-id>",
87+
"file": "<binary data>",
88+
"language": "<language-code>",
89+
"response_format": "<response-format>",
5390
}
5491
```
92+
93+
Transcribes audio into the input language.
94+
95+
`file` (required) The audio file object (not file name) to transcribe. This can be audio or video, and the format is auto-detected. The "best" audio stream is selected from the file, and the audio is converted to 16 kHz mono PCM format during transcription.
96+
97+
`model` (required) ID of the model to use. The model must already have been downloaded.
98+
99+
`language` (optional) The language of the input audio in ISO-639-1 format. If not set, then the language is auto-detected.
100+
101+
`response_format` (optional, defaults to `json`) The format of the transcript output, one of: `json`, `text`, `srt`, `verbose_json`, or `vtt`.
102+
103+
If the optional `stream` argument is true, the segments of the transcription are returned as a series of [text/event-stream](https://html.spec.whatwg.org/multipage/server-sent-events.html) events. Otherwise, the full transcription is returned in the response body.
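
When `stream` is true, the client has to split the response into individual events. The sketch below is a simplified reader for the `text/event-stream` line format (`event:` and `data:` fields, comment lines starting with `:`, a blank line ending each event). The event names and payloads in the usage example are hypothetical, since the events this server actually emits are not described here.

```python
from typing import Dict, Iterable, Iterator


def iter_sse_events(lines: Iterable[str]) -> Iterator[Dict[str, str]]:
    """Yield one dict per server-sent event from an iterable of text lines."""
    event, data = "message", []
    for raw in lines:
        line = raw.rstrip("\r\n")
        if line == "":
            # A blank line terminates the current event.
            if data:
                yield {"event": event, "data": "\n".join(data)}
            event, data = "message", []
            continue
        if line.startswith(":"):
            continue  # comment line, often used as a keep-alive
        field, _, value = line.partition(":")
        if value.startswith(" "):
            value = value[1:]  # a single leading space is not part of the value
        if field == "event":
            event = value
        elif field == "data":
            data.append(value)
        # Other fields (id, retry) are ignored in this sketch.


if __name__ == "__main__":
    sample = [
        "event: segment\n",  # hypothetical event name
        'data: {"text": "hello"}\n',  # hypothetical payload
        "\n",
    ]
    for ev in iter_sse_events(sample):
        print(ev["event"], ev["data"])
```

An event that is still incomplete when the stream ends is dropped, which matches how the event-stream format treats unterminated events.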
104+
105+
### Translation
106+
107+
This is the same as transcription (above), except that the `language` parameter is required and specifies the language to translate the audio into.
108+
109+
```html
110+
POST /v1/audio/translations
111+
POST /v1/audio/translations?stream={bool}
112+
```
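
Both the transcription and translation endpoints take a `multipart/form-data` body. As a sketch of what such a body looks like on the wire, the hypothetical helper below encodes the text fields plus one `file` part using only the standard library. The `application/octet-stream` part type is an assumption, chosen because the server is described as auto-detecting the media format.

```python
import uuid
from typing import Dict, Optional, Tuple


def encode_multipart(
    fields: Dict[str, str],
    filename: str,
    content: bytes,
    boundary: Optional[str] = None,
) -> Tuple[str, bytes]:
    """Return (Content-Type header value, body) for a multipart/form-data request."""
    boundary = boundary or uuid.uuid4().hex
    parts = []
    for name, value in fields.items():
        text_part = (
            f"--{boundary}\r\n"
            f'Content-Disposition: form-data; name="{name}"\r\n'
            f"\r\n"
            f"{value}\r\n"
        )
        parts.append(text_part.encode("utf-8"))
    file_header = (
        f"--{boundary}\r\n"
        f'Content-Disposition: form-data; name="file"; filename="{filename}"\r\n'
        f"Content-Type: application/octet-stream\r\n"
        f"\r\n"
    )
    parts.append(file_header.encode("utf-8") + content + b"\r\n")
    parts.append(f"--{boundary}--\r\n".encode("utf-8"))  # closing delimiter
    return f"multipart/form-data; boundary={boundary}", b"".join(parts)
```

For a translation request, `fields` would carry `model` and the required `language`, and the returned header value and body would be sent in a `POST /v1/audio/translations` request.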
