# LocalLab API Documentation
## Base URL
When making API requests, use one of the following base URLs:

- **Local development**: `http://localhost:8000`
- **Remote access**: Use your ngrok URL (e.g., `https://abcd1234.ngrok.io`)

All examples below reference a `BASE_URL` shell variable; export it to match your setup:

```bash
# For local development
export BASE_URL=http://localhost:8000

# For remote access via ngrok
export BASE_URL=https://your-ngrok-url.ngrok.io
```

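Before running the examples, you can confirm the server is reachable at your chosen base URL with the `/health` endpoint documented below:

```bash
# Quick connectivity check
curl -X GET "${BASE_URL}/health"
```
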
## REST API Endpoints
### Text Generation

#### POST `/generate`

Generate text using the loaded model.

**Example (curl):**

```bash
# Basic generation with minimal parameters
curl -X POST "${BASE_URL}/generate" \
  -H "Content-Type: application/json" \
  -d '{
    "prompt": "Explain quantum computing in simple terms"
  }'

# Generation with all parameters
curl -X POST "${BASE_URL}/generate" \
  -H "Content-Type: application/json" \
  -d '{
    "prompt": "Explain quantum computing in simple terms",
    "model_id": null,
    "stream": false,
    "max_length": 8192,
    "temperature": 0.7,
    "top_p": 0.9,
    "top_k": 80,
    "repetition_penalty": 1.15
  }'

# Streaming generation
curl -X POST "${BASE_URL}/generate" \
  -H "Content-Type: application/json" \
  -d '{
    "prompt": "Explain quantum computing in simple terms",
    "stream": true
  }'
```

**Error Responses:**

- `400 Bad Request`: Invalid parameters

#### POST `/chat`

Chat completion endpoint similar to OpenAI's API.

**Example (curl):**

```bash
# Basic chat with minimal parameters
curl -X POST "${BASE_URL}/chat" \
  -H "Content-Type: application/json" \
  -d '{
    "messages": [
      {"role": "system", "content": "You are a helpful assistant."},
      {"role": "user", "content": "Hello, how are you?"}
    ]
  }'

# Chat with all parameters
curl -X POST "${BASE_URL}/chat" \
  -H "Content-Type: application/json" \
  -d '{
    "messages": [
      {"role": "system", "content": "You are a helpful assistant."},
      {"role": "user", "content": "Hello, how are you?"}
    ],
    "model_id": null,
    "stream": false,
    "max_length": 8192,
    "temperature": 0.7,
    "top_p": 0.9,
    "top_k": 80,
    "repetition_penalty": 1.15
  }'

# Streaming chat
curl -X POST "${BASE_URL}/chat" \
  -H "Content-Type: application/json" \
  -d '{
    "messages": [
      {"role": "system", "content": "You are a helpful assistant."},
      {"role": "user", "content": "Hello, how are you?"}
    ],
    "stream": true
  }'
```

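Conversation history travels in the `messages` array, so a follow-up turn resends the earlier exchange along with the new user message. A minimal sketch, assuming the endpoint accepts the OpenAI-style `assistant` role shown here:

```bash
# Follow-up turn: prior messages are included so the model sees the context
# (the "assistant" role is assumed to be accepted, as in OpenAI's API)
curl -X POST "${BASE_URL}/chat" \
  -H "Content-Type: application/json" \
  -d '{
    "messages": [
      {"role": "system", "content": "You are a helpful assistant."},
      {"role": "user", "content": "Hello, how are you?"},
      {"role": "assistant", "content": "I am doing well, thank you!"},
      {"role": "user", "content": "Can you explain quantum computing?"}
    ]
  }'
```
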
### Batch Generation
#### POST `/generate/batch`

Generate text for multiple prompts in parallel.

**Example (curl):**

```bash
# Basic batch generation with minimal parameters
curl -X POST "${BASE_URL}/generate/batch" \
  -H "Content-Type: application/json" \
  -d '{
    "prompts": [
      "Write a haiku about nature",
      "Tell a short joke",
      "Give a fun fact about space"
    ]
  }'

# Batch generation with all parameters
curl -X POST "${BASE_URL}/generate/batch" \
  -H "Content-Type: application/json" \
  -d '{
    "prompts": [
      "Write a haiku about nature",
      "Tell a short joke",
      "Give a fun fact about space"
    ],
    "model_id": null,
    "max_length": 8192,
    "temperature": 0.7,
    "top_p": 0.9,
    "top_k": 80,
    "repetition_penalty": 1.15
  }'
```

### Model Management
#### POST `/models/load`

Load a specific model.

**Example (curl):**

```bash
# Load a specific model
curl -X POST "${BASE_URL}/models/load" \
  -H "Content-Type: application/json" \
  -d '{
    "model_id": "microsoft/phi-2"
  }'
```

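A common pattern is to load a model and immediately confirm it is active with the `/models/current` endpoint documented next:

```bash
# Load a model, then check which model is currently active
curl -X POST "${BASE_URL}/models/load" \
  -H "Content-Type: application/json" \
  -d '{"model_id": "microsoft/phi-2"}'

curl -X GET "${BASE_URL}/models/current"
```
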
#### GET `/models/current`
Get information about the currently loaded model.

**Example (curl):**

```bash
# Get current model information
curl -X GET "${BASE_URL}/models/current"
```

#### GET `/models/available`
List all available models in the registry.

**Example (curl):**

```bash
# List all available models
curl -X GET "${BASE_URL}/models/available"
```

#### POST `/models/unload`

Unload the current model to free up resources.

**Example (curl):**

```bash
# Unload the current model
curl -X POST "${BASE_URL}/models/unload"
```

### System Information
#### GET `/system/info`
Get detailed system information.

**Example (curl):**

```bash
# Get system information
curl -X GET "${BASE_URL}/system/info"
```

#### GET `/health`
Check the health status of the server.

**Example (curl):**

```bash
# Check server health
curl -X GET "${BASE_URL}/health"
```

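Because `/health` is cheap to call, it is also handy for gating scripts that need the server to be up before sending requests:

```bash
# Wait until the server responds before running other requests
until curl -sf "${BASE_URL}/health" > /dev/null; do
  sleep 1
done
echo "Server is up"
```
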
## Error Handling
All endpoints return appropriate HTTP status codes.
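
To see which status code a request returned, curl can print it alongside the response:

```bash
# Print the HTTP status code after the response body
curl -s -w '\nHTTP status: %{http_code}\n' -X GET "${BASE_URL}/models/current"
```
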
Error responses include a detail message.

## Rate Limiting

- 60 requests per minute
- Burst size of 10 requests

## Tips for Using the API
### Default Parameters
All generation endpoints have sensible defaults for the response quality parameters:

- `max_length`: 8192 tokens
- `temperature`: 0.7
- `top_p`: 0.9
- `top_k`: 80
- `repetition_penalty`: 1.15

You can omit any or all of these parameters in your requests, and the server will use these defaults.
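
For example, the following request overrides only `temperature` and relies on the defaults for everything else:

```bash
# Only temperature is overridden; the other parameters use server defaults
curl -X POST "${BASE_URL}/generate" \
  -H "Content-Type: application/json" \
  -d '{
    "prompt": "Explain quantum computing in simple terms",
    "temperature": 0.3
  }'
```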
### Testing with Different Parameters
When experimenting with different parameter values, here's what to try:

- For more creative responses: Increase `temperature` (0.8-1.0) and `top_p` (0.95-1.0)
- For more focused responses: Decrease `temperature` (0.3-0.5) and `top_p` (0.5-0.7)
- For less repetition: Increase `repetition_penalty` (1.2-1.5)
- For longer responses: Increase `max_length` (up to 16384)

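For instance, a request tuned for more creative output, using values from the ranges above, might look like this:

```bash
# Creative settings: higher temperature and top_p
curl -X POST "${BASE_URL}/generate" \
  -H "Content-Type: application/json" \
  -d '{
    "prompt": "Write a haiku about nature",
    "temperature": 0.9,
    "top_p": 0.95
  }'
```
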
### Handling Streaming Responses
When using streaming endpoints (`stream: true`), the response will be sent as a series of Server-Sent Events (SSE). Each event starts with `data: ` followed by the token or chunk. The end of the stream is marked with `data: [DONE]`.

```bash
# Example of processing streaming responses with bash.
# A minimal sketch: -N disables curl's buffering so events arrive as sent;
# the loop strips the "data: " prefix and stops at the [DONE] marker.
curl -N -X POST "${BASE_URL}/generate" \
  -H "Content-Type: application/json" \
  -d '{"prompt": "Explain quantum computing in simple terms", "stream": true}' \
  | while IFS= read -r line; do
      data="${line#data: }"
      [ "$data" = "[DONE]" ] && break
      [ -n "$data" ] && printf '%s' "$data"
    done
echo
```