
Commit ca7c404

doc: refine structure for document (#123)
* doc: refine structure for document
* doc: clean up
* doc: clean up
1 parent 2aee819 commit ca7c404

22 files changed: +2701 −166 lines

README.md

Lines changed: 7 additions & 0 deletions
@@ -34,6 +34,13 @@ Easy Model Deployer is a lightweight tool designed for simplify deploy **Open-So
```diff
 - Different instance types (CPU/GPU/AWS Inferentia)
 - Convenient integration (OpenAI Compatible API, LangChain client, etc.)
 
+## Support Models
+<details>
+<summary>Deepseek Reasoning Model</summary>
+<a href="https://github.com/deepseek-ai/DeepSeek-R1"><strong>DeepSeek-R1-Distill-Qwen-14B</strong></a>
+
+</details>
+
 ## 🔧 Get Started
 
 ### Installation
```

docs/en/api.md

Lines changed: 377 additions & 0 deletions
@@ -0,0 +1,377 @@
# API Documentation

> **Getting Started**: To obtain the base URL and API key for your deployed models, run `emd status` in your terminal. The command will display a table with your deployed models and their details, including a link to retrieve the API key from AWS Secrets Manager. The base URL is shown at the bottom of the output.
>
> Example output:
> ```
> Models
> ┌────────────────────────┬───────────────────────────────────────────────────────────────────────┐
> │ Model ID               │ Qwen2.5-0.5B-Instruct/dev                                             │
> │ Status                 │ CREATE_COMPLETE                                                       │
> │ Service Type           │ Amazon SageMaker AI Real-time inference with OpenAI Compatible API   │
> │ Instance Type          │ ml.g5.2xlarge                                                         │
> │ Create Time            │ 2025-05-08 12:27:05 UTC                                               │
> │ Query Model API Key    │ https://console.aws.amazon.com/secretsmanager/secret?name=EMD-APIKey-│
> │                        │ Secrets&region=us-east-1                                              │
> │ SageMakerEndpointName  │ EMD-Model-qwen2-5-0-5b-instruct-endpoint                              │
> └────────────────────────┴───────────────────────────────────────────────────────────────────────┘
>
> Base URL
> http://your-emd-endpoint.region.elb.amazonaws.com/v1
> ```
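When pasting that output into the OpenAI-compatible clients below, it is easy to drop or double the `/v1` suffix on the base URL. A minimal defensive sketch (the helper name `normalize_base_url` is illustrative, not part of EMD):

```python
def normalize_base_url(url: str) -> str:
    """Ensure a base URL for an OpenAI-compatible client ends with /v1."""
    url = url.rstrip("/")          # drop any trailing slash first
    if not url.endswith("/v1"):
        url += "/v1"               # append the API version prefix if missing
    return url

# The URL printed by `emd status` already includes /v1; normalizing is a no-op then.
print(normalize_base_url("http://your-emd-endpoint.region.elb.amazonaws.com"))
# -> http://your-emd-endpoint.region.elb.amazonaws.com/v1
```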
## List Models

Returns a list of available models.

**Endpoint:** `GET /v1/models`

**Curl Example:**
```bash
curl https://BASE_URL/v1/models
```

**Python Example:**
```python
from openai import OpenAI

client = OpenAI(
    api_key="EMPTY",  # no API key is needed for listing models, but the SDK requires a placeholder
    base_url="https://BASE_URL"
)

# List available models
models = client.models.list()
for model in models.data:
    print(model.id)
```
## Chat Completions

Create a model response for a conversation.

**Endpoint:** `POST /v1/chat/completions`

**Parameters:**

- `model` (required): ID of the model to use (e.g., "Qwen2.5-7B-Instruct/dev", "Llama-3.3-70B-Instruct/dev")
- `messages` (required): Array of message objects with `role` and `content`
- `temperature`: Sampling temperature (0-2, default: 1)
- `top_p`: Nucleus sampling parameter (0-1, default: 1)
- `n`: Number of chat completion choices to generate (default: 1)
- `stream`: Whether to stream partial progress (default: false)
- `stop`: Sequences where the API will stop generating
- `max_tokens`: Maximum number of tokens to generate
- `presence_penalty`: Penalty for new tokens based on presence (-2.0 to 2.0)
- `frequency_penalty`: Penalty for new tokens based on frequency (-2.0 to 2.0)
- `function_call`: Controls how the model responds to function calls
- `functions`: List of functions the model may generate JSON inputs for

**Curl Example:**
```bash
curl https://BASE_URL/v1/chat/completions \
  -H "Authorization: Bearer YOUR_API_KEY" \
  -H "Content-Type: application/json" \
  -d '{
    "model": "Qwen2.5-7B-Instruct/dev",
    "messages": [
      {"role": "system", "content": "You are a helpful assistant."},
      {"role": "user", "content": "Hello!"}
    ],
    "temperature": 0.7
  }'
```

**Python Example:**
```python
from openai import OpenAI

client = OpenAI(
    api_key="YOUR_API_KEY",
    base_url="https://BASE_URL"
)

# Create a chat completion
response = client.chat.completions.create(
    model="Qwen2.5-7B-Instruct/dev",  # Model ID with tag
    messages=[
        {"role": "system", "content": "You are a helpful assistant."},
        {"role": "user", "content": "Hello!"}
    ],
    temperature=0.7,
    stream=False
)

# Print the response
print(response.choices[0].message.content)
```

**Streaming Example:**
```python
from openai import OpenAI

client = OpenAI(
    api_key="YOUR_API_KEY",
    base_url="https://BASE_URL"
)

# Create a streaming chat completion
stream = client.chat.completions.create(
    model="Llama-3.3-70B-Instruct/dev",  # Model ID with tag
    messages=[
        {"role": "system", "content": "You are a helpful assistant."},
        {"role": "user", "content": "Write a short poem about AI."}
    ],
    stream=True
)

# Process the stream
for chunk in stream:
    if chunk.choices[0].delta.content is not None:
        print(chunk.choices[0].delta.content, end="")
print()
```
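The `function_call` and `functions` parameters listed above are not demonstrated in the examples. A minimal sketch of the request body they expect, following the OpenAI legacy function-calling format (the `get_weather` schema is illustrative, not something EMD provides):

```python
import json

# A function the model may choose to call, described as a JSON schema.
functions = [
    {
        "name": "get_weather",  # illustrative function, defined by your application
        "description": "Get the current weather for a city",
        "parameters": {
            "type": "object",
            "properties": {
                "city": {"type": "string", "description": "City name"}
            },
            "required": ["city"],
        },
    }
]

# The body you would POST to /v1/chat/completions,
# e.g. via client.chat.completions.create(**payload).
payload = {
    "model": "Qwen2.5-7B-Instruct/dev",
    "messages": [{"role": "user", "content": "What's the weather in Paris?"}],
    "functions": functions,
    "function_call": "auto",  # let the model decide whether to call a function
}

print(json.dumps(payload, indent=2))
```

When the model opts to call the function, the response message carries a `function_call` with the function name and a JSON string of arguments instead of plain `content`.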
## Embeddings

Get vector representations of text.

**Endpoint:** `POST /v1/embeddings`

**Parameters:**

- `model` (required): ID of the model to use (e.g., "bge-m3/dev")
- `input` (required): Input text to embed or array of texts
- `user`: A unique identifier for the end-user

**Curl Example:**
```bash
curl https://BASE_URL/v1/embeddings \
  -H "Authorization: Bearer YOUR_API_KEY" \
  -H "Content-Type: application/json" \
  -d '{
    "model": "bge-m3/dev",
    "input": "The food was delicious and the waiter..."
  }'
```

**Python Example:**
```python
from openai import OpenAI

client = OpenAI(
    api_key="YOUR_API_KEY",
    base_url="https://BASE_URL"
)

# Get embeddings for a single text
response = client.embeddings.create(
    model="bge-m3/dev",  # Embedding model ID with tag
    input="The food was delicious and the service was excellent."
)

# Print the embedding vector
print(response.data[0].embedding)

# Get embeddings for multiple texts
response = client.embeddings.create(
    model="bge-m3/dev",  # Embedding model ID with tag
    input=[
        "The food was delicious and the service was excellent.",
        "The restaurant was very expensive and the food was mediocre."
    ]
)

# Print the number of embeddings
print(f"Generated {len(response.data)} embeddings")
```
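Embedding vectors are usually compared with cosine similarity. A minimal pure-Python sketch (the 3-dimensional vectors are illustrative stand-ins for `response.data[i].embedding`; real `bge-m3` vectors have far more dimensions):

```python
import math

def cosine_similarity(a, b):
    """Cosine similarity between two equal-length vectors."""
    dot = sum(x * y for x, y in zip(a, b))
    norm_a = math.sqrt(sum(x * x for x in a))
    norm_b = math.sqrt(sum(x * x for x in b))
    return dot / (norm_a * norm_b)

# Toy vectors standing in for two embedding responses.
food_review = [0.12, 0.48, -0.31]
restaurant_review = [0.10, 0.52, -0.28]
print(cosine_similarity(food_review, restaurant_review))
```

Scores close to 1.0 indicate semantically similar texts; scores near 0 indicate unrelated ones.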
## Rerank

Rerank a list of documents based on their relevance to a query.

**Endpoint:** `POST /v1/rerank`

**Parameters:**

- `model` (required): ID of the model to use (e.g., "bge-reranker-v2-m3/dev")
- `query` (required): The search query
- `documents` (required): List of documents to rerank
- `max_rerank`: Maximum number of documents to rerank (default: all)
- `return_metadata`: Whether to return metadata (default: false)

**Curl Example:**
```bash
curl https://BASE_URL/v1/rerank \
  -H "Authorization: Bearer YOUR_API_KEY" \
  -H "Content-Type: application/json" \
  -d '{
    "model": "bge-reranker-v2-m3/dev",
    "query": "What is the capital of France?",
    "documents": [
      "Paris is the capital of France.",
      "Berlin is the capital of Germany.",
      "London is the capital of England."
    ]
  }'
```
**Python Example:**
```python
import requests

# The OpenAI Python SDK has no rerank method, so call the endpoint directly.
response = requests.post(
    "https://BASE_URL/v1/rerank",
    headers={
        "Authorization": "Bearer YOUR_API_KEY",
        "Content-Type": "application/json"
    },
    json={
        "model": "bge-reranker-v2-m3/dev",  # Reranking model ID with tag
        "query": "What is the capital of France?",
        "documents": [
            "Paris is the capital of France.",
            "Berlin is the capital of Germany.",
            "London is the capital of England."
        ],
        "max_rerank": 3
    }
)

# Print the reranked documents with their relevance scores
print(response.json())
```
## Invocations

General-purpose endpoint for model invocations.

**Endpoint:** `POST /v1/invocations`

**Parameters:**

- `model` (required): ID of the model to use
- `input`: Input data for the model
- `parameters`: Additional parameters for the model

**Curl Example:**
```bash
curl https://BASE_URL/v1/invocations \
  -H "Authorization: Bearer YOUR_API_KEY" \
  -H "Content-Type: application/json" \
  -d '{
    "model": "Qwen2.5-7B-Instruct/dev",
    "input": {
      "query": "What is machine learning?"
    },
    "parameters": {
      "max_tokens": 100
    }
  }'
```

**Python Example:**
```python
import requests
import json

# Set up the API endpoint and headers
url = "https://BASE_URL/v1/invocations"
headers = {
    "Authorization": "Bearer YOUR_API_KEY",
    "Content-Type": "application/json"
}

# Prepare the payload
payload = {
    "model": "Qwen2.5-7B-Instruct/dev",  # Model ID with tag
    "input": {
        "query": "What is machine learning?"
    },
    "parameters": {
        "max_tokens": 100
    }
}

# Make the API call
response = requests.post(url, headers=headers, data=json.dumps(payload))

# Print the response
print(response.json())
```
## Vision Models

Process images along with text prompts.

**Endpoint:** `POST /v1/chat/completions`

**Parameters:**

Same as Chat Completions, but with messages that include image content.

**Python Example:**
```python
from openai import OpenAI
import base64

# Function to encode the image as base64
def encode_image(image_path):
    with open(image_path, "rb") as image_file:
        return base64.b64encode(image_file.read()).decode('utf-8')

# Path to your image
image_path = "path/to/your/image.jpg"
base64_image = encode_image(image_path)

client = OpenAI(
    api_key="YOUR_API_KEY",
    base_url="https://BASE_URL"
)

response = client.chat.completions.create(
    model="Qwen2-VL-7B-Instruct/dev",  # Vision model ID with tag
    messages=[
        {
            "role": "user",
            "content": [
                {"type": "text", "text": "What's in this image?"},
                {
                    "type": "image_url",
                    "image_url": {
                        "url": f"data:image/jpeg;base64,{base64_image}"
                    }
                }
            ]
        }
    ]
)

print(response.choices[0].message.content)
```
## Audio Transcription

Transcribe audio files to text.

**Endpoint:** `POST /v1/audio/transcriptions`

**Python Example:**
```python
from openai import OpenAI

client = OpenAI(
    api_key="YOUR_API_KEY",
    base_url="https://BASE_URL"
)

audio_file_path = "path/to/audio.mp3"
with open(audio_file_path, "rb") as audio_file:
    response = client.audio.transcriptions.create(
        model="whisper-large-v3/dev",  # ASR model ID with tag
        file=audio_file
    )

print(response.text)  # Transcribed text
```
