### Your current environment

Environment Information:
- vLLM version: 0.9.2rc2.dev125+g49e8c7ea2.d20250710
- PyTorch version: 2.7.0+cu118
- Transformers version: 4.51.1
- Model: Qwen2-Audio-7B-Instruct
- OS: Linux 4.18.0-425.3.1.el8.x86_64
- GPU: CUDA-enabled
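
For completeness, the full environment report the issue template asks for can be regenerated (a hedged sketch; the exact command depends on the vLLM version):

```bash
# Newer vLLM releases ship a CLI subcommand for this:
vllm collect-env
# Older versions: run the script from the root of a vLLM source checkout
python collect_env.py
```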
### 🐛 Describe the bug
When using vLLM to serve the Qwen2-Audio-7B-instruct model, requests that include audio input fail with a 500 Internal Server Error, while text-only requests work fine.
## Steps to Reproduce
### 1. Start vLLM Server
```bash
vllm serve /mnt/afs/share/Qwen2-Audio-7B-instruct \
  --port 8006 \
  --trust-remote-code \
  --enforce-eager \
  --max-model-len 8192
```
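
Once the server reports it is ready, an optional sanity check against the OpenAI-compatible `/v1/models` endpoint confirms the model is registered before sending chat requests:

```bash
# Should return a JSON model list whose "id" matches the served model path
curl http://localhost:8006/v1/models
```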
### 2. Test Text-Only Request (Works)

```bash
curl -X POST "http://localhost:8006/v1/chat/completions" \
  -H "Content-Type: application/json" \
  -d '{
    "model": "/mnt/afs/share/Qwen2-Audio-7B-instruct",
    "messages": [{"role": "user", "content": "Hello"}],
    "max_tokens": 100
  }'
```
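
To rule out the Python client configuration as a factor, the equivalent text-only request can also be sent through the OpenAI client (a minimal sketch mirroring the audio script in step 3):

```python
from openai import OpenAI

# Same endpoint and model path as the curl request above
client = OpenAI(api_key="EMPTY", base_url="http://localhost:8006/v1")
resp = client.chat.completions.create(
    model="/mnt/afs/share/Qwen2-Audio-7B-instruct",
    messages=[{"role": "user", "content": "Hello"}],
    max_tokens=100,
)
print(resp.choices[0].message.content)
```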
### 3. Test Audio Request (Fails with 500 Internal Server Error)

```python
import base64

from openai import OpenAI

# Read the local audio file and base64-encode it for the data URL
with open("test.wav", "rb") as f:
    audio = f.read()
audio_base64 = base64.b64encode(audio).decode("utf-8")

openai_api_key = "EMPTY"
openai_api_base = "http://localhost:8006/v1"
client = OpenAI(
    api_key=openai_api_key,
    base_url=openai_api_base,
)

chat_completion_from_base64 = client.chat.completions.create(
    messages=[
        {
            "role": "user",
            "content": [
                {
                    "type": "text",
                    "text": "What's in this audio?",
                },
                {
                    "type": "audio_url",
                    "audio_url": {
                        # MIME type matches the .wav file read above
                        "url": f"data:audio/wav;base64,{audio_base64}",
                    },
                },
            ],
        },
    ],
    model="/mnt/afs/share/Qwen2-Audio-7B-instruct",
    max_completion_tokens=64,
)

result = chat_completion_from_base64.choices[0].message.content
print("Chat completion output from input audio:", result)
```
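
To confirm the 500 originates server-side rather than in the Python client, the same audio payload can be sent directly with curl (a hedged sketch; `test.wav` and the inline base64 step mirror the script above):

```bash
# Base64-encode the file without line wrapping and inline it as a data URL
AUDIO_B64=$(base64 -w 0 test.wav)
curl -X POST "http://localhost:8006/v1/chat/completions" \
  -H "Content-Type: application/json" \
  -d "{
    \"model\": \"/mnt/afs/share/Qwen2-Audio-7B-instruct\",
    \"messages\": [{\"role\": \"user\", \"content\": [
      {\"type\": \"text\", \"text\": \"What is in this audio?\"},
      {\"type\": \"audio_url\", \"audio_url\": {\"url\": \"data:audio/wav;base64,${AUDIO_B64}\"}}
    ]}],
    \"max_tokens\": 64
  }"
```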
### Before submitting a new issue...

- [x] Make sure you already searched for relevant issues, and asked the chatbot living at the bottom right corner of the documentation page, which can answer lots of frequently asked questions.