Strange behaviour in VLLMOpenAI #29309
-
Hello @to-sora! I'm here to assist you with any bugs, questions, or contributions you have. Let's tackle this together! I found a similar unsolved discussion that might be relevant to your issue: langchain-openai couldn't work while the openai package works fine at the same parameters [1]. Regarding your issue, it seems like the LangChain implementation is not behaving as expected compared to the direct OpenAI API usage. Here are a few things you might want to check or try:
If these steps do not resolve the issue, it might be worth exploring the differences in how LangChain and the OpenAI package handle requests and responses, especially in terms of API compatibility and parameter handling.
-
I am using vLLM and want to do batch processing.
The vLLM server is started as an OpenAI-compatible server.
As a comparison test, I ran the completions code from the vLLM docs, and the result is reasonable, with the corresponding call showing up in the backend log.
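For reference, here is a minimal sketch of that comparison test; the model name, port, and prompt are placeholders rather than my exact values, and I assume the vLLM OpenAI-compatible server is already running locally:

```python
# Assumed setup: the vLLM OpenAI-compatible server is running locally,
# started with something like:
#   vllm serve <model-name> --port 8000
from openai import OpenAI

# Point the official openai client at the local vLLM server.
client = OpenAI(
    base_url="http://localhost:8000/v1",
    api_key="EMPTY",  # vLLM does not check the key by default
)

completion = client.completions.create(
    model="<model-name>",        # placeholder, not my exact model
    prompt="San Francisco is a",
    max_tokens=64,
)
print(completion.choices[0].text)
```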
However, when I run the equivalent LangChain code, the returned result and the corresponding vLLM log are not what I expect.
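The LangChain side looks roughly like the sketch below; again, the model name, base URL, and prompts are placeholders, not my exact values:

```python
from langchain_community.llms import VLLMOpenAI

# Same local vLLM server as above, accessed through LangChain's
# OpenAI-compatible completions wrapper.
llm = VLLMOpenAI(
    openai_api_key="EMPTY",
    openai_api_base="http://localhost:8000/v1",
    model_name="<model-name>",   # placeholder
    max_tokens=64,
)

# Batch processing: several prompts in one call.
results = llm.batch(["San Francisco is a", "Paris is a"])
print(results)
```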
As the vLLM docs clearly state (https://docs.vllm.ai/en/latest/serving/openai_compatible_server.html):
Supported APIs
We currently support the following OpenAI APIs:
- Completions API (/v1/completions)
  - Only applicable to text generation models (--task generate).
  - Note: suffix parameter is not supported.
- Chat Completions API (/v1/chat/completions)
  - Only applicable to text generation models (--task generate) with a chat template.
  - Note: parallel_tool_calls and user parameters are ignored.
- Embeddings API (/v1/embeddings)
  - Only applicable to embedding models (--task embed).
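For what it's worth, my understanding (an assumption on my part, not something stated in the quote above) is that the two LangChain wrappers map onto these endpoints as follows:

```python
# My understanding of which endpoint each wrapper calls (assumption):
#   VLLMOpenAI -> POST /v1/completions        (plain text completion)
#   ChatOpenAI -> POST /v1/chat/completions   (chat template applied by the server)
from langchain_community.llms import VLLMOpenAI
from langchain_openai import ChatOpenAI

completions_llm = VLLMOpenAI(
    openai_api_key="EMPTY",
    openai_api_base="http://localhost:8000/v1",
    model_name="<model-name>",   # placeholder
)

chat_llm = ChatOpenAI(
    openai_api_key="EMPTY",
    openai_api_base="http://localhost:8000/v1",
    model="<model-name>",        # placeholder
)
```

Since I am serving a text generation model, I would expect VLLMOpenAI (going through /v1/completions) to behave like the direct openai completions call above.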
May I know if I am making any mistake, or if this is a bug?
FYI, the generated result is also meaningless.