Tool calling with llm.chat #12557

alexanderbrodko · 2025-01-29T16:18:57Z

alexanderbrodko
Jan 29, 2025

According to this example
https://github.com/vllm-project/vllm/blob/27b78c73cad00f5c7bb3b2431f02dc680f7034bc/examples/offline_inference/chat_with_tools.py

I create some model:

tokenizer = AutoTokenizer.from_pretrained(model_dir)
sampling_params = SamplingParams(temperature=0.85, top_p=0.9, repetition_penalty=1.1, max_tokens=2048)
llm = LLM(model=model_dir, gpu_memory_utilization=0.9, max_model_len=16384)

Do inference:

text = tokenizer.apply_chat_template(
        messages,
        tokenize=False,
        add_generation_prompt=True
    )
    output = llm.chat([text], sampling_params, tools=tools)[0]
    generated_text = output.outputs[0].text

Then I ask the model about temperature in San Francisco and it fails.

File &quot;/home/xjesu/vllm/lib/python3.12/site-packages/vllm/entrypoints/chat_utils.py&quot;, line 430, in _parse_chat_message_content
    role = message[&quot;role&quot;]
           ~~~~~~~^^^^^^^^
TypeError: string indices must be integers, not "str";

Any suggestions?

alexanderbrodko · 2025-01-29T16:26:51Z

alexanderbrodko
Jan 29, 2025
Author

My bad. I do not need tokenize when I use llm.chat instead of llm.generate. It works, the model answer is

{"name": "get_weather", "arguments": {"location": "San Francisco, CA", "unit": "celsius"}}

In fact, the model is Qwen2.5-Coder-Instruct-0.5B

0 replies

Provide feedback

Saved searches

Use saved searches to filter your results more quickly

Uh oh!

Uh oh!

Tool calling with llm.chat #12557

Uh oh!

{{title}}

Uh oh!

Replies: 1 comment

Uh oh!

{{title}}

Uh oh!

Select a reply

Uh oh!

Uh oh!

Tool calling with llm.chat #12557

Uh oh!

alexanderbrodko Jan 29, 2025

Replies: 1 comment

Uh oh!

alexanderbrodko Jan 29, 2025 Author

alexanderbrodko
Jan 29, 2025

alexanderbrodko
Jan 29, 2025
Author