Looking for sample code to integrate streaming output for VLLM through Langchain #28405
Unanswered · GaneshDoosa asked this question in Q&A

Replies: 1 comment
@GaneshDoosa VLLM is a BaseLLM, so you can stream from it directly:

    llm = VLLM(...)
    async for chunk in llm.astream("Question"):
        print(chunk, end="", flush=True)
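A more complete runnable sketch of that suggestion, assuming the `VLLM` wrapper from `langchain_community` and a placeholder model name (swap in whatever model you actually run):

```python
import asyncio

from langchain_community.llms import VLLM

# Placeholder model name; replace with the model you serve locally.
llm = VLLM(model="mosaicml/mpt-7b", max_new_tokens=128)

async def main() -> None:
    # astream yields string chunks; flush so each one appears immediately.
    async for chunk in llm.astream("What is the capital of France?"):
        print(chunk, end="", flush=True)
    print()

asyncio.run(main())
```

Note that `astream` comes from `BaseLLM`: if the wrapper has no token-level streaming implementation of its own, the default falls back to yielding the entire completion as a single chunk.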
Description
Even after adding stream=True, the output tokens are not emitted as a stream; the full completion arrives at once.
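A minimal synchronous check for this, assuming the same `langchain_community` VLLM wrapper (the model name is a placeholder): if the wrapper does not implement token-level streaming, BaseLLM's default `stream` yields the whole completion as one chunk, which would match the behaviour described.

```python
from langchain_community.llms import VLLM

llm = VLLM(model="mosaicml/mpt-7b")  # placeholder model name

# More than one chunk means token-level streaming is active;
# exactly one chunk means the non-streaming fallback returned
# the whole completion at once.
chunks = list(llm.stream("What is the capital of France?"))
print(f"received {len(chunks)} chunk(s)")
```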
System Info
langchain: 0.2.15
vllm: 0.6.1