Handling Tool Calls with Custom LLMs and Non-Streaming Responses #16249
-
I have developed a custom LLM following the LLaMA architecture, which is intended to interact with a newly created agent. However, I have encountered an issue: tool calls only function correctly when using OpenAI models such as GPT-4o. I need suggestions on how to properly handle tool calls when using custom models, such as those deployed on AWS.

Additionally, I have a question regarding the OpenAI model request and response streaming mechanism: is the communication handled via WebSockets or Server-Sent Events (SSE)?

Furthermore, when working with a custom model that returns a static response (i.e., without streaming), is it feasible to process that response to programmatically invoke a tool call? For example, can I trigger actions such as "Apply All" or "Discard" on a changeset by parsing the static response and calling the corresponding function (e.g., `changeSet_writeChangeToFile`)?
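To illustrate the idea behind the last question: the intent is roughly the sketch below. It is hypothetical; the JSON shape the model emits, the `parseToolCall` helper, and the handler registry are assumptions for illustration only, and only `changeSet_writeChangeToFile` is an actual function from my setup.

```typescript
// Hypothetical sketch: extract a tool call from a static (non-streaming)
// model response and dispatch it to a locally registered handler.

interface ToolCall {
    name: string;                        // e.g. "changeSet_writeChangeToFile"
    arguments: Record<string, unknown>;  // arguments produced by the model
}

// Registry of tool handlers the agent knows about (illustrative only).
const toolHandlers: Record<string, (args: Record<string, unknown>) => Promise<unknown>> = {
    changeSet_writeChangeToFile: async args => {
        // ...write the proposed change into the change set so that the
        // "Apply All" / "Discard" actions become available in the UI
        return { ok: true };
    }
};

// Try to find a JSON tool-call block inside the plain-text response.
function parseToolCall(staticResponse: string): ToolCall | undefined {
    const match = staticResponse.match(/\{[\s\S]*\}/);
    if (!match) {
        return undefined;
    }
    try {
        const parsed = JSON.parse(match[0]);
        if (typeof parsed.name === 'string') {
            return { name: parsed.name, arguments: parsed.arguments ?? {} };
        }
    } catch {
        // not valid JSON – treat the response as a normal text answer
    }
    return undefined;
}

export async function handleStaticResponse(staticResponse: string): Promise<void> {
    const toolCall = parseToolCall(staticResponse);
    if (toolCall && toolHandlers[toolCall.name]) {
        await toolHandlers[toolCall.name](toolCall.arguments);
    }
}
```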
-
@sdirix Hi, I am currently working on invoking the `changeSet_writeChangeToFile` function from within my language model class. Below is the returned structure:

*(screenshot of the returned structure omitted)*

I would appreciate your assistance in resolving this issue so that the `changeSet_writeChangeToFile` function completes execution successfully. The goal is to ensure that the file is saved properly and that the associated user actions, such as Apply All and Discard, are displayed and fully functional. Please let me know if you need any additional details to help troubleshoot this issue. Thank you!
-
Please find below the implementation of the language model class, as well as the prompt template used for Coder.

**Prompt Template**

**Context Retrieval**
Use the following functions to interact with the workspace files if you require context:

**File Validation**
Use the following function to retrieve a list of problems in a file if the user requests fixes in a given file:

**Propose Code Changes**
To propose code changes or any file changes to the user, never print code or new file content in your response. Instead, for each file you want to propose changes for:

**Additional Context**
The following files have been provided for additional context. Some of them may also be referred to by the user. Always look at the relevant files to understand your task using the function ~{getFileContent}

**Previously Proposed Changes**
You have previously proposed changes for the following files. Some suggestions may have been accepted by the user, while others may still be pending.

**Language Model Class Implementation**
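The implementation itself did not come through in the post above. Purely for illustration, a minimal non-streaming wrapper around a custom model deployed on AWS could look roughly like the sketch below; the class name, endpoint, payload, and response shape are all assumptions and do not reflect the actual posted code.

```typescript
// Hypothetical sketch of a non-streaming language model wrapper for a custom
// model deployed on AWS. Endpoint, payload, and response shape are assumptions.

interface ChatMessage {
    role: 'system' | 'user' | 'assistant';
    content: string;
}

export class CustomAwsLanguageModel {
    constructor(
        protected readonly endpoint: string, // e.g. an API Gateway / SageMaker endpoint URL
        protected readonly apiKey: string
    ) {}

    /**
     * Sends the full conversation and returns the complete (static) answer.
     * No streaming: the whole response text arrives in one piece.
     */
    async request(messages: ChatMessage[]): Promise<string> {
        const response = await fetch(this.endpoint, {
            method: 'POST',
            headers: {
                'Content-Type': 'application/json',
                'Authorization': `Bearer ${this.apiKey}`
            },
            body: JSON.stringify({ messages })
        });
        if (!response.ok) {
            throw new Error(`Model request failed: ${response.status}`);
        }
        const body = await response.json();
        // Assumed response shape: { text: string }
        return body.text as string;
    }
}
```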
I see that you are using the OpenAI API. As we already have a LanguageModel implementation for OpenAI, you should not need to implement your own version; instead you can reuse the existing one.

However, the current one has a shortcoming which you might already have run into: the non-streaming request implementation does not support tool calls yet. So if you need this support, at the moment you need to implement it yourself. I would suggest copying the current OpenAI LanguageModel and modifying the non-streaming request code.
As @eneufeld indicated, the best place to see how to do this is the Ollama LanguageModel.
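For reference, with the official `openai` npm package a non-streaming request that handles tool calls typically boils down to a loop like the one below. This is a generic sketch, not the Theia implementation; the `executeTool` helper and the way tool results are mapped back to your own functions (e.g. `changeSet_writeChangeToFile`) are assumptions.

```typescript
import OpenAI from 'openai';

const openai = new OpenAI(); // reads OPENAI_API_KEY from the environment

// Hypothetical tool executor: looks up the local function for the given tool
// name, runs it with the model-provided arguments, and returns a string result.
async function executeTool(name: string, argsJson: string): Promise<string> {
    // ...dispatch to the registered tool function here
    return JSON.stringify({ ok: true });
}

export async function nonStreamingRequestWithTools(
    messages: OpenAI.Chat.ChatCompletionMessageParam[],
    tools: OpenAI.Chat.ChatCompletionTool[]
): Promise<string> {
    // Keep calling the model until it stops requesting tools.
    for (;;) {
        const completion = await openai.chat.completions.create({
            model: 'gpt-4o',
            messages,
            tools,
            stream: false
        });
        const message = completion.choices[0].message;
        if (!message.tool_calls || message.tool_calls.length === 0) {
            // No further tool calls: this is the final answer.
            return message.content ?? '';
        }
        // Record the assistant turn that requested the tools...
        messages.push(message);
        // ...then execute each tool and feed the results back to the model.
        for (const toolCall of message.tool_calls) {
            if (toolCall.type !== 'function') {
                continue;
            }
            const result = await executeTool(
                toolCall.function.name,
                toolCall.function.arguments
            );
            messages.push({
                role: 'tool',
                tool_call_id: toolCall.id,
                content: result
            });
        }
    }
}
```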
If you manage to do so, it would be great if you could contribute…