Multi-GPU usage with ChatOllama #30019

mr-mainak · 2025-02-27T13:24:49Z


Checked other resources

I added a very descriptive title to this question.
I searched the LangChain documentation with the integrated search.
I used the GitHub search to find a similar question and didn't find it.

Commit to Help

I commit to help with one of those options 👆

Example Code

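    # ollama approach
    import time

    from langchain_core.messages import AIMessageChunk
    from langchain_ollama import ChatOllama

    llm_model = None  # cached ChatOllama instance, created on first use


    def stream_response(prompt, model_name):
        """Stream the model's reply to `prompt` chunk by chunk."""
        global llm_model
        start_time = time.time()
        if llm_model is None:
            # num_gpu is the offloading parameter (layers sent to the GPU);
            # keep_alive=0 unloads the model as soon as the request finishes.
            llm_model = ChatOllama(model=model_name, num_ctx=2048,
                                   num_predict=500, temperature=0.7, repeat_penalty=1.5,
                                   num_thread=22, num_gpu=10, keep_alive=0)
        print("Model is loaded!")
        # response_text = llm_model.invoke(prompt)
        # Yield the response in chunks
        response_content = ""
        for chunk in llm_model.stream(prompt):
            if isinstance(chunk, AIMessageChunk):
                response_content += chunk.content
                # print(chunk.content, end="", flush=True)  # Print as it streams
                yield chunk.content
        end_time = time.time()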



### Description

I have multiple GPUs, each with the same amount of memory. How can I distribute a `gguf` model across all of the GPUs? I'm using the following code:

    # ollama approach
    start_time = time.time()
    if llm_model is None:
        llm_model = ChatOllama(model=model_name, num_ctx = 2048,
                        num_predict=500, temperature=0.7, repeat_penalty=1.5,
                        num_thread=22, num_gpu=10, keep_alive=0)
    print("Model is Loaded !! ")
    # response_text = model.invoke(prompt)
    # Print in chunks
    response_content = ""
    for chunk in llm_model.stream(prompt):
        if isinstance(chunk, AIMessageChunk):
            response_content += chunk.content
            # print(chunk.content, end="", flush=True)  # Print as it streams
            yield chunk.content
    end_time = time.time()

`num_gpu` here is the offloading parameter (the number of model layers sent to the GPU). Any help is highly appreciated.
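For context, `ChatOllama` is only a client. As far as I understand it, the Ollama server decides which GPU each layer goes to, and `num_gpu` only sets how many layers are offloaded. The sketch below therefore works on the server side: it builds the environment for launching `ollama serve` so that several GPUs are visible and the scheduler is asked to spread the model across them. `CUDA_VISIBLE_DEVICES` is the standard CUDA variable. `OLLAMA_SCHED_SPREAD` is an Ollama server setting that I am assuming the installed Ollama version supports. The helper name `ollama_server_env` is hypothetical.

```python
import os


def ollama_server_env(gpu_ids, spread=True, base_env=None):
    """Build environment variables for launching `ollama serve`.

    gpu_ids:  GPU indices the server process is allowed to see.
    spread:   if True, ask the Ollama scheduler to spread a model across
              all visible GPUs (assumes the server supports this setting).
    base_env: environment to start from; defaults to the current one.
    """
    env = dict(os.environ if base_env is None else base_env)
    # Standard CUDA variable: limits which devices the process can use.
    env["CUDA_VISIBLE_DEVICES"] = ",".join(str(i) for i in gpu_ids)
    if spread:
        # Ollama server setting (assumption: available in the installed version).
        env["OLLAMA_SCHED_SPREAD"] = "1"
    else:
        env.pop("OLLAMA_SCHED_SPREAD", None)
    return env


if __name__ == "__main__":
    env = ollama_server_env([0, 1, 2, 3])
    print(env["CUDA_VISIBLE_DEVICES"], env.get("OLLAMA_SCHED_SPREAD"))
    # To start the server with this environment (not executed here):
    # import subprocess
    # subprocess.Popen(["ollama", "serve"], env=env)
```

The server has to be restarted with this environment before `ChatOllama` connects to it. Also, with `num_gpu=10` only ten layers are offloaded, so the remaining layers would likely stay on the CPU however many GPUs are visible.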

### System Info

langchain==0.3.19
langchain-chroma==0.2.1
langchain-community==0.3.18
langchain-core==0.3.39
langchain-experimental==0.3.4
langchain-huggingface==0.1.2
langchain-ollama==0.2.3
langchain-openai==0.3.4
langchain-text-splitters==0.3.6

python = 3.10
os = ubuntu 22.04
CUDA = 12.5
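
On a CUDA setup like the one above, one way to check whether a loaded model is actually spread over several GPUs is to read per-GPU memory use from `nvidia-smi` while a request is running. This is a minimal sketch, assuming `nvidia-smi` is on the `PATH`. The function names are hypothetical and the sample string is made-up illustrative output, not a measurement.

```python
import subprocess

# Real nvidia-smi query flags; values are reported in MiB with nounits.
QUERY = [
    "nvidia-smi",
    "--query-gpu=index,memory.used,memory.total",
    "--format=csv,noheader,nounits",
]


def parse_gpu_memory(csv_text):
    """Parse nvidia-smi CSV rows into {gpu_index: (used_mib, total_mib)}."""
    usage = {}
    for line in csv_text.strip().splitlines():
        index, used, total = (field.strip() for field in line.split(","))
        usage[int(index)] = (int(used), int(total))
    return usage


def gpu_memory():
    """Run nvidia-smi and return the current memory use per GPU."""
    result = subprocess.run(QUERY, capture_output=True, text=True, check=True)
    return parse_gpu_memory(result.stdout)


if __name__ == "__main__":
    # Hypothetical sample output for two GPUs (illustration only).
    sample = "0, 5120, 24576\n1, 5098, 24576\n"
    print(parse_gpu_memory(sample))
```

Because the code above sets `keep_alive=0`, the model is unloaded right after each request, so `gpu_memory()` would need to be called while the response is still streaming.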


Replies: 0 comments
