Fine grained control of GPU offloading #7678

Answered by robcowart
amaxymillian asked this question in Q&A

You will need to use the --tensor-split parameter...

-ts,   --tensor-split N0,N1,N2,...      fraction of the model to offload to each GPU, comma-separated list of
                                        proportions, e.g. 3,1
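To make the proportions concrete: the list is normalized by its sum, so each GPU receives its value divided by the total. The proportions `3,1` used here are just the example from the help text; this sketch computes the resulting per-GPU share with `awk`:

```shell
# Normalize a --tensor-split proportion list into per-GPU percentages.
# With "3,1", GPU0 gets 3/(3+1) = 75% of the offloaded model, GPU1 gets 25%.
props="3,1"
echo "$props" | awk -F, '{
  total = 0
  for (i = 1; i <= NF; i++) total += $i       # sum of all proportions
  for (i = 1; i <= NF; i++)                   # each share = value / total
    printf "GPU%d: %.0f%%\n", i - 1, 100 * $i / total
}'
```

So `-ts 3,1` on a 24 GB + 8 GB pair of cards would place roughly three quarters of the offloaded tensors on the first GPU, which matches the 3:1 VRAM ratio.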

You may also need to use a few other parameters as well...

-c,    --ctx-size N                     size of the prompt context (default: 0, 0 = loaded from model)
                                        (env: LLAMA_ARG_CTX_SIZE)

-mg,   --main-gpu INDEX                 the GPU to use for the model (with split-mode = none), or for
                                        intermediate results and KV (with split-mode = row) (default: 0)
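Putting the flags together, a typical invocation might look like the sketch below. The model path and the specific values are placeholders, not taken from this discussion; the flags themselves (`-ngl`, `-ts`, `-c`, `-mg`) come from llama.cpp's help text:

```shell
# Sketch only: ./models/model.gguf and the flag values are illustrative.
# -ngl 99 -> offload all layers to GPU
# -ts 3,1 -> GPU0 receives 3/4 of the offloaded tensors, GPU1 receives 1/4
# -c 4096 -> prompt context of 4096 tokens
# -mg 0   -> GPU0 holds intermediate results and KV (with split-mode = row)
./llama-cli -m ./models/model.gguf -ngl 99 -ts 3,1 -c 4096 -mg 0
```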

Consider the following GPUs...

+-----…

Replies: 2 comments 2 replies

@Allan-Luu and @robcowart replied in the thread.
Answer selected by amaxymillian