Replies: 1 comment
- It seems that llama.cpp actually does a column-parallel approach, which is misleading since the option passed is `-sm row`.
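For anyone else digging into this, here is a small standalone C++ sketch of the two partitionings (my own illustration, not llama.cpp code). Splitting the weight along the inner/shared dimension leaves each device with a full-length partial output that must be summed element-wise across devices, i.e. an all-reduce, while splitting along the output dimension gives each device a disjoint slice of the result that only needs to be gathered/concatenated:

```cpp
// Standalone sketch (not llama.cpp code): y = W*x with W split two ways
// across two hypothetical "devices" (plain loops stand in for GPUs).
#include <cstdio>
#include <vector>

int main() {
    const int M = 4, K = 6;                       // y[M] = W[M][K] * x[K]
    std::vector<float> W(M * K), x(K), y_ref(M, 0.0f);
    for (int i = 0; i < M * K; ++i) W[i] = 0.1f * i;
    for (int k = 0; k < K; ++k)     x[k] = 1.0f + k;
    for (int m = 0; m < M; ++m)
        for (int k = 0; k < K; ++k) y_ref[m] += W[m * K + k] * x[k];

    // (a) split the inner/shared dimension K across 2 devices:
    //     each device computes a full-length partial y -> partials must be
    //     summed across devices (this is the all-reduce step).
    std::vector<float> y_sum(M, 0.0f);
    for (int dev = 0; dev < 2; ++dev) {
        int k0 = dev * (K / 2), k1 = k0 + K / 2;
        for (int m = 0; m < M; ++m)
            for (int k = k0; k < k1; ++k)
                y_sum[m] += W[m * K + k] * x[k];  // accumulate partials
    }

    // (b) split the output dimension M across 2 devices:
    //     each device owns a disjoint slice of y -> only a gather is needed.
    std::vector<float> y_cat(M, 0.0f);
    for (int dev = 0; dev < 2; ++dev) {
        int m0 = dev * (M / 2), m1 = m0 + M / 2;
        for (int m = m0; m < m1; ++m)
            for (int k = 0; k < K; ++k)
                y_cat[m] += W[m * K + k] * x[k];  // writes its own rows only
    }

    for (int m = 0; m < M; ++m)
        std::printf("y_ref=%6.2f  inner-split=%6.2f  output-split=%6.2f\n",
                    y_ref[m], y_sum[m], y_cat[m]);
    return 0;
}
```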
Hi, I am struggling to find where the partial results from a matrix multiply are reduced together. My understanding is that when I use `-sm row`, a row-wise tensor-parallel approach is employed, where the result of every single matrix multiply needs to be all-reduced. However, I only really see a `warp_reduce_sum`, which I assume is used for the tiling that happens across the threads of a warp on a single GPU, not an operation between GPUs that reduces the whole matrix.
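For reference, a `warp_reduce_sum`-style helper is normally just an intra-warp shuffle reduction: it sums one value per lane across the 32 threads of a single warp on a single GPU and never moves data between devices, so finding it says nothing about whether or where an inter-GPU all-reduce happens. Below is a minimal CUDA sketch of that pattern (a generic approximation, not the actual llama.cpp implementation):

```cuda
// Generic warp-level sum reduction (an assumption about what a helper like
// warp_reduce_sum typically looks like, not the llama.cpp source). It combines
// one value per lane within a single warp via register shuffles.
#include <cstdio>
#include <cuda_runtime.h>

__device__ float warp_reduce_sum_sketch(float v) {
    // halve the active offset each step; lane 0 ends up with the warp-wide sum
    for (int offset = 16; offset > 0; offset >>= 1)
        v += __shfl_down_sync(0xffffffffu, v, offset);
    return v;
}

__global__ void sum32(const float *in, float *out) {
    float v = in[threadIdx.x];          // one value per lane
    v = warp_reduce_sum_sketch(v);
    if (threadIdx.x == 0) *out = v;     // only lane 0 holds the full sum
}

int main() {
    float h_in[32], h_out = 0.0f, *d_in, *d_out;
    for (int i = 0; i < 32; ++i) h_in[i] = 1.0f;   // expected sum: 32
    cudaMalloc((void **)&d_in, 32 * sizeof(float));
    cudaMalloc((void **)&d_out, sizeof(float));
    cudaMemcpy(d_in, h_in, 32 * sizeof(float), cudaMemcpyHostToDevice);
    sum32<<<1, 32>>>(d_in, d_out);                  // one warp, one GPU
    cudaMemcpy(&h_out, d_out, sizeof(float), cudaMemcpyDeviceToHost);
    std::printf("warp sum = %f\n", h_out);
    cudaFree(d_in); cudaFree(d_out);
    return 0;
}
```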