Question: Falcon's attention weights are stored in fused_qkv form. Is it correct to shard them with col_nn.Linear1D_Col? With tp_size=2, could the split come out wrong, and why not use FusedLinear1D_Col? #5222
Unanswered
laiqinghan asked this question in Community | Q&A
Replies: 0 comments
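I have a question. Falcon's attention weights are stored in fused_qkv form. Is it correct to shard them with col_nn.Linear1D_Col? Suppose tp_size=2: could this split the weights incorrectly, and why not use FusedLinear1D_Col instead? This sharding probably will not raise an error at runtime, but when fine-tuning a Hugging Face model, could it scramble the computation? If [QKV] is split directly with tp_size=2, would K end up being cut in half while Q and V are not split at all? If so, the line of code below that separates q, k, v would still run, but is the computation logically correct? :pray: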
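To make the concern concrete, here is a minimal sketch in plain Python. It assumes the fused weight's output columns are laid out as a simple `[Q | K | V]` concatenation, as described in the question; the real Falcon layout depends on the model config and may be interleaved per head or per group. The helpers `naive_col_split` and `fused_col_split` are hypothetical illustrations, not ColossalAI APIs, and say nothing about what `Linear1D_Col` or `FusedLinear1D_Col` actually do internally.

```python
def naive_col_split(cols, tp_size):
    """Cut the column list into tp_size contiguous chunks (plain column sharding)."""
    chunk = len(cols) // tp_size
    return [cols[r * chunk:(r + 1) * chunk] for r in range(tp_size)]


def fused_col_split(cols, n_fused, tp_size):
    """Shard each of the n_fused blocks separately, then regroup the pieces per rank."""
    block = len(cols) // n_fused
    blocks = [cols[i * block:(i + 1) * block] for i in range(n_fused)]
    per_block = [naive_col_split(b, tp_size) for b in blocks]
    return [[c for pb in per_block for c in pb[r]] for r in range(tp_size)]


# Assumed layout: 4 Q columns, then 4 K columns, then 4 V columns.
hidden = 4
fused_cols = [f"{name}{i}" for name in "QKV" for i in range(hidden)]

naive = naive_col_split(fused_cols, tp_size=2)
fused = fused_col_split(fused_cols, n_fused=3, tp_size=2)

for rank in range(2):
    print(f"rank {rank} naive: {naive[rank]}")
    print(f"rank {rank} fused: {fused[rank]}")
```

Under this assumed layout, the contiguous split gives rank 0 all of Q plus half of K, and rank 1 the other half of K plus all of V. Each rank then slices its local output into three equal parts and treats them as q, k, v, which runs without error but mixes the projections. The fused-aware split gives every rank a matching slice of Q, K, and V.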


