Why use a manual tensor parallelism implementation instead of something like DeepSpeed? #267
vikigenius announced in Q&A
Replies: 1 comment 1 reply
-
I know it could have just been a design decision, but I would love to hear the rationale for rolling your own implementation of model parallelism rather than taking on something like DeepSpeed as a dependency.
-
ZeRO and FSDP communicate weights instead of activations. At inference time, activations are small but weights are large, so tensor parallelism is the more efficient choice.
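To make the trade-off concrete: for a hidden size of 4096, a single 4096×4096 weight matrix holds ~16M parameters (~32 MB in fp16), while the activation for one decoded token is just 4096 values (~8 KB), so shipping activations is orders of magnitude cheaper per step. Below is a minimal sketch of a row-parallel linear layer illustrating this communication pattern. It is not the repo's actual implementation; the class name `RowParallelLinear` is illustrative, and it assumes `torch.distributed` has already been initialized (e.g. via `torchrun`).

```python
import torch
import torch.nn as nn
import torch.distributed as dist

class RowParallelLinear(nn.Module):
    """Each rank holds a slice of the weight along the input dimension.
    The forward pass multiplies the local shard, then all-reduces the
    partial outputs -- so only activations cross the network."""

    def __init__(self, in_features: int, out_features: int):
        super().__init__()
        world_size = dist.get_world_size()
        assert in_features % world_size == 0
        # Each rank stores only in_features / world_size input columns;
        # the full weight never needs to be gathered.
        self.weight = nn.Parameter(
            torch.empty(out_features, in_features // world_size)
        )
        nn.init.kaiming_uniform_(self.weight)

    def forward(self, x_shard: torch.Tensor) -> torch.Tensor:
        # x_shard: the slice of the input owned by this rank,
        # shape (batch, in_features // world_size).
        partial = x_shard @ self.weight.t()
        # Sum the partial results across ranks. This communicates only
        # the activations (batch x out_features), not the weight matrix,
        # which is what ZeRO/FSDP would have to gather instead.
        dist.all_reduce(partial, op=dist.ReduceOp.SUM)
        return partial
```

Under this pattern each forward pass moves one activation tensor per parallel layer, whereas a ZeRO-3/FSDP-style scheme would all-gather every sharded weight on every pass, which is wasteful when the weights dwarf the activations, as they do at inference with small batches.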