Inquiry on Support for Qwen2.5 Models and Large Model Training Capabilities #43

@ArcherShirou

Description

@ArcherShirou

I would like to ask whether there are plans to support the Qwen2.5 and Qwen2 series, or other popular open-source models such as Yi. Will the framework support merging large models, such as the 72B version, similar to MergeKit? Since running a 72B model requires a significant amount of memory, will the training phase support quantization and LoRA so that it can run on a single machine with 8 A800 GPUs? Additionally, will DeepSpeed be supported for distributed training? Support for merging and training common model sizes such as 72B, 70B, 34B, 14B, and 7B would greatly broaden the applicability of the method.
