Inquiry on Support for Qwen2.5 Models and Large Model Training Capabilities #43

@ArcherShirou

Description

@ArcherShirou

I would like to ask whether there are plans to support the Qwen2.5 and Qwen2 series, or other popular open-source models such as Yi. Will the framework support merging large models, such as the 72B version, similar to MergeKit? Since running a 72B model requires a significant amount of memory, will the training phase support quantization and LoRA so that it can run on a single machine with 8 A800 GPUs? Additionally, will DeepSpeed be supported for distributed training? Support for merging and training common model sizes such as 72B, 70B, 34B, 14B, and 7B would greatly broaden the applicability of the method.
