@li126com li126com commented Feb 21, 2025

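Compatible universal checkpoint

Conversion between the following parallel modes is now supported, with some limitations: tp and wp cannot yet be converted into each other, and tp and wp sizes must change by an integer multiple.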

  • zero1
  • tensor parallel
  • pipeline parallel
  • isp + wp
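
A minimal sketch of the two limits stated above, for checking a planned conversion before running it. The function name `can_convert` and its arguments are hypothetical and not part of this PR; the sketch also assumes that "integer multiple" means the larger size must be divisible by the smaller one, in either direction.

```python
def can_convert(old_mode: str, old_size: int, new_mode: str, new_size: int) -> bool:
    """Return True if an old -> new layout respects the stated limits.

    Hypothetical helper: mode is "tp" or "wp", size is the parallel degree.
    """
    # Limit 1: tp and wp cannot be converted into each other.
    if old_mode != new_mode:
        return False
    # Limit 2 (assumed reading): sizes must differ by an integer factor.
    big, small = max(old_size, new_size), min(old_size, new_size)
    return small > 0 and big % small == 0


if __name__ == "__main__":
    print(can_convert("tp", 2, "tp", 4))  # same mode, 4 is a multiple of 2
    print(can_convert("tp", 4, "wp", 4))  # tp <-> wp is not supported
    print(can_convert("wp", 4, "wp", 6))  # 6 is not a multiple of 4
```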

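Usage

This approach generates the metadata during a runtime job and converts the ckpt with an offline script. The conversion needs the metadata.pt files of both the old and the new configuration. Setting generate_meta_data in the config saves metadata.pt on its own while a job is running. Checkpoints saved after this PR will automatically generate a metadata.pt file alongside them.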

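Precision verification

Verification uses a dumped dataset so that the input data is identical across runs. Deterministic mode was not enabled, so slight deviations are expected. The baseline trains steps 0-200; the other experiment groups resume training from step 100 using the converted ckpt.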
mtp:
image

isp:
image
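
The comparison described above (baseline versus a run resumed from the converted ckpt) could be quantified roughly as follows. This is only a sketch: `max_relative_gap` is a hypothetical helper, and the loss values in the usage part are made-up placeholders, not results from this PR.

```python
from typing import Dict


def max_relative_gap(baseline: Dict[int, float], resumed: Dict[int, float]) -> float:
    """Largest relative loss difference over the steps both runs logged."""
    shared_steps = sorted(set(baseline) & set(resumed))
    if not shared_steps:
        raise ValueError("the two runs share no steps")
    return max(
        abs(baseline[s] - resumed[s]) / max(abs(baseline[s]), 1e-12)
        for s in shared_steps
    )


if __name__ == "__main__":
    # Placeholder numbers for illustration only.
    baseline_loss = {100: 2.0, 101: 1.9}
    resumed_loss = {100: 2.0, 101: 1.919}
    print(max_relative_gap(baseline_loss, resumed_loss))
```

Without deterministic mode the gap is not expected to be exactly zero, so any pass/fail threshold is a judgment call.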

@li126com li126com changed the title from "Feat(ckpt): support compatible version universal checkpoint" to "feat(ckpt): support compatible version universal checkpoint" Feb 21, 2025
@sunpengsdu sunpengsdu merged commit f925cd3 into InternLM:develop Mar 4, 2025
25 checks passed