feat(ckpt): support compatible version universal checkpoint #418
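Compatible universal checkpoint
The following parallel conversions are now supported, with some limitations: tp and wp cannot yet be converted into each other, and a tp or wp conversion must change the parallel size by an integer multiple.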
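The two limitations can be expressed as a small pre-check. This is only a sketch: `is_supported_conversion` and its arguments are hypothetical names chosen for illustration and are not part of this PR's code.

```python
def is_supported_conversion(old_mode: str, old_size: int,
                            new_mode: str, new_size: int) -> bool:
    """Hypothetical pre-check for a tp/wp checkpoint conversion.

    Assumes modes are the strings "tp" or "wp" and sizes are positive ints.
    """
    # tp <-> wp conversion is not supported yet.
    if old_mode != new_mode:
        return False
    if old_size <= 0 or new_size <= 0:
        return False
    # The larger size must be an integer multiple of the smaller one.
    big, small = max(old_size, new_size), min(old_size, new_size)
    return big % small == 0
```

Under these assumptions, tp 2 to tp 4 and wp 8 to wp 2 pass the check, while tp 4 to tp 6 and tp 2 to wp 2 are rejected.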
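Usage
This approach generates the meta information from a running training job and converts the ckpt with an offline script. The conversion needs the metadata.pt files of both the old and the new configuration. Setting generate_meta_data in the config saves metadata.pt on its own while a job is running. Checkpoints saved after this PR automatically generate a metadata.pt file alongside them.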
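Because the conversion depends on both metadata.pt files, it may help to confirm they exist before launching the offline script. The helper below is a minimal sketch; `missing_metadata` and the directory layout (one metadata.pt per directory) are assumptions for illustration, not the PR's actual interface.

```python
from pathlib import Path


def missing_metadata(old_dir, new_dir, name="metadata.pt"):
    """Return the expected metadata files that are not present.

    old_dir / new_dir are hypothetical locations of the metadata saved
    under the old and the new parallel configuration.
    """
    candidates = [Path(old_dir) / name, Path(new_dir) / name]
    return [p for p in candidates if not p.is_file()]
```

An empty list means both files were found; otherwise the returned paths show which configuration still needs a run with generate_meta_data enabled.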
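Precision validation
Validation uses a dumped dataset to ensure the data is identical across runs; without deterministic mode enabled there is a slight error. The baseline trains for steps 0-200, and the other experiment groups resume training from step 100 using the converted ckpt.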

mtp:
isp:
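One way to quantify how closely a resumed run tracks the baseline is the largest per-step loss difference from step 100 onward. This is a sketch only: `max_loss_gap` is a hypothetical helper, and it assumes each run's losses are available as a dict mapping step to loss.

```python
def max_loss_gap(baseline, resumed, start_step=100):
    """Largest absolute loss difference over steps present in both runs.

    baseline / resumed: dicts mapping training step -> loss (assumed format).
    """
    shared = sorted(s for s in resumed if s in baseline and s >= start_step)
    if not shared:
        raise ValueError("no overlapping steps to compare")
    return max(abs(baseline[s] - resumed[s]) for s in shared)
```

Without deterministic mode the gap is expected to be small but nonzero, so any acceptance threshold is a judgment call for the reviewer.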
