feat(dataloader): Implement megatron dataloader and mocked dataloader #323

zigzagcai · 2024-09-10T11:22:50Z

The main functionality of this PR works and is runnable, but it still needs some refinement.

Motivation

Support the Megatron dataloader type: for users who want to use the InternEvo framework to train on Megatron-tokenized datasets.
Support the mocked dataloader type: for users who want to run precision alignment experiments and need to ensure that the loaded data is completely consistent.

Modification

internlm/data/megatron/*
internlm/data/mocked/*

BC-breaking (Optional)

None

Use cases (Optional)

None

Checklist

Before PR:

Pre-commit or other linting tools are used to fix potential lint issues.
Bug fixes are fully covered by unit tests; the case that causes the bug should be added to the unit tests.
The modification is covered by complete unit tests. If not, please add more unit tests to ensure correctness.
The documentation has been modified accordingly, such as docstrings or example tutorials.

After PR:

If the modification has potential influence on downstream or other related projects, this PR should be tested with those projects.
CLA has been signed and all committers have signed the CLA in this PR.


sunpengsdu · 2024-09-12T02:20:33Z

LGTM

zigzagcai added 4 commits September 9, 2024 19:01

add megatron dl

5f00264

code refine for megatron dataloader

c41c5d0

Merge remote-tracking branch 'origin/develop' into add-megatron-dataloader

9a3ed57

implement megatron dataloader

01d6240

mm-assistant bot assigned yhcc Sep 10, 2024

zigzagcai added 2 commits September 10, 2024 19:43

fix pylint

2be4e5e

revert config file

384f01c

zigzagcai changed the title ~~feat(dataloader): Implement megatron dataloader~~ feat(dataloader): Implement megatron dataloader and mocked dataloader Sep 11, 2024

add support for mocked dataloader

f70b163

sunpengsdu approved these changes Sep 12, 2024


sunpengsdu merged commit 77e6cb7 into InternLM:develop Sep 12, 2024
19 checks passed

3 participants

feat(dataloader): Implement megatron dataloader and mocked dataloader #323

feat(dataloader): Implement megatron dataloader and mocked dataloader #323

Uh oh!

Conversation

zigzagcai commented Sep 10, 2024 • edited Loading Uh oh! There was an error while loading. Please reload this page.

Uh oh!

Motivation

Modification

BC-breaking (Optional)

Use cases (Optional)

Checklist

Uh oh!

sunpengsdu commented Sep 12, 2024

Uh oh!

Uh oh!

Reviewers

Assignees

Labels

Projects

Milestone

Development

Uh oh!

3 participants

zigzagcai commented Sep 10, 2024 •

edited

Loading