Add new models #312

sallyjunjun · 2024-09-03T10:37:25Z

Thanks for your contribution; we appreciate it a lot. The following instructions will make your pull request healthier and help it get feedback more easily. If you do not understand some items, don't worry: just open the pull request and ask the maintainers for help.

Motivation

Support more models in InternEvo.

Modification

add qwen2
qwen2 uses sliding-window attention (SWA), so we add an SWA class for this model. It relies on a flash-attn feature that is available in versions after v2.3.0.
add baichuan2
baichuan2 uses MHA, with enable_qkv_fusion set to True.
add gemma
support gelu, support add_unit_offset in the layer_norm computation, and support a user-defined head_dim in GQA
add loading and saving of ckpt in the qwen2 model
add loading and saving of ckpt in the baichuan2 model
add loading and saving of ckpt in the gemma model
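
For readers unfamiliar with two of the items above, here is a minimal pure-Python sketch of what a sliding-window causal mask and a norm with add_unit_offset compute. It is not the code in this PR. The function names, the window convention (query i sees keys i - left through i), and the assumption that the norm is RMS-style with the weight applied as (1 + weight) are illustrative choices, not taken from the InternEvo source.

```python
import math


def sliding_window_causal_mask(seq_len: int, left: int) -> list[list[bool]]:
    """Return mask[i][j] == True when query i may attend to key j.

    Assumed convention: query i sees keys in [i - left, i], that is,
    itself plus at most `left` earlier positions (no future positions).
    """
    return [
        [(i - left) <= j <= i for j in range(seq_len)]
        for i in range(seq_len)
    ]


def rms_norm(x, weight, eps=1e-6, add_unit_offset=False):
    """RMS-style normalisation of a single vector (hypothetical helper).

    With add_unit_offset=True the learned weight is applied as (1 + weight),
    so a zero-initialised weight leaves the normalised values unchanged.
    """
    mean_sq = sum(v * v for v in x) / len(x)
    inv_rms = 1.0 / math.sqrt(mean_sq + eps)
    scale = [(1.0 + w) if add_unit_offset else w for w in weight]
    return [v * inv_rms * s for v, s in zip(x, scale)]


if __name__ == "__main__":
    # Each row shows which keys one query position can see.
    for row in sliding_window_causal_mask(seq_len=4, left=1):
        print(["x" if allowed else "." for allowed in row])

    # Zero weights: identity scaling with the offset, all zeros without it.
    print(rms_norm([3.0, 4.0], [0.0, 0.0], add_unit_offset=True))
    print(rms_norm([3.0, 4.0], [0.0, 0.0], add_unit_offset=False))
```

In a real model the mask is normally not materialised; a fused attention kernel takes the window size as a parameter instead.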

Accuracy test results are as follows:
qwen2:
isp vs dp:

isp load internevo saved ckpt:

isp load hf ckpt:

baichuan2:
isp vs dp:

isp load internevo saved ckpt:

isp load hf ckpt:

gemma:
isp vs dp:

isp load internevo saved ckpt:

isp load hf ckpt:

BC-breaking (Optional)

Does the modification introduce changes that break the backward compatibility of the downstream repositories?
If so, please describe how it breaks the compatibility and how the downstream projects should modify their code to keep compatibility with this PR.

Use cases (Optional)

If this PR introduces a new feature, it is better to list some use cases here and update the documentation.

Checklist

Before PR:

Pre-commit or other linting tools are used to fix the potential lint issues.
Bug fixes are fully covered by unit tests, the case that causes the bug should be added in the unit tests.
The modification is covered by complete unit tests. If not, please add more unit tests to ensure correctness.
The documentation has been modified accordingly, like docstring or example tutorials.

After PR:

If the modification has potential influence on downstream or other related projects, this PR should be tested with those projects.
CLA has been signed and all committers have signed the CLA in this PR.

fix lint

sunpengsdu · 2024-09-10T06:04:25Z

internlm/data/build_dataloader.py

            image_token_size = int(data_cfg.image_size // data_cfg.patch_size) ** 2
            train_ds = RandomDatasetMultimodal(
-                num_samples=100000,
+                num_samples=gpc.get_world_size(ParallelMode.DATA) * 500,


@huangting4201 could you take a look and check whether this change is reasonable?
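
To make the arithmetic behind the diff above concrete, here is a small sketch comparing a fixed num_samples with one scaled by the data-parallel world size. It assumes the dataset is split evenly across data-parallel ranks, which this thread does not state; samples_per_rank and scaled_num_samples are hypothetical helpers, not InternEvo functions.

```python
def samples_per_rank(num_samples: int, dp_world_size: int) -> int:
    """Samples each data-parallel rank would see under an even split (assumption)."""
    return num_samples // dp_world_size


def scaled_num_samples(dp_world_size: int, per_rank: int = 500) -> int:
    """Mirror of the new expression in the diff: world size times 500."""
    return dp_world_size * per_rank


if __name__ == "__main__":
    for dp in (1, 8, 64):
        fixed = samples_per_rank(100000, dp)
        scaled = samples_per_rank(scaled_num_samples(dp), dp)
        print(f"dp={dp}: fixed -> {fixed} per rank, scaled -> {scaled} per rank")
```

Under that assumption, the fixed value gives each rank fewer samples as the job grows, while the scaled value keeps the per-rank count constant.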

mm-assistant bot assigned yhcc Sep 3, 2024

sallyjunjun force-pushed the add-models branch 13 times, most recently from 9ef0922 to 1329939 Compare September 10, 2024 02:37

sallyjunjun added 6 commits September 10, 2024 10:48

add qwen2 model

7f11ad4

add baichuan2 model

49cbb44

fix bias and data sample of qwen2

c997e7d

fix lint

add gemma

7829e46

add load and save huggingface ckpt in baichuan2, qwen2 and gemma

be6cb56

fix models and add doc

431530b

sallyjunjun force-pushed the add-models branch from 1329939 to 431530b Compare September 10, 2024 02:48

sunpengsdu reviewed Sep 10, 2024

sunpengsdu approved these changes Sep 11, 2024

sunpengsdu merged commit d9bb33f into develop Sep 11, 2024
20 checks passed

sallyjunjun deleted the add-models branch September 20, 2024 03:21

3 participants
