feat: support rollout importance sampling modified from verl #1197

RangiLyu · 2025-10-30T11:14:35Z

Adapt VeRL's rollout importance sampling method

Modified from https://github.com/volcengine/verl/blob/main/verl/trainer/ppo/mismatch_helper.py

Main differences:

- Adapted to packed sequences in XTuner.
- Supports TIS and token masking at the same time, each with its own thresholds. This makes it easy to implement the IcePop trick from Ring.
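To illustrate why packed sequences need special handling, here is a minimal sketch of computing one importance ratio per sequence when several sequences are packed into one flat token array. The function name `packed_sequence_is_ratios`, the `cu_seqlens` boundary list, and the choice of a geometric mean are assumptions for illustration, not the code in this PR.

```python
import math


def packed_sequence_is_ratios(train_logp, rollout_logp, cu_seqlens):
    """Return one importance ratio per packed sequence (hypothetical helper).

    cu_seqlens holds cumulative boundaries: [0, 3, 5] means tokens 0..2
    belong to sequence 0 and tokens 3..4 belong to sequence 1.
    The ratio is the geometric mean of the token ratios, i.e. exp of the
    mean per-token log-probability difference (an assumed aggregation).
    """
    ratios = []
    for start, end in zip(cu_seqlens[:-1], cu_seqlens[1:]):
        diffs = [t - r for t, r in zip(train_logp[start:end], rollout_logp[start:end])]
        ratios.append(math.exp(sum(diffs) / len(diffs)))
    return ratios


train = [-1.0, -1.0, -1.0, -2.0, -2.0]
rollout = [-1.0, -1.5, -0.5, -2.5, -2.5]
ratios = packed_sequence_is_ratios(train, rollout, cu_seqlens=[0, 3, 5])
```

With token-level weighting, as in the usage examples in this description, the boundaries matter less, since each token gets its own weight; they come into play once ratios are aggregated per sequence.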

Usage

1. TIS

loss_cfg = GRPOLossConfig(
    policy_loss_cfg=dict(
        cliprange_high=0.28,
        cliprange_low=0.2,
        loss_type="vanilla",
        clip_ratio_c=10.0,
        log_prob_diff_min=-20.0,
        log_prob_diff_max=20.0,
    ),
    ignore_idx=-100,
    use_kl_loss=False,
    kl_loss_coef=0.0,
    kl_loss_type="low_var_kl",
    mode="chunk",
    chunk_size=512,
    rollout_is=RolloutImportanceSampling(
        rollout_is_level="token",
        rollout_is_mode="truncate",
        rollout_is_threshold=(2, 0.5),
    ),
)
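As a rough picture of what token-level truncation does, the sketch below computes a per-token weight from training and rollout log-probabilities and caps it from above. It assumes that the first entry of `rollout_is_threshold` is the upper bound and that truncate mode only clamps from above; both are assumptions, and `truncated_is_weights` is a hypothetical name, not a function from this PR.

```python
import math


def truncated_is_weights(train_logp, rollout_logp, upper=2.0):
    """Token-level truncated importance sampling weights (illustrative only).

    Each weight is exp(train_logp - rollout_logp), capped at `upper`.
    """
    weights = []
    for t, r in zip(train_logp, rollout_logp):
        ratio = math.exp(t - r)  # pi_train(token) / pi_rollout(token)
        weights.append(min(ratio, upper))  # assumed: clamp from above only
    return weights


tis_weights = truncated_is_weights(
    train_logp=[-1.0, -0.5, -3.0],
    rollout_logp=[-1.0, -2.0, -1.0],
    upper=2.0,
)
```

Here the second token has a raw ratio of about 4.48 and is capped at 2.0, while the third token keeps its small ratio of about 0.135 because no lower bound is applied in this sketch.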

2. IcePop

loss_cfg = GRPOLossConfig(
    policy_loss_cfg=dict(
        cliprange_high=0.28,
        cliprange_low=0.2,
        loss_type="vanilla",
        clip_ratio_c=10.0,
        log_prob_diff_min=-20.0,
        log_prob_diff_max=20.0,
    ),
    ignore_idx=-100,
    use_kl_loss=False,
    kl_loss_coef=0.0,
    kl_loss_type="low_var_kl",
    mode="chunk",
    chunk_size=512,
    rollout_is=RolloutImportanceSampling(
        rollout_is_level="token",
        rollout_is_mode="both",
        rollout_is_threshold=(5, 0.5),
        rollout_is_mask_threshold=(5, 0.5),
    ),
)
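The combination of truncation and masking can be sketched as follows: tokens whose ratio falls outside the mask band contribute nothing, and the remaining tokens keep a weight that is capped from above. The function `truncate_and_mask`, its parameter names, and the reading of each threshold tuple as (upper, lower) are assumptions for illustration, not the implementation in this PR.

```python
import math


def truncate_and_mask(train_logp, rollout_logp, is_upper=5.0, mask_upper=5.0, mask_lower=0.5):
    """Return (weights, mask) for a flat list of tokens (illustrative only).

    mask[i] is 1.0 when the token ratio lies in [mask_lower, mask_upper],
    otherwise 0.0. Kept tokens get min(ratio, is_upper); dropped tokens get 0.0.
    """
    weights, mask = [], []
    for t, r in zip(train_logp, rollout_logp):
        ratio = math.exp(t - r)
        keep = mask_lower <= ratio <= mask_upper
        mask.append(1.0 if keep else 0.0)
        weights.append(min(ratio, is_upper) if keep else 0.0)
    return weights, mask


ice_weights, ice_mask = truncate_and_mask(
    train_logp=[-1.0, -0.5, -3.0, -1.0],
    rollout_logp=[-1.0, -2.5, -1.0, -2.0],
)
```

In this example the second token (ratio about 7.39) and the third token (ratio about 0.135) fall outside the band and are masked out, while the first and fourth tokens keep their ratios.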

feat: support rollout importance sampling helper from verl

b1fb993

RangiLyu changed the title ~~feat: support rollout importance sampling modified verl~~ feat: support rollout importance sampling modified from verl Nov 3, 2025
