@RangiLyu
Contributor

Adapt VeRL's rollout importance sampling method

Modified from https://github.com/volcengine/verl/blob/main/verl/trainer/ppo/mismatch_helper.py

Main differences:

  • Adapted to packed sequences in XTuner
  • Supports truncated importance sampling (TIS) and token masking at the same time, each with its own threshold. This makes it easy to implement the IcePop trick from Ring
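
To illustrate why packing needs special handling, here is a minimal sketch (not the code in this PR) of a sequence-level importance weight computed over a packed row. It assumes the packed row is a flat list of per-token log-probability differences (training policy minus rollout policy) and that sequence boundaries are given as cumulative lengths. The names `sequence_level_weights`, `log_ratio`, and `cu_seqlens` are hypothetical and chosen only for this example.

```python
import math


def sequence_level_weights(log_ratio, cu_seqlens):
    """Give every token the importance ratio of the sequence it belongs to.

    log_ratio:  flat list of log pi_train(token) - log pi_rollout(token)
                for all tokens in one packed row (hypothetical layout).
    cu_seqlens: cumulative sequence boundaries, e.g. [0, 3, 5] for two
                packed sequences of length 3 and 2 (assumed convention).
    """
    weights = [0.0] * len(log_ratio)
    for start, end in zip(cu_seqlens[:-1], cu_seqlens[1:]):
        if end == start:
            continue  # empty segment, nothing to weight
        # The sequence ratio is the product of token ratios,
        # i.e. exp of the summed log ratios within this segment only.
        seq_weight = math.exp(sum(log_ratio[start:end]))
        for i in range(start, end):
            weights[i] = seq_weight
    return weights


# Two sequences packed into one row: lengths 3 and 2.
packed_log_ratio = [math.log(2.0), math.log(0.5), 0.0, math.log(3.0), 0.0]
packed_weights = sequence_level_weights(packed_log_ratio, [0, 3, 5])
```

Without the boundaries, a reduction over the whole row would mix tokens from unrelated samples. Token-level weights, as used in the examples under Usage, do not need this reduction.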

Usage

1. TIS

loss_cfg = GRPOLossConfig(
    policy_loss_cfg=dict(
        cliprange_high=0.28,
        cliprange_low=0.2,
        loss_type="vanilla",
        clip_ratio_c=10.0,
        log_prob_diff_min=-20.0,
        log_prob_diff_max=20.0,
    ),
    ignore_idx=-100,
    use_kl_loss=False,
    kl_loss_coef=0.0,
    kl_loss_type="low_var_kl",
    mode="chunk",
    chunk_size=512,
    rollout_is=RolloutImportanceSampling(
        rollout_is_level="token",
        rollout_is_mode="truncate",
        rollout_is_threshold=(2, 0.5),
    ),
)

2. IcePop

loss_cfg = GRPOLossConfig(
    policy_loss_cfg=dict(
        cliprange_high=0.28,
        cliprange_low=0.2,
        loss_type="vanilla",
        clip_ratio_c=10.0,
        log_prob_diff_min=-20.0,
        log_prob_diff_max=20.0,
    ),
    ignore_idx=-100,
    use_kl_loss=False,
    kl_loss_coef=0.0,
    kl_loss_type="low_var_kl",
    mode="chunk",
    chunk_size=512,
    rollout_is=RolloutImportanceSampling(
        rollout_is_level="token",
        rollout_is_mode="both",
        rollout_is_threshold=(5, 0.5),
        rollout_is_mask_threshold=(5, 0.5),
    ),
)
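
The difference between the two configurations can be sketched in plain Python. This is an illustration under stated assumptions, not the implementation in this PR: it assumes each threshold tuple is ordered `(upper, lower)`, that truncation only caps the ratio at the upper bound, and that masking zeroes the weight of any token whose ratio falls outside `[lower, upper]`. The function `rollout_is_weights` and its parameters are hypothetical.

```python
import math


def rollout_is_weights(log_ratio, truncate_upper=None, mask_bounds=None):
    """Token-level importance weights with optional truncation and masking.

    log_ratio:      per-token log pi_train - log pi_rollout.
    truncate_upper: cap applied to the ratio (assumed TIS behavior).
    mask_bounds:    (upper, lower); tokens outside the range get weight 0
                    (assumed masking behavior).
    """
    weights = []
    for lr in log_ratio:
        ratio = math.exp(lr)
        keep = True
        if mask_bounds is not None:
            upper, lower = mask_bounds
            keep = lower <= ratio <= upper
        if truncate_upper is not None:
            ratio = min(ratio, truncate_upper)
        weights.append(ratio if keep else 0.0)
    return weights


ratios = [0.1, 0.8, 1.5, 8.0]
log_ratio = [math.log(r) for r in ratios]

# Truncation only, upper bound 2: the outlier 8.0 is capped, nothing is dropped.
tis = rollout_is_weights(log_ratio, truncate_upper=2.0)
# roughly [0.1, 0.8, 1.5, 2.0]

# Truncation plus masking with the same bounds (5, 0.5):
# tokens at 0.1 and 8.0 fall outside the range and contribute nothing.
icepop = rollout_is_weights(log_ratio, truncate_upper=5.0, mask_bounds=(5.0, 0.5))
# roughly [0.0, 0.8, 1.5, 0.0]
```

Under these assumptions, when the truncation and mask thresholds coincide the cap never changes a kept token, so the result is pure masking: in-range tokens keep their ratio and out-of-range tokens are dropped.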

@RangiLyu changed the title from "feat: support rollout importance sampling modified verl" to "feat: support rollout importance sampling modified from verl" on Nov 3, 2025