Adding sequence_parallel_size raises an AssertionError #63

@guotong1988

Reminder

  • I have read the README and searched the existing issues.

System Info

Traceback (most recent call last):
  File "/home/ai/anaconda3/bin/llamafactory-cli", line 8, in <module>
    sys.exit(main())
             ^^^^^^
  File "/home/ai/360-LLaMA-Factory/src/llamafactory/cli.py", line 112, in main
    run_exp()
  File "/home/ai/360-LLaMA-Factory/src/llamafactory/train/tuner.py", line 56, in run_exp
    run_dpo(model_args, data_args, training_args, finetuning_args, callbacks)
  File "/home/ai/360-LLaMA-Factory/src/llamafactory/train/dpo/workflow.py", line 47, in run_dpo
    model = load_model(tokenizer, model_args, finetuning_args, training_args.do_train, full_determinism=training_args.full_determinism)
            ^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^
  File "/home/ai/360-LLaMA-Factory/src/llamafactory/model/loader.py", line 147, in load_model
    sequence_parallel_group = apply_sequence_parallel(model_args, full_determinism)  # monkey patching, similar to liger_kernel
                              ^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^
  File "/home/ai/360-LLaMA-Factory/src/llamafactory/model/model_utils/sequence_parallel.py", line 62, in apply_sequence_parallel
    group_this = init_sp_group(model_args.sequence_parallel_size)
                 ^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^
  File "/home/ai/360-LLaMA-Factory/src/llamafactory/model/model_utils/sequence_parallel.py", line 43, in init_sp_group
    assert dist.is_initialized()
           ^^^^^^^^^^^^^^^^^^^^^
AssertionError
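
The bare `assert dist.is_initialized()` at the bottom of the traceback does not say why it failed. Below is a minimal sketch of a more descriptive guard. `require_process_group` is a hypothetical name and not part of 360-LLaMA-Factory. The initialization check is passed in as a callable so the example runs without PyTorch; in real code it would presumably be `torch.distributed.is_initialized`, assuming `dist` in the traceback refers to `torch.distributed`.

```python
from typing import Callable


def require_process_group(is_initialized: Callable[[], bool],
                          sequence_parallel_size: int) -> None:
    """Raise a descriptive error when no distributed process group exists.

    Hypothetical helper for illustration only; not a 360-LLaMA-Factory API.
    """
    if is_initialized():
        return  # a process group exists, sequence-parallel setup can continue
    raise RuntimeError(
        f"sequence_parallel_size={sequence_parallel_size} needs an initialized "
        "distributed process group, but none was found. Check that the job "
        "was started through a distributed launcher."
    )


if __name__ == "__main__":
    try:
        # Simulate the situation in the traceback: no process group initialized.
        require_process_group(lambda: False, sequence_parallel_size=4)
    except RuntimeError as err:
        print(err)
```

A message like this would point at the launch method rather than leaving an empty `AssertionError`.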

Reproduction

yaml:

stage: dpo
do_train: true
finetuning_type: lora
sequence_parallel_size: 4
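
One plausible reading is that the config above was run as a single plain process, so no distributed process group existed when `init_sp_group` was called. The sketch below is a hypothetical pre-flight check, not something the project provides. It assumes the job is meant to be launched with torchrun (which sets `RANK`, `WORLD_SIZE`, `MASTER_ADDR`, and `MASTER_PORT` for each worker) and that the world size should be a multiple of `sequence_parallel_size`. The name `check_sp_launch` is made up for illustration.

```python
import os
from typing import List, Mapping, Optional

# Environment variables that torchrun sets for every worker process.
TORCHRUN_VARS = ("RANK", "WORLD_SIZE", "MASTER_ADDR", "MASTER_PORT")


def check_sp_launch(sequence_parallel_size: int,
                    env: Optional[Mapping[str, str]] = None) -> List[str]:
    """Return a list of likely launch problems (empty if none were found).

    Hypothetical helper for illustration only; not a 360-LLaMA-Factory API.
    """
    env = os.environ if env is None else env
    problems: List[str] = []

    if sequence_parallel_size <= 1:
        return problems  # sequence parallelism is off, nothing to check

    missing = [name for name in TORCHRUN_VARS if name not in env]
    if missing:
        problems.append(
            "missing distributed launch variables: " + ", ".join(missing)
        )
        return problems

    try:
        world_size = int(env["WORLD_SIZE"])
    except ValueError:
        problems.append(f"WORLD_SIZE is not an integer: {env['WORLD_SIZE']!r}")
        return problems

    # Assumption: every sequence-parallel group needs exactly
    # sequence_parallel_size ranks, so the world size must divide evenly.
    if world_size % sequence_parallel_size != 0:
        problems.append(
            f"WORLD_SIZE={world_size} is not a multiple of "
            f"sequence_parallel_size={sequence_parallel_size}"
        )
    return problems


if __name__ == "__main__":
    # Uses the value from the YAML above; prints nothing when the launch looks fine.
    for problem in check_sp_launch(sequence_parallel_size=4):
        print("launch problem:", problem)
```

Running this in the same shell and with the same launcher as the training command shows whether the distributed environment is visible to the process at all.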
