forked from hiyouga/LLaMA-Factory
Description
Reminder
- I have read the README and searched the existing issues.
System Info
```
Traceback (most recent call last):
  File "/home/ai/anaconda3/bin/llamafactory-cli", line 8, in <module>
    sys.exit(main())
             ^^^^^^
  File "/home/ai/360-LLaMA-Factory/src/llamafactory/cli.py", line 112, in main
    run_exp()
  File "/home/ai/360-LLaMA-Factory/src/llamafactory/train/tuner.py", line 56, in run_exp
    run_dpo(model_args, data_args, training_args, finetuning_args, callbacks)
  File "/home/ai/360-LLaMA-Factory/src/llamafactory/train/dpo/workflow.py", line 47, in run_dpo
    model = load_model(tokenizer, model_args, finetuning_args, training_args.do_train, full_determinism=training_args.full_determinism)
            ^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^
  File "/home/ai/360-LLaMA-Factory/src/llamafactory/model/loader.py", line 147, in load_model
    sequence_parallel_group = apply_sequence_parallel(model_args, full_determinism)  # monkey patching, similar to liger_kernel
                              ^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^
  File "/home/ai/360-LLaMA-Factory/src/llamafactory/model/model_utils/sequence_parallel.py", line 62, in apply_sequence_parallel
    group_this = init_sp_group(model_args.sequence_parallel_size)
                 ^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^
  File "/home/ai/360-LLaMA-Factory/src/llamafactory/model/model_utils/sequence_parallel.py", line 43, in init_sp_group
    assert dist.is_initialized()
           ^^^^^^^^^^^^^^^^^^^^^
AssertionError
```
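The assertion in `init_sp_group` checks that a `torch.distributed` process group already exists before carving out sequence-parallel sub-groups, so it fires when the job runs as a plain single process (i.e. not under a distributed launcher such as `torchrun`). The control flow can be sketched as follows; `_DistStub` is a hypothetical stand-in for `torch.distributed` so the sketch runs anywhere, and the return value is purely illustrative:

```python
class _DistStub:
    """Hypothetical stand-in for torch.distributed, for illustration only."""

    def __init__(self):
        self._initialized = False

    def init_process_group(self, backend="nccl"):
        # A distributed launcher (e.g. torchrun) leads to this being called
        # once per rank before training starts.
        self._initialized = True

    def is_initialized(self):
        return self._initialized


dist = _DistStub()


def init_sp_group(sp_size):
    # Mirrors the guard in sequence_parallel.py: a global process group
    # must exist before sequence-parallel sub-groups can be created.
    assert dist.is_initialized()
    return f"sp_group(size={sp_size})"


# Launched as a single process, nothing calls init_process_group,
# so the guard raises AssertionError -- the traceback above:
try:
    init_sp_group(4)
except AssertionError:
    print("AssertionError: process group not initialized")

# Under a distributed launcher, init_process_group runs first and the
# sequence-parallel group can be created:
dist.init_process_group()
print(init_sp_group(4))
```

If this fork keeps upstream LLaMA-Factory's launch behavior, starting the run through a multi-process launcher (for example `torchrun`, or setting `FORCE_TORCHRUN=1` before `llamafactory-cli train …`) would initialize the process group before `load_model` is reached; that is an assumption about the fork, not a confirmed fix.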
Reproduction
yaml:
```yaml
stage: dpo
do_train: true
finetuning_type: lora
sequence_parallel_size: 4
```