Trying to distil Whisper for the Arabic language but faced an error in step 3 #147

Open
Manel-Hik opened this issue Jul 31, 2024 · 0 comments

Manel-Hik commented Jul 31, 2024

Hi,
I'm working on applying the technique explained in this repo to distil Whisper for the Arabic language, using the Arabic split of the Common Voice dataset.
I completed step 1 (creating the pseudo-labelled dataset) and step 2 (initialisation of the student model),
but in step 3, during training, I hit this error:

[rank1]: Traceback (most recent call last):
distil-whisper/training/distil-whisper-small-v1-ar/run_distillation.py", line 1811, in
[rank1]:     main()
distil-whisper/training/distil-whisper-small-v1-ar/run_distillation.py", line 1644, in main
[rank1]:     student_model.generation_config.save_pretrained(intermediate_dir)
envs/distilW_env/lib/python3.10/site-packages/torch/nn/modules/module.py", line 1729, in __getattr__
[rank1]:     raise AttributeError(f"'{type(self).__name__}' object has no attribute '{name}'")
[rank1]: AttributeError: 'DistributedDataParallel' object has no attribute 'generation_config'
07/31/2024 11:28:33 - INFO - accelerate.checkpointing - Model weights saved in checkpoint-2000-epoch-117/model.safetensors
07/31/2024 11:28:33 - WARNING - accelerate.utils.other - Removed shared tensor {'proj_out.weight'} while saving. This should be OK, but check by verifying that you don't receive any warning while reloading
torch/distributed/elastic/multiprocessing/api.py:858] Sending process 1706120 closing signal SIGTERM
torch/distributed/elastic/multiprocessing/api.py:833] failed (exitcode: 1) local_rank: 1 (pid: 1706121) of binary: envs/distilW_env/bin/python
Traceback (most recent call last):
envs/distilW_env/bin/accelerate", line 8, in
   sys.exit(main())
distilW_env/lib/python3.10/site-packages/accelerate/commands/accelerate_cli.py", line 48, in main
   args.func(args)
distilW_env/lib/python3.10/site-packages/accelerate/commands/launch.py", line 1097, in launch_command
   multi_gpu_launcher(args)
distilW_env/lib/python3.10/site-packages/accelerate/commands/launch.py", line 734, in multi_gpu_launcher
   distrib_run.run(args)
envs/distilW_env/lib/python3.10/site-packages/torch/distributed/run.py", line 892, in run
   elastic_launch(
envs/distilW_env/lib/python3.10/site-packages/torch/distributed/launcher/api.py", line 133, in __call__
   return launch_agent(self._config, self._entrypoint, list(args))
/distilW_env/lib/python3.10/site-packages/torch/distributed/launcher/api.py", line 264, in launch_agent
   raise ChildFailedError(
torch.distributed.elastic.multiprocessing.errors.ChildFailedError: 

run_distillation.py FAILED

Failures:
<NO_OTHER_FAILURES>

Root Cause (first observed failure):
[0]:
time      : 2024-07-31_11:28:34
rank      : 1 (local_rank: 1)
exitcode  : 1 (pid: 1706121)
error_file: <N/A>
traceback : To enable traceback see: https://pytorch.org/docs/stable/elastic/errors.html

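For what it's worth, here is a minimal CPU-only sketch of what I believe is going on (single process, gloo backend; TinyStudent and its generation_config dict are stand-ins I made up for illustration, not code from this repo):

import os
import torch
import torch.distributed as dist
from torch.nn.parallel import DistributedDataParallel as DDP

# Single-process "distributed" setup, just so a DDP wrapper can be constructed.
os.environ.setdefault("MASTER_ADDR", "127.0.0.1")
os.environ.setdefault("MASTER_PORT", "29500")
dist.init_process_group(backend="gloo", rank=0, world_size=1)

class TinyStudent(torch.nn.Module):
    def __init__(self):
        super().__init__()
        self.proj_out = torch.nn.Linear(4, 4)
        # Stand-in for the GenerationConfig attached to the real student model.
        self.generation_config = {"max_length": 128}

wrapped = DDP(TinyStudent())

# DDP does not forward arbitrary attributes to the wrapped module, so this
# raises the same AttributeError as student_model.generation_config above.
try:
    wrapped.generation_config
except AttributeError as err:
    print(err)

# Reaching through the wrapper to the underlying module works fine.
print(wrapped.module.generation_config)

dist.destroy_process_group()

So it looks like generation_config is being read from the DDP wrapper rather than from the underlying model. I'm guessing something like accelerator.unwrap_model(student_model).generation_config.save_pretrained(intermediate_dir) would be needed at that point, but I'm not sure that's the intended fix, or why it only shows up in my setup.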
Could you help me figure out this error?
Thanks in advance.
