XLA bug #292
Hello, I have received your email and will reply as soon as possible. Best wishes!
My CUDA and TensorFlow versions: (/data/npl/Speech2Text/TensorFlowASR-main/venv) npl@uit-dgx01:/data/npl$ nvcc --version
@itsmekhoathekid there's a newer version with tf v2.18 and keras v3 on branch
I got these errors while running with this config:
python /data/npl/Speech2Text/TensorFlowASR-main/examples/train.py --mxp=auto --jit-compile --config-path=/data/npl/Speech2Text/TensorFlowASR-main/examples/models/transducer/rnnt/small.yml.j2 --dataset-type=tfrecord --modeldir=/data/npl/Speech2Text/TensorFlowASR-main/tensorflow_asr/checkpoint --datadir=/data/npl/Speech2Text/TensorFlowASR-main/scripts/data
Epoch 1/300
INFO:tensorflow:Collective all_reduce tensors: 39 all_reduces, num_devices = 8, group_size = 8, implementation = CommunicationImplementation.AUTO, num_packs = 1
INFO:tensorflow:Collective all_reduce tensors: 1 all_reduces, num_devices = 8, group_size = 8, implementation = CommunicationImplementation.AUTO, num_packs = 1
INFO:tensorflow:Collective all_reduce tensors: 1 all_reduces, num_devices = 8, group_size = 8, implementation = CommunicationImplementation.AUTO, num_packs = 1
INFO:tensorflow:Error reported to Coordinator: We failed to lift variable creations out of this tf.function, so this tf.function cannot be run on XLA. A possible workaround is to move variable creation outside of the XLA compiled function.
Traceback (most recent call last):
File "/data/npl/Speech2Text/TensorFlowASR-main/venv/lib/python3.9/site-packages/tensorflow/python/training/coordinator.py", line 293, in stop_on_exception
yield
File "/data/npl/Speech2Text/TensorFlowASR-main/venv/lib/python3.9/site-packages/tensorflow/python/distribute/mirrored_run.py", line 387, in run
self.main_result = self.main_fn(*self.main_args, **self.main_kwargs)
File "/data/npl/Speech2Text/TensorFlowASR-main/venv/lib/python3.9/site-packages/tensorflow/python/util/traceback_utils.py", line 153, in error_handler
raise e.with_traceback(filtered_tb) from None
File "/data/npl/Speech2Text/TensorFlowASR-main/venv/lib/python3.9/site-packages/tensorflow/python/eager/polymorphic_function/polymorphic_function.py", line 946, in _call
raise errors.UnimplementedError(
tensorflow.python.framework.errors_impl.UnimplementedError: We failed to lift variable creations out of this tf.function, so this tf.function cannot be run on XLA. A possible workaround is to move variable creation outside of the XLA compiled function.
Traceback (most recent call last):
File "/data/npl/Speech2Text/TensorFlowASR-main/examples/train.py", line 110, in <module>
cli_util.run(main)
File "/data/npl/Speech2Text/TensorFlowASR-main/venv/lib/python3.9/site-packages/tensorflow_asr/utils/cli_util.py", line 19, in run
fire.Fire(component, command=command, name=name)
File "/data/npl/Speech2Text/TensorFlowASR-main/venv/lib/python3.9/site-packages/fire/core.py", line 141, in Fire
component_trace = _Fire(component, args, parsed_flag_args, context, name)
File "/data/npl/Speech2Text/TensorFlowASR-main/venv/lib/python3.9/site-packages/fire/core.py", line 475, in _Fire
component, remaining_args = _CallAndUpdateTrace(
File "/data/npl/Speech2Text/TensorFlowASR-main/venv/lib/python3.9/site-packages/fire/core.py", line 691, in _CallAndUpdateTrace
component = fn(*varargs, **kwargs)
File "/data/npl/Speech2Text/TensorFlowASR-main/examples/train.py", line 98, in main
model.fit(
File "/data/npl/Speech2Text/TensorFlowASR-main/venv/lib/python3.9/site-packages/tensorflow_asr/models/base_model.py", line 544, in fit
tmp_logs, caching = self.train_function(iterator, caching=caching)
File "/data/npl/Speech2Text/TensorFlowASR-main/venv/lib/python3.9/site-packages/tensorflow/python/util/traceback_utils.py", line 153, in error_handler
raise e.with_traceback(filtered_tb) from None
File "/data/npl/Speech2Text/TensorFlowASR-main/venv/lib/python3.9/site-packages/tensorflow/python/eager/polymorphic_function/autograph_util.py", line 52, in autograph_handler
raise e.ag_error_metadata.to_exception(e)
tensorflow.python.framework.errors_impl.UnimplementedError: in user code:
I tried training on a single GPU (A100), but it is extremely slow. Can you please help?
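For anyone reading the error message above, here is a minimal sketch of the workaround it suggests: create variables outside the XLA-compiled function. This is not a fix for TensorFlowASR itself. The `Scaler` class and its `scale` variable are hypothetical names made up for illustration, and it assumes a TensorFlow 2.x build with XLA available.

```python
import tensorflow as tf


class Scaler(tf.Module):
    """Hypothetical module, used only to illustrate the pattern."""

    def __init__(self):
        super().__init__()
        # The variable is created eagerly here, before any compiled call,
        # so the XLA-compiled function below never has to create one.
        self.scale = tf.Variable(2.0)

    @tf.function(jit_compile=True)
    def __call__(self, x):
        # Only reads the existing variable; creating a tf.Variable inside
        # this function is the kind of thing the error message refers to.
        return x * self.scale


scaler = Scaler()
result = scaler(tf.constant([1.0, 2.0]))
```

In a Keras training loop the equivalent idea would be to make sure model and optimizer variables already exist before the first jit-compiled train step runs, but whether that is what triggers the error in this traceback is an assumption, not something the log confirms.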