Description
log:
2023-04-26 17:41:51.342225: W tensorflow/core/framework/op_kernel.cc:1830] OP_REQUIRES failed at xla_ops.cc:417 : INVALID_ARGUMENT: Trying to access resource Resource-0-at-0x267a43f0 located in device /job:localhost/replica:0/task:0/device:GPU:0 from device /job:localhost/replica:0/task:0/device:GPU:1
Cf. https://www.tensorflow.org/xla/known_issues#tfvariable_on_a_different_device
2023-04-26 17:41:51.342242: W tensorflow/core/framework/op_kernel.cc:1830] OP_REQUIRES failed at xla_ops.cc:417 : INVALID_ARGUMENT: Trying to access resource Resource-0-at-0x267a43f0 located in device /job:localhost/replica:0/task:0/device:GPU:0 from device /job:localhost/replica:0/task:0/device:GPU:1
Cf. https://www.tensorflow.org/xla/known_issues#tfvariable_on_a_different_device
2023-04-26 17:41:51.342632: I tensorflow/compiler/xla/service/service.cc:173] XLA service 0x7feef4013580 initialized for platform CUDA (this does not guarantee that XLA will be used). Devices:
2023-04-26 17:41:51.386968: I tensorflow/compiler/xla/service/service.cc:181] StreamExecutor device (0): Tesla V100-PCIE-16GB, Compute Capability 7.0
2023-04-26 17:41:51.386982: I tensorflow/compiler/xla/service/service.cc:181] StreamExecutor device (1): Tesla V100-PCIE-16GB, Compute Capability 7.0
2023-04-26 17:41:51.386987: W tensorflow/core/framework/op_kernel.cc:1830] OP_REQUIRES failed at xla_ops.cc:417 : INVALID_ARGUMENT: Trying to access resource Resource-0-at-0x267a43f0 located in device /job:localhost/replica:0/task:0/device:GPU:0 from device /job:localhost/replica:0/task:0/device:GPU:1
Cf. https://www.tensorflow.org/xla/known_issues#tfvariable_on_a_different_device
2023-04-26 17:41:51.387011: W tensorflow/core/framework/op_kernel.cc:1830] OP_REQUIRES failed at xla_ops.cc:417 : INVALID_ARGUMENT: Trying to access resource Resource-0-at-0x267a43f0 located in device /job:localhost/replica:0/task:0/device:GPU:0 from device /job:localhost/replica:0/task:0/device:GPU:1
Cf. https://www.tensorflow.org/xla/known_issues#tfvariable_on_a_different_device
command:
CUDA_VISIBLE_DEVICES=1,2 onmt-main --model ../config/models/tiny_multi_source_transformer.py --config data_tiny_0425.yml --auto_config train --with_eval --num_gpus 2
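A note on device numbering, in case it helps when reading the log: with CUDA_VISIBLE_DEVICES=1,2 the visible GPUs are renumbered from zero inside the process, so GPU:0 and GPU:1 in the log should correspond to physical GPUs 1 and 2. A minimal sketch of that renumbering follows; the helper name `logical_to_physical` is hypothetical and is not part of TensorFlow or OpenNMT-tf.

```python
def logical_to_physical(cuda_visible_devices: str) -> dict:
    """Map in-process GPU names to the physical ids listed in CUDA_VISIBLE_DEVICES.

    Hypothetical helper for illustration only; it just mirrors the
    "renumber visible devices from zero" convention.
    """
    ids = [part.strip() for part in cuda_visible_devices.split(",") if part.strip()]
    return {f"GPU:{index}": physical_id for index, physical_id in enumerate(ids)}


# Value taken from the command in this report.
mapping = logical_to_physical("1,2")
print(mapping)
```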
model:
# Imports needed by this snippet.
import opennmt as onmt
from opennmt.utils import misc


class TinyDualSourceTransformer(onmt.models.Transformer):
    def __init__(self):
        super(TinyDualSourceTransformer, self).__init__(
            # Two parallel source inputs, each with its own word embedder.
            source_inputter=onmt.inputters.ParallelInputter([
                onmt.inputters.WordEmbedder(embedding_size=256),
                onmt.inputters.WordEmbedder(embedding_size=256)]),
            target_inputter=onmt.inputters.WordEmbedder(embedding_size=256),
            num_layers=4,
            num_units=128,
            num_heads=4,
            ffn_inner_dim=512,
            dropout=0.1,
            attention_dropout=0.1,
            ffn_dropout=0.1,
            share_encoders=True)

    def auto_config(self, num_replicas=1):
        config = super(TinyDualSourceTransformer, self).auto_config(num_replicas=num_replicas)
        # Repeat the default maximum length once per source input.
        max_length = config["train"]["maximum_features_length"]
        return misc.merge_dict(config, {
            "train": {
                "maximum_features_length": [max_length, max_length]
            }
        })
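For readers unfamiliar with the `auto_config` override above, here is a rough standalone sketch of what it is meant to produce: the single default `maximum_features_length` is replaced by a two-element list, one entry per source. The `merge_dict` below is a simplified stand-in written for this example, not OpenNMT-tf's `misc.merge_dict` (which may behave differently, for example in whether it mutates its input), and the base values are placeholders rather than the real defaults.

```python
import copy


def merge_dict(base, override):
    """Recursively merge `override` into a deep copy of `base`.

    Simplified stand-in for illustration; not the library implementation.
    """
    merged = copy.deepcopy(base)
    for key, value in override.items():
        if isinstance(value, dict) and isinstance(merged.get(key), dict):
            merged[key] = merge_dict(merged[key], value)
        else:
            merged[key] = value
    return merged


# Placeholder values, NOT the actual defaults returned by Transformer.auto_config().
base_config = {"train": {"maximum_features_length": 100, "maximum_labels_length": 100}}

max_length = base_config["train"]["maximum_features_length"]
merged = merge_dict(
    base_config,
    {"train": {"maximum_features_length": [max_length, max_length]}},
)
print(merged)
```

Only the overridden key changes; other keys under "train" are carried over from the base configuration.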
yaml:
model_dir: run_/
data:
  train_features_file:
    - input.subword.train
    - label.subword.train
  train_labels_file: output.subword.train
  eval_features_file:
    - input.subword.val
    - label.subword.val
  eval_labels_file: output.subword.val
  source_1_vocabulary: input.vocab.txt
  source_2_vocabulary: label.vocab.txt
  target_vocabulary: output.vocab.txt
train:
  batch_size: 256
  batch_type: examples
  save_checkpoints_steps: 1000
  max_step: 30000
  maximum_features_length: [30, 30]
  maximum_labels_length: 30
  sample_buffer_size: 0
eval:
  steps: 1000
Thanks!