Encountered a CUDA error and edge index error #6672
amalislam675 started this conversation in General
Looks like you are experiencing a CUDA OOM error. How big is your data? :)
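To put a number on "how big", here is a rough sketch of the memory that the `src.index_select(self.node_dim, index)` call in `lift` would need. It assumes float32 features and that the gathered tensor has one row per edge; the function names and the sizes below are hypothetical placeholders, not values from this thread.

```python
def estimate_lift_bytes(num_edges, feature_shape, bytes_per_element=4):
    """Approximate size in bytes of a per-edge tensor of shape
    [num_edges, *feature_shape], e.g. the result of gathering source-node
    features along edge_index. bytes_per_element=4 assumes float32."""
    elements_per_edge = 1
    for dim in feature_shape:
        elements_per_edge *= dim
    return num_edges * elements_per_edge * bytes_per_element


def to_gib(num_bytes):
    """Convert bytes to GiB, the unit used in the CUDA error message."""
    return num_bytes / 2**30


# Hypothetical sizes: replace with edge_index.size(1) and the per-node
# feature shape that reaches the failing layer (e.g. heads x channels).
num_edges = 1_000_000
feature_shape = (8, 64)

needed = estimate_lift_bytes(num_edges, feature_shape)
print(f"gathered per-edge tensor: about {to_gib(needed):.2f} GiB")
```

If the estimate is far above the free memory reported by the error, reducing the number of edges per step (for example by mini-batching or neighbor sampling) or the hidden size is the usual direction to look in.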
I am using PyG to train my model and am running into the error below. Is there a fix?
RuntimeError Traceback (most recent call last)
//user1/.conda/envs/PY37_1/lib/python3.7/site-packages/torch_geometric/nn/conv/message_passing.py in lift(self, src, edge_index, dim)
238 index = edge_index[dim]
--> 239 return src.index_select(self.node_dim, index)
240 except (IndexError, RuntimeError) as e:
RuntimeError: CUDA out of memory. Tried to allocate 25.04 GiB (GPU 0; 23.65 GiB total capacity; 1.73 GiB already allocated; 21.06 GiB free; 1.84 GiB reserved in total by PyTorch) If reserved memory is >> allocated memory try setting max_split_size_mb to avoid fragmentation. See documentation for Memory Management and PYTORCH_CUDA_ALLOC_CONF
During handling of the above exception, another exception occurred:
ValueError Traceback (most recent call last)
/tmp/ipykernel_90168/3092512400.py in <module>
----> 1 model.forward(train_data["feature1"], train_data["edge_index1"], train_data["edge_weight1"])
/tmp/ipykernel_90168/1301479275.py in forward(self, feature1, edge_index1, edge_weight1)
23 x = F.elu(x)
24 x = F.dropout(x, p=0.6, training=self.training)
---> 25 x = self.conv2(x, edge_index1, edge_weight1)
26
27 return F.log_softmax(x, dim=1)
//user1/.conda/envs/PY37_1/lib/python3.7/site-packages/torch/nn/modules/module.py in _call_impl(self, *input, **kwargs)
1128 if not (self._backward_hooks or self._forward_hooks or self._forward_pre_hooks or _global_backward_hooks
1129 or _global_forward_hooks or _global_forward_pre_hooks):
-> 1130 return forward_call(*input, **kwargs)
1131 # Do not call functions when jit is used
1132 full_backward_hooks, non_full_backward_hooks = [], []
/tmp/ipykernel_90168/3881736389.py in forward(self, x, edge_index, edge_attr, return_attention_weights)
130 # propagate_type: (x: PairTensor, edge_attr: OptTensor)
131 out = self.propagate(edge_index, x=(x_l, x_r), edge_attr=edge_attr,
--> 132 size=None)
133
134 alpha = self._alpha
//user1/.conda/envs/PY37_1/lib/python3.7/site-packages/torch_geometric/nn/conv/message_passing.py in propagate(self, edge_index, size, **kwargs)
428
429 coll_dict = self.collect(self.user_args, edge_index,
--> 430 size, kwargs)
431
432 msg_kwargs = self.inspector.distribute('message', coll_dict)
//user1/.conda/envs/PY37_1/lib/python3.7/site-packages/torch_geometric/nn/conv/message_passing.py in collect(self, args, edge_index, size, kwargs)
299 if isinstance(data, Tensor):
300 self.set_size(size, dim, data)
--> 301 data = self.lift(data, edge_index, dim)
302
303 out[arg] = data
//user1/.conda/envs/PY37_1/lib/python3.7/site-packages/torch_geometric/nn/conv/message_passing.py in lift(self, src, edge_index, dim)
241 if 'CUDA' in str(e):
242 raise ValueError(
--> 243 f"Encountered a CUDA error. Please ensure that all "
244 f"indices in 'edge_index' point to valid indices "
245 f"in the interval [0, {src.size(self.node_dim)}) in "
ValueError: Encountered a CUDA error. Please ensure that all indices in 'edge_index' point to valid indices in the interval [0, 128) in your node feature matrix and try again.
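Note that, per the traceback, `lift` catches any `RuntimeError` whose message contains 'CUDA' and re-raises it as this `ValueError`, so an out-of-memory failure is reported with the edge-index message as well. To rule out a genuine indexing problem, a minimal sketch like the following can be run on the CPU. `check_edge_index` is a hypothetical helper, not part of PyG, and the tiny tensor is made-up example data.

```python
import torch


def check_edge_index(edge_index, num_nodes):
    """Return a list of human-readable problems found in edge_index.

    Assumes the usual [2, num_edges] layout with integer node ids in
    the interval [0, num_nodes). An empty list means no problem found.
    """
    # Move to CPU so the check itself cannot trip a CUDA-side error.
    edge_index = edge_index.cpu()
    problems = []
    if edge_index.dtype != torch.long:
        problems.append(f"dtype is {edge_index.dtype}, expected torch.long")
    if edge_index.dim() != 2 or edge_index.size(0) != 2:
        problems.append(
            f"shape is {tuple(edge_index.shape)}, expected (2, num_edges)"
        )
    if edge_index.numel() > 0:
        lowest = int(edge_index.min())
        highest = int(edge_index.max())
        if lowest < 0 or highest >= num_nodes:
            problems.append(
                f"indices span [{lowest}, {highest}], "
                f"expected values in [0, {num_nodes})"
            )
    return problems


# Made-up example: node id 5 is out of range for a graph with 4 nodes.
example = torch.tensor([[0, 1, 2], [1, 2, 5]])
print(check_edge_index(example, num_nodes=4))
```

In the setting above this would be called with `train_data["edge_index1"]` and the number of rows in the node feature matrix. If it reports nothing, the indices are fine and the out-of-memory error is the one to address.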