Hello PyG Community,

I have built my own dataset. This is how I currently use it for a single graph:

# For this, we first split the set of edges into
# training (80%), validation (10%), and testing edges (10%).
# Across the training edges, we use 70% of edges for message passing, and 30% of edges for supervision.
# We further want to generate fixed negative edges for evaluation with a ratio of 2:1.
# Negative edges during training will be generated on-the-fly.
import torch
import torch_geometric.transforms as T
from torch_geometric.data import HeteroData

data = HeteroData()
# Copy everything from Dataset[0] into data
transform = T.RandomLinkSplit(
    num_val=0.1,
    num_test=0.1,
    disjoint_train_ratio=0.3,
    neg_sampling_ratio=2.0,
    add_negative_train_samples=False,
    edge_types=("node1", "to", "node2"),
    # ("node2", "to", "node3"),
    # ("node3", "to", "node1")],
    rev_edge_types=("node2", "rev_to", "node1"),
    # ("node3", "rev_to", "node2"),
    # ("node1", "rev_to", "node3")]
)
train_data, val_data, test_data = transform(data)
from torch_geometric.loader import LinkNeighborLoader
# Define seed edges:
edge_label_index = train_data["node1", "to", "node2"].edge_label_index
edge_label = train_data["node1", "to", "node2"].edge_label.type(torch.LongTensor)
train_loader = LinkNeighborLoader(
    data=train_data,
    num_neighbors=[50, 10],
    neg_sampling_ratio=2.0,
    edge_label_index=(("node1", "to", "node2"), edge_label_index),
    edge_label=edge_label,
    batch_size=128,
    shuffle=True,
)

But that only builds a train/test split for a single graph. Ideally, I want the train and test data to contain positive and negative edges from all 365 days, and to make them larger. Along with that, the random neighbor sampling for the [...]

Am I using these methods wrong? I was wondering whether there is a clever and easy way to do this with the helper functions and methods already available that I'm not aware of or haven't thought of trying.

In conclusion, any advice or guidance would be helpful!
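As a rough illustration of the split percentages described in the code comments above, the sketch below works out how many edges would land in each part for a hypothetical graph with 1,000 edges. The function name `split_sizes` and the edge count are assumptions made for this example, and the rounding used inside `RandomLinkSplit` may differ slightly from the plain `round()` used here.

```python
def split_sizes(num_edges, num_val=0.1, num_test=0.1,
                disjoint_train_ratio=0.3, neg_sampling_ratio=2.0):
    """Approximate edge counts for the split described in the post.

    Hypothetical helper for illustration only; it mirrors the stated
    percentages, not the library's exact rounding.
    """
    n_val = round(num_val * num_edges)            # validation positives
    n_test = round(num_test * num_edges)          # test positives
    n_train = num_edges - n_val - n_test          # remaining training edges
    # Training edges are divided into supervision and message-passing edges.
    n_supervision = round(disjoint_train_ratio * n_train)
    n_message_passing = n_train - n_supervision
    return {
        "train_message_passing": n_message_passing,
        "train_supervision": n_supervision,
        "val_pos": n_val,
        "val_neg": round(neg_sampling_ratio * n_val),    # fixed negatives, 2:1
        "test_pos": n_test,
        "test_neg": round(neg_sampling_ratio * n_test),  # fixed negatives, 2:1
    }


sizes = split_sizes(1000)
for name, count in sizes.items():
    print(f"{name}: {count}")
```

With `add_negative_train_samples=False`, no negative training edges are stored in the split itself, which is why the sketch lists none for training.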
Replies: 1 comment 4 replies
If you are working with multiple graphs, the best choice for data loading should be the default DataLoader of PyG. In your case, this is a bit problematic because the transform returns a tuple of data objects. As such, you need to convert that into three datasets, which you can do via

train_dataset, val_dataset, test_dataset = zip(*dataset)

I added a short test for this, see #7211.
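To make the `zip(*dataset)` step concrete, here is a minimal pure-Python sketch of what the transposition does. It deliberately avoids PyG: `split_graph` is a hypothetical stand-in for a transform that returns a `(train, val, test)` tuple, and the toy "graphs" are made-up lists of edges, so only the reshaping pattern carries over.

```python
from typing import List, Tuple

Edge = Tuple[int, int]


def split_graph(edges: List[Edge]) -> Tuple[List[Edge], List[Edge], List[Edge]]:
    """Hypothetical stand-in for a transform returning (train, val, test)."""
    n = len(edges)
    n_val = int(0.1 * n)
    n_test = int(0.1 * n)
    n_train = n - n_val - n_test
    return (
        edges[:n_train],
        edges[n_train:n_train + n_val],
        edges[n_train + n_val:],
    )


# Three toy "daily graphs" with 10 edges each (made-up data).
graphs = [[(day, i) for i in range(10)] for day in range(3)]

# A dataset whose transform returns a tuple yields one 3-tuple per graph.
dataset = [split_graph(g) for g in graphs]

# zip(*...) transposes the list of 3-tuples into three per-split tuples,
# each holding one entry per graph.
train_dataset, val_dataset, test_dataset = zip(*dataset)

print(len(train_dataset), len(val_dataset), len(test_dataset))
```

Each of the three results is a tuple with one entry per graph. With real PyG data objects, each could then be handed to its own `DataLoader`, wrapping it in `list(...)` first if a list is preferred.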