Large custom dataset for link prediction #6449
GianlucaDeStefano
started this conversation in
General
Replies: 1 comment 5 replies
-
Thanks for the discussion.
-
Hi all,
I have a set of custom datasets that I use for my link prediction experiments. Until now they have all been based on the InMemoryDataset class, but now that I am trying to scale up I have to switch to the lower-level Dataset interface, as they no longer fit in memory.
The issue is that I am not sure how to implement this without breaking the rest of my codebase. At the moment, a single huge HeteroData object represents the entire dataset. This is very convenient, for example, to obtain the edge_label and edge_label_index of the edges to predict.
Switching to a dataset that is not held in memory would break this: since I would no longer have a single Data object, these tensors would be incomplete, and since I pass them as parameters to the LinkLoader, this is a problem.
Is there maybe an example showing how to use the Dataset interface for link prediction purposes?
For my use case, what should def get(self, idx) return? In this example it returns an entire graph. Is this always the case, or should the returned object change depending on the task?
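To make the get(self, idx) question concrete, here is a minimal sketch of one possible layout, written in plain Python rather than against PyG so that it runs without extra dependencies. Everything in it is an assumption made for illustration: the class name ShardedEdgeDataset, the JSON shard files, and the iter_edge_labels helper are hypothetical and are not part of PyG. The class only mirrors the len()/get(idx) shape of a lazily loaded dataset. Each index maps to one shard on disk, and the labelled edges are streamed shard by shard instead of being read from one global tensor.

```python
import json
import tempfile
from pathlib import Path


class ShardedEdgeDataset:
    """Hypothetical stand-in mirroring a lazy len()/get(idx) dataset contract."""

    def __init__(self, root):
        self.root = Path(root)
        # Zero-padded file names keep the sorted order equal to the shard order.
        self.paths = sorted(self.root.glob("shard_*.json"))

    def len(self):
        return len(self.paths)

    def get(self, idx):
        # Only the requested shard is read from disk.
        with open(self.paths[idx]) as f:
            return json.load(f)

    def iter_edge_labels(self):
        # Stream (source, target, label) triples one shard at a time instead
        # of materializing a single global edge_label_index.
        for idx in range(self.len()):
            shard = self.get(idx)
            src, dst = shard["edge_label_index"]
            for s, d, y in zip(src, dst, shard["edge_label"]):
                yield s, d, y


def write_toy_shards(root):
    # Toy data (made up for this sketch): two shards holding three edges.
    shards = [
        {"edge_label_index": [[0, 1], [2, 3]], "edge_label": [1, 0]},
        {"edge_label_index": [[4], [5]], "edge_label": [1]},
    ]
    for i, shard in enumerate(shards):
        (Path(root) / f"shard_{i:03d}.json").write_text(json.dumps(shard))


tmp = tempfile.mkdtemp()
write_toy_shards(tmp)
dataset = ShardedEdgeDataset(tmp)
triples = list(dataset.iter_edge_labels())
```

In this sketch get(idx) returns one shard rather than the whole graph, so anything that expects the complete edge_label_index up front would have to consume the stream or build an index over it first. Whether that maps cleanly onto LinkLoader is exactly the open question above.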