Understanding LinkNeighborLoader on collated data #6519
GianlucaDeStefano
started this conversation in
General
Replies: 1 comment 5 replies
Since you are using |
Hi all,
I am trying to train a model for link prediction, and my setup is the following:
To do this, I am doing the following:
My question is this:
Since I am using LinkNeighborLoader, I expect batches that contain the same number of examples (edges to predict) to be of similar size. However, I have noticed that, on average, a larger collated graph yields batches with more nodes. Why does this happen? Do I misunderstand something here?
In my view, this should not happen. The LinkNeighborLoader class only loads the neighbors of the edge in question, and the collate function should create a Data object containing multiple distinct (disconnected) graphs. The same edge should therefore yield the same neighborhood whether it is sampled from the huge collated graph or from the original graph.
I have created an example to better explain what I mean:
The code first creates two groups of 20 and 2 graphs, respectively, then collates each group into a single graph, giving the following two graphs:
Using a LinkNeighborLoader on each of these two graphs, I then sample two batches like these:
Why is the graph contained in the training batch always noticeably larger than the one in the test batch, even if I use a batch size of 1?
Of course, the collated training graph is larger, but, as I said above, in my view, this should not matter.
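The expectation that the size of the collated graph should not matter can be checked on a toy case without PyG. This is only a minimal sketch under stated assumptions: it uses a full k-hop expansion instead of LinkNeighborLoader's fan-out sampling, ring graphs stand in for the real data, and `ring_graph`, `collate`, and `k_hop_nodes` are hypothetical helpers written for this illustration, not PyG functions.

```python
def ring_graph(n):
    """Undirected ring on n nodes, as a list of (u, v) edges."""
    return [(i, (i + 1) % n) for i in range(n)]


def collate(graphs):
    """Disjoint union of graphs given as (num_nodes, edges) pairs.

    Each graph's node ids are shifted by a running offset, so the
    components stay disconnected from each other.
    """
    edges, offset = [], 0
    for num_nodes, g_edges in graphs:
        edges += [(u + offset, v + offset) for u, v in g_edges]
        offset += num_nodes
    return offset, edges


def k_hop_nodes(edges, seed_edge, num_hops):
    """All nodes within num_hops of either endpoint of seed_edge."""
    adj = {}
    for u, v in edges:
        adj.setdefault(u, set()).add(v)
        adj.setdefault(v, set()).add(u)
    visited = set(seed_edge)
    frontier = set(seed_edge)
    for _ in range(num_hops):
        # Expand one hop and keep only nodes not seen before.
        frontier = {w for u in frontier for w in adj.get(u, ())} - visited
        visited |= frontier
    return visited


# Collate 2 and 20 copies of the same 10-node ring.
small_n, small_edges = collate([(10, ring_graph(10))] * 2)
large_n, large_edges = collate([(10, ring_graph(10))] * 20)

# The same seed edge exists in both collated graphs (first component).
seed = (0, 1)
small_nodes = k_hop_nodes(small_edges, seed, num_hops=2)
large_nodes = k_hop_nodes(large_edges, seed, num_hops=2)

print(small_n, large_n)                    # total nodes in each collated graph
print(len(small_nodes), len(large_nodes))  # neighborhood sizes for the seed edge
```

In this sketch the neighborhood of the seed edge is identical in both collated graphs, because the expansion never leaves the seed's own component. If the real loader still returns larger batches for the larger collated graph, that would suggest the extra nodes come from something other than the seed edge's own neighborhood.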
Do I misunderstand something here?
Thanks in advance