Graph classification, small sample size #7032

raythroughspace · 2023-03-25T00:33:40Z

What are some standard ways to deal with very small datasets (<20 graphs) for graph binary classification tasks? My graphs have tens of thousands of nodes with millions of edges. When I train a GNN on these large unfiltered graphs (<20 instances), I notice the GNN achieves 100% accuracy.

Nodes in my graph carry gene expression values. When I filter out some nodes using standard genetic techniques, the GNN can no longer classify correctly and predicts either all positive or all negative (even though all the filtered-out nodes are low-expressed genes, which in theory should not affect the predictions).

My goal is to reduce the graphs from ~20,000 nodes to a few thousand nodes and still get perfect accuracy (I know this is achievable with other ML techniques). But I've tried the standard GNN models (GCN, GAT, SAGE) with many different hyperparameters, and all of them predict either all positive or all negative. I suspect the small sample size is the cause of my issues. Is there anything I can do in this case?

rusty1s · 2023-03-25T14:08:59Z

Maintainer

If I understand you correctly, you want to create subgraphs out of your graphs, which you then use for graph classification. Is this correct? One thing you could do is partition each graph via METIS (see torch_geometric.data.ClusterData), which might be worth trying.
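A minimal sketch of the subgraph idea, written in plain Python so it runs without PyTorch Geometric. The helper names (`toy_partition`, `split_into_subgraphs`) are hypothetical, and `toy_partition` is only a round-robin stand-in for a real partitioner such as METIS: it makes no attempt to minimize cut edges. Each node-induced subgraph inherits the label and index of the graph it came from.

```python
from collections import defaultdict

def toy_partition(num_nodes, num_parts):
    """Stand-in for METIS (assumption): node i goes to part i % num_parts."""
    return [i % num_parts for i in range(num_nodes)]

def split_into_subgraphs(edges, part_of, label, graph_id):
    """Build one node-induced subgraph per part.

    edges: list of (u, v) node-index pairs for a single graph.
    part_of: part_of[u] is the part that node u was assigned to.
    Every returned subgraph inherits `label` and `graph_id` from its parent.
    """
    nodes = defaultdict(list)
    for u, p in enumerate(part_of):
        nodes[p].append(u)
    kept = defaultdict(list)
    for u, v in edges:
        # Keep an edge only if both endpoints landed in the same part.
        if part_of[u] == part_of[v]:
            kept[part_of[u]].append((u, v))
    return [
        {"graph_id": graph_id, "label": label, "nodes": nodes[p], "edges": kept[p]}
        for p in sorted(nodes)
    ]

edges = [(0, 2), (2, 4), (1, 3), (0, 1), (3, 5)]
subgraphs = split_into_subgraphs(edges, toy_partition(6, 2), label=1, graph_id=0)
```

Keeping `graph_id` on every subgraph makes it possible to place all subgraphs of one sample in the same train or test split, and to aggregate their predictions per sample later.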

2 replies

raythroughspace Mar 26, 2023
Author

That was what I was thinking, but is this common practice? Say there are n graphs to classify and I split each in two, so each sample becomes two graphs. If the model classifies one of them correctly and the other incorrectly, can I say the sample is correctly classified because one of its split graphs is?

rusty1s Mar 28, 2023
Maintainer

I would say it's not that common because graph classification on a tiny set of large graphs is very rare :)

In your case, that depends on how you choose to ensemble the model's predictions across the different subgraphs. For example, you can run your model on 100 different subgraphs of a sample and then take the majority vote over the predicted classes.
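The majority-vote idea could look roughly like the sketch below. It assumes the per-subgraph predictions are already available as `(graph_id, predicted_class)` pairs; the function name `majority_vote` and that input format are illustrative choices, not part of PyTorch Geometric.

```python
from collections import Counter, defaultdict

def majority_vote(subgraph_predictions):
    """Aggregate per-subgraph class predictions into one label per graph.

    subgraph_predictions: iterable of (graph_id, predicted_class) pairs.
    On a tie, the class that was encountered first for that graph wins.
    """
    votes = defaultdict(Counter)
    for graph_id, predicted_class in subgraph_predictions:
        votes[graph_id][predicted_class] += 1
    return {gid: counts.most_common(1)[0][0] for gid, counts in votes.items()}

# Three subgraph predictions for each of two original graphs.
preds = [(0, 1), (0, 1), (0, 0), (1, 0), (1, 0), (1, 1)]
graph_labels = majority_vote(preds)
```

For binary classification, using an odd number of subgraphs per sample avoids ties altogether.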
