How can I establish the relation between each graph with its respective News's id? #24
Closed
Humberto-Turioni
started this conversation in
General
-
Hi! If you are still looking for an answer to this, I think I may be able to help (since I needed this for my own use case as well).
Here is some sample code showing how to get the IDs. The comments should explain the steps:
import numpy as np
import pickle
from torch_geometric.datasets import UPFD
with open("<your-path>/pol_id_twitter_mapping.pkl", "rb") as f:
    id_mapping = pickle.load(f)
# the id_indices (keys) should be in ascending order:
assert np.all(np.array([k for k in id_mapping]) == np.arange(len(id_mapping)))
# therefore the id_values are also iterated in the correct order
# now we just need to find root ids, we can abuse the fact that
# root node ids are always strings, while the twitter ids are integers
graph_ids: list[list[str]] = []
for v in id_mapping.values():
    try:
        _ = int(v)
        graph_ids[-1].append(v)
    except ValueError:
        graph_ids.append([v])
root_ids: list[str] = [i[0] for i in graph_ids]
# alternatively (maybe less "hacky", but involves more steps and another file) we could use
# <your-path>/politifact/raw/node_graph_idx.npy
# then find the first node (root) index of each graph
# then use the root indices as keys in id_mapping to get the root ids
# now we have the root ids, and can index the training graph positions
train_indices = np.load("<your-path>/politifact/raw/train_idx.npy")
train_root_ids = [root_ids[i] for i in train_indices]
train_all_ids = {root_ids[i]: graph_ids[i] for i in train_indices}
# done! you should now have what you were looking for
# as extra insurance that this is the correct mapping, you can check that the graph sizes match the
# number of ids that we collected for each graph above:
train_dataset = UPFD("<your-path>", "politifact", "content", "train")
assert len(train_dataset) == len(train_root_ids)
for graph_id, graph in zip(train_root_ids, train_dataset):
    assert len(train_all_ids[graph_id]) == graph.x.shape[0]
    print(graph_id)
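The alternative mentioned in the comments above (going through node_graph_idx.npy instead of relying on root IDs being non-numeric strings) could look roughly like the sketch below. I have not run this against the real files, so treat it as a starting point. The function name root_ids_from_graph_index is my own. It assumes that node_graph_idx.npy holds one graph index per node, that the nodes of each graph are stored contiguously, and that the first node of each graph is its root. The toy IDs at the bottom are made up purely for illustration.

```python
from collections.abc import Mapping, Sequence


def root_ids_from_graph_index(
    node_graph_idx: Sequence[int],
    id_mapping: Mapping[int, str],
) -> list[str]:
    """Return the ID of the first node of each graph, in graph order.

    Assumptions (not verified against the dataset):
    - node_graph_idx[i] is the graph index of node i
    - nodes of the same graph are stored contiguously
    - the first node of each graph is its root (news) node
    """
    root_ids: list[str] = []
    previous_graph = None
    for node_index, graph_index in enumerate(node_graph_idx):
        graph_index = int(graph_index)  # also accepts NumPy integer scalars
        if graph_index != previous_graph:
            # first node of a new graph: look up its ID in the mapping
            root_ids.append(id_mapping[node_index])
            previous_graph = graph_index
    return root_ids


# Toy data (made up): two graphs with three and two nodes respectively.
toy_graph_idx = [0, 0, 0, 1, 1]
toy_mapping = {0: "news-a", 1: "111", 2: "222", 3: "news-b", 4: "333"}
toy_root_ids = root_ids_from_graph_index(toy_graph_idx, toy_mapping)
```

With the real files you would pass np.load("<your-path>/politifact/raw/node_graph_idx.npy") together with the id_mapping loaded above. If the assumptions hold, the result should match root_ids from the first approach.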
-
Thank you Yingtong and Philipp!
-
Hi!
I am having difficulty establishing the relation between the graphs and the news IDs.
For example,
train_data = UPFD(root=r"C:\Users\user\Desktop\execucao", name="gossipcop", feature="content", split="train")
For a given train_data[x], I couldn't establish the relation with gos_news_list.txt.
I want to know which news item each graph belongs to.