This repository implements a unified framework for knowledge graph completion (KGC) that combines:
- GNN Distillation: iterative message filtering to prevent over-smoothing (a sketch of the decay schedule follows below)
- Abstract Probabilistic Interaction Modeling (APIM): structured probabilistic interaction patterns

The code is built upon Are_MPNNs_helpful (the GNN backbone) and SimKGC (the KGC framework), with key architectural enhancements.
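The exact filtering rule lives in the repository code; the sketch below only illustrates how the -ratio_type/-ratio flags could translate into per-layer keep ratios (the helper names, the linear formula, and the top-k masking are assumptions, not the repository's API):

```python
import torch

def keep_ratio(layer_idx, num_layers, ratio, ratio_type="exponential"):
    """Fraction of incoming messages kept at a given layer (hypothetical).

    With the paper's settings (num_layers=4, exponential, ratio=0.74) this
    yields keep ratios of roughly 1.00, 0.74, 0.55, 0.41 per layer.
    """
    if ratio_type == "exponential":
        return ratio ** layer_idx  # geometric decay with depth
    if ratio_type == "linear":
        # one plausible linear schedule: drop `ratio` of messages by the last layer
        return max(1.0 - ratio * layer_idx / max(num_layers - 1, 1), 0.0)
    raise ValueError(f"unknown ratio_type: {ratio_type}")

def filter_messages(scores, frac):
    """Mask all but the top `frac` fraction of messages per node (sketch)."""
    k = max(int(scores.size(-1) * frac), 1)
    top_idx = scores.topk(k, dim=-1).indices
    keep = torch.zeros_like(scores, dtype=torch.bool).scatter_(-1, top_idx, True)
    return scores.masked_fill(~keep, float("-inf"))  # masked messages are ignored
```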
- Python 3.9.19
- PyTorch 2.3.0 (CUDA 12.6 recommended)
- DGL 2.2.1
- Transformers 4.40.2
- NVIDIA Apex (for FP16 training)
More detailed requirements can be found in requirements.txt.
```bash
conda create -n kgc python=3.9.19
conda activate kgc
pip install -r requirements.txt
```
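After installation, an optional sanity check (not part of the repository) can confirm that the core packages import and that CUDA is visible:

```python
# Optional environment check: verify package versions and GPU availability.
import torch
import dgl
import transformers

print("PyTorch:", torch.__version__, "| CUDA available:", torch.cuda.is_available())
print("DGL:", dgl.__version__)
print("Transformers:", transformers.__version__)
```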
GNN-Distill + APIM training and evaluation can be run using the run.py script:
```bash
# FB15k-237
nohup python run.py -model 'compgcn' -read_setting 'no_negative_sampling' -neg_num 0 -score_func 'conve' -data 'FB15k-237' -lr 0.001 -nheads 2 -batch 512 -embed_dim 200 -gcn_dim 100 -init_dim 100 -k_w 10 -k_h 20 -l2 0. -num_workers 3 -hid_drop 0.3 -pretrain_epochs 0 -candidate_num 0 -topk 20 -gcn_layer 4 -gnn_distillation -ratio_type 'exponential' -ratio 0.74 -output_dir '***' > FB15K237_CompGCN_Layer4_ExpDecay.log 2>&1 &

# WN18RR
nohup python run.py -model 'compgcn' -read_setting 'no_negative_sampling' -neg_num 0 -score_func 'conve' -data 'WN18RR' -lr 0.001 -nheads 2 -batch 512 -embed_dim 200 -gcn_dim 100 -init_dim 100 -k_w 10 -k_h 20 -l2 0. -num_workers 3 -hid_drop 0.3 -pretrain_epochs 0 -candidate_num 0 -topk 20 -gcn_layer 4 -gnn_distillation -ratio_type 'exponential' -ratio 0.74 -output_dir '***' > WN18RR_CompGCN_Layer4_ExpDecay.log 2>&1 &
```
Key arguments:
- model: GNN model architecture (compgcn, rgcn, kbgat)
- read_setting: negative sampling setting (in this paper we use no_negative_sampling)
- score_func: scoring function for computing edge scores (in this paper we use conve)
- data: knowledge graph dataset (FB15k-237, WN18RR)
- lr: learning rate
- nheads: number of attention heads
- batch: batch size
- embed_dim: embedding dimension
- gcn_dim: GCN layer dimension
- init_dim: initial feature dimension
- k_w: width of the ConvE kernel
- k_h: height of the ConvE kernel
- l2: L2 regularization strength
- num_workers: number of workers for data loading
- hid_drop: dropout rate for hidden layers
- pretrain_epochs: number of pretraining epochs (set to 0 for no pretraining)
- candidate_num: must be 0 (the relational part of the model is not used in this paper)
- topk: number of APIM candidates (in this paper we use 20; see the sketch after this list)
- gcn_layer: number of GCN layers (in this paper we use 4)
- gnn_distillation: whether to use GNN Distillation
- ratio_type: decay schedule for GNN Distillation (linear, exponential)
- ratio: decay ratio for GNN Distillation (in this paper we use ratio=0.74 for exponential decay and ratio=0.4 for linear decay)
- output_dir: output directory for saving model checkpoints and logs
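APIM's internals are not documented in this README; purely as a hypothetical illustration (the function and tensor names below are invented, not the repository's API), selecting topk candidate interaction patterns and combining them probabilistically could look like this:

```python
import torch
import torch.nn.functional as F

def apim_mixture(entity_emb, pattern_bank, topk=20):
    """Sketch: score abstract interaction patterns against an entity embedding,
    keep the top-k candidates (-topk), and form a softmax-weighted mixture."""
    # entity_emb: (batch, dim); pattern_bank: (num_patterns, dim)
    scores = entity_emb @ pattern_bank.t()                # (batch, num_patterns)
    top_scores, top_idx = scores.topk(topk, dim=-1)       # k candidate patterns
    weights = F.softmax(top_scores, dim=-1)               # probabilistic weights
    selected = pattern_bank[top_idx]                      # (batch, topk, dim)
    return (weights.unsqueeze(-1) * selected).sum(dim=1)  # (batch, dim)
```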