This repository contains the implementation of DMA in the paper "LLMs are Noisy Oracles! LLM-based Noise-aware Graph Active Learning for Node Classification".
-
To install the requirements:
pip install -r requirements.txt
-
Follow the official installation instructions for PyTorch and for PyG (PyTorch Geometric).
-
Install ninja to enable C++ acceleration:
sudo wget https://github.com/ninja-build/ninja/releases/download/v1.8.2/ninja-linux.zip
sudo unzip ninja-linux.zip -d /usr/local/bin/
sudo update-alternatives --install /usr/bin/ninja ninja /usr/local/bin/ninja 1 --force
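To confirm that the ninja binary is visible on PATH after installation, a small check like the following can help. This is a minimal sketch and not part of the repository; the function name find_ninja is hypothetical.

```python
import shutil
import subprocess


def find_ninja():
    """Return (path, version) for the ninja binary on PATH, or None if it is absent.

    Hypothetical helper for illustration; not part of this repository.
    """
    path = shutil.which("ninja")
    if path is None:
        return None
    # `ninja --version` prints the version string on stdout.
    result = subprocess.run(
        [path, "--version"], capture_output=True, text=True, check=False
    )
    return path, result.stdout.strip()


if __name__ == "__main__":
    info = find_ninja()
    if info is None:
        print("ninja not found on PATH")
    else:
        print(f"ninja found at {info[0]} (version {info[1]})")
```

If the check reports that ninja is missing, re-run the install commands above and make sure /usr/bin or /usr/local/bin is on PATH.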
-
Download Mixtral 8x7B from Hugging Face, then set the PATH_TO_LLM variable in the scripts to the path of the downloaded model.
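Before running the scripts, it may help to verify that PATH_TO_LLM points at a usable model directory. The sketch below assumes the download is a standard Hugging Face model snapshot, which normally contains a config.json file. The helper name looks_like_hf_model_dir and the example path are hypothetical and not part of this repository.

```python
from pathlib import Path


def looks_like_hf_model_dir(path):
    """Return True if `path` is a directory that contains a config.json file.

    Assumption: a Hugging Face model snapshot stores its configuration in
    config.json at the top level of the model directory.
    """
    model_dir = Path(path)
    return model_dir.is_dir() and (model_dir / "config.json").is_file()


if __name__ == "__main__":
    # Placeholder value; replace with the actual download location.
    PATH_TO_LLM = "/path/to/downloaded/model"
    if looks_like_hf_model_dir(PATH_TO_LLM):
        print("PATH_TO_LLM looks like a model directory")
    else:
        print("PATH_TO_LLM does not contain config.json; check the path")
```

This only checks the directory layout. It does not load the model or validate the weights.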
To run DMA on Pubmed:
-
Generate pseudo samples by querying the LLM:
cd data
python gen_pseudo_sample_pubmed.py
Manually paste the LLM-generated pseudo sample for each category into gen_json_pubmed.py.
-
Compute the class-wise similarity matrix:
python gen_json_pubmed.py
python gen_llm_sim.py
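For intuition, a class-wise similarity matrix can be pictured as the pairwise similarities between one representation vector per class. The sketch below uses cosine similarity over hand-written toy vectors. This is an assumption for illustration only: gen_llm_sim.py may compute the matrix differently, and the names cosine and class_similarity_matrix are hypothetical.

```python
import math


def cosine(u, v):
    """Cosine similarity of two equal-length vectors; 0.0 if either has zero norm."""
    dot = sum(a * b for a, b in zip(u, v))
    norm_u = math.sqrt(sum(a * a for a in u))
    norm_v = math.sqrt(sum(b * b for b in v))
    if norm_u == 0.0 or norm_v == 0.0:
        return 0.0
    return dot / (norm_u * norm_v)


def class_similarity_matrix(class_vectors):
    """Build a square matrix of pairwise cosine similarities.

    `class_vectors` maps a class name to its vector. Rows and columns follow
    the insertion order of the mapping.
    """
    names = list(class_vectors)
    return [
        [cosine(class_vectors[a], class_vectors[b]) for b in names]
        for a in names
    ]


if __name__ == "__main__":
    # Toy vectors standing in for per-class representations (made up).
    toy = {"class_0": [1.0, 0.0], "class_1": [0.0, 1.0], "class_2": [1.0, 1.0]}
    for name, row in zip(toy, class_similarity_matrix(toy)):
        print(name, [round(x, 3) for x in row])
```

The resulting matrix is symmetric with ones on the diagonal, so entry (i, j) can be read as how easily class i might be confused with class j under this toy measure.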
-
Use DMA to select nodes and train the downstream GNNs:
python main.py --dataset pubmed --active dma