Label-free Node Classification on Graphs with Large Language Models (LLMs)

Authors: Zhikai Chen, Haitao Mao, Hongzhi Wen, Haoyu Han, Wei Jin, Haiyang Zhang, Hui Liu, Jiliang Tang

ICLR 2024 | Conference PDF | Archive PDF | Plain Text | LLM Run Details

Reproducibility Variable | Result | LLM Response
Research Type | Experimental | Comprehensive experimental results validate the effectiveness of LLM-GNN on text-attributed graphs from various domains. In this section, we present experiments to evaluate the performance of our proposed pipeline LLM-GNN.
Researcher Affiliation | Collaboration | Zhikai Chen1, Haitao Mao1, Hongzhi Wen1, Haoyu Han1, Wei Jin2, Haiyang Zhang3, Hui Liu1, Jiliang Tang1 — 1Michigan State University, 2Emory University, 3Amazon.com
Pseudocode | No | The paper describes its pipeline and components in text and with a diagram (Figure 1), but no formal pseudocode or algorithm block is provided.
Open Source Code | Yes | Our code is available from https://github.com/CurryTang/LLMGNN.
Open Datasets | Yes | In this paper, we adopt the following TAG datasets widely adopted for node classification: CORA (McCallum et al., 2000), CITESEER (Giles et al., 1998), PUBMED (Sen et al., 2008), OGBN-ARXIV, OGBN-PRODUCTS (Hu et al., 2020b), and WIKICS (Mernyei & Cangea, 2020).
Dataset Splits | Yes | Similar to (Ma et al., 2022), we adopt a setting where there is no validation set, and models trained on selected nodes will be further tested on the remaining unlabeled nodes. ... For the budget of the active selection, we refer to the popular semi-supervised learning setting for node classification (Yang et al., 2016) and set the budget equal to 20 multiplied by the number of classes.
Hardware Specification | No | The paper does not specify the hardware used to run the experiments (e.g., GPU models, CPU types, memory, or cloud instance specifications).
Software Dependencies | No | The paper mentions using "gpt-3.5-turbo-0613" and "Sentence-BERT" but does not provide specific version numbers for these tools or for any other software libraries or programming languages used (e.g., Python, PyTorch, or TensorFlow versions).
Experiment Setup | Yes | For small-scale datasets including CORA, CITESEER, PUBMED, and WIKICS, we set: learning rate to 0.01, weight decay to 5e-4, hidden dimension to 64, dropout to 0.5. For large-scale datasets including OGBN-ARXIV and OGBN-PRODUCTS, we set: learning rate to 0.01, weight decay to 5e-4, hidden dimension to 256, dropout to 0.5. ... by setting a small fixed number of training epochs, such as 30 epochs for small and medium-scale datasets (CORA, CITESEER, PUBMED, and WIKICS), and 50 epochs for the remaining large-scale datasets.
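The hyperparameters reported above can be collected into a small per-dataset lookup, together with the 20-nodes-per-class labeling budget from the Dataset Splits row. This is a minimal illustrative sketch: the dict layout and the helper names `get_config` and `labeling_budget` are our own, not taken from the authors' code.

```python
# Hyperparameters as reported in the paper's experiment setup.
# The dict structure and helper names below are illustrative,
# not drawn from the authors' released code.
SMALL_DATASETS = {"cora", "citeseer", "pubmed", "wikics"}
LARGE_DATASETS = {"ogbn-arxiv", "ogbn-products"}

def get_config(dataset: str) -> dict:
    """Return the reported training hyperparameters for a dataset."""
    name = dataset.lower()
    if name in SMALL_DATASETS:
        return {"lr": 0.01, "weight_decay": 5e-4,
                "hidden_dim": 64, "dropout": 0.5, "epochs": 30}
    if name in LARGE_DATASETS:
        return {"lr": 0.01, "weight_decay": 5e-4,
                "hidden_dim": 256, "dropout": 0.5, "epochs": 50}
    raise ValueError(f"Unknown dataset: {dataset}")

def labeling_budget(num_classes: int) -> int:
    """Active-selection budget: 20 labeled nodes per class."""
    return 20 * num_classes
```

For example, `get_config("OGBN-ARXIV")` yields the 256-dimensional, 50-epoch setting, and `labeling_budget(7)` gives the budget of 140 nodes for a 7-class dataset such as CORA.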