Efficient Tuning and Inference for Large Language Models on Textual Graphs

Authors: Yun Zhu, Yaoke Wang, Haizhou Shi, Siliang Tang

IJCAI 2024

Reproducibility Variable | Result | LLM Response
Research Type | Experimental | "Extensive experiments on textual graphs demonstrate our method's effectiveness by achieving the best model performance, meanwhile having the lowest training cost compared to previous methods. In this section, we first introduce the datasets used in Section 4.1. Then, we will illustrate the baselines and experimental setup in Section 4.2 and 4.3 respectively, and conduct experiments on these datasets to demonstrate the effectiveness of our proposed method in Section 4.4."
Researcher Affiliation | Collaboration | Yun Zhu1, Yaoke Wang1, Haizhou Shi2 and Siliang Tang1; 1Zhejiang University, 2Rutgers University
Pseudocode | No | The paper describes its proposed method in detail with equations and textual descriptions, but it does not include a clearly labeled "Pseudocode" or "Algorithm" block, nor does it present the steps in a structured, code-like format.
Open Source Code | Yes | "Our codes are available at: https://github.com/ZhuYun97/ENGINE"
Open Datasets | Yes | "In this work, we adopt seven commonly used textual graphs to evaluate our proposed ENGINE: Cora [Sen et al., 2008], CiteSeer [Giles et al., 1998], WikiCS [Mernyei and Cangea, 2020], OGBN-ArXiv [Hu et al., 2020], ArXiv-2023 [He et al., 2023b], OGBN-Products [Hu et al., 2020] and Ele-Photo [Yan et al., 2023]." (A hedged loading sketch for two of these datasets appears after the table.)
Dataset Splits | No | The paper states it uses training nodes and test nodes for evaluation ("specifically, given a set of training nodes Vtr, a classification model is trained on these nodes and evaluated on the remaining test nodes Vte."), but it does not explicitly provide details about a validation dataset split (percentages, sample counts, or specific methodology for creating one).
Hardware Specification | Yes | "The batch size is set as 1 for tuning LMs, and the total training time is reported on the CPU of a 48-core Intel(R) Xeon(R) @ 2.50GHz and GPUs of 6 NVIDIA GeForce RTX 3090."
Software Dependencies | No | The paper mentions using LLaMA2-7B and various baseline models (e.g., BERT, SentenceBERT, DeBERTa, GCN, SAGE, GAT), but it does not specify exact version numbers for software libraries, frameworks (e.g., PyTorch, TensorFlow), or programming languages used in the experiments.
Experiment Setup | Yes | "The batch size is set as 1 for tuning LMs, and the total training time is reported on the CPU of a 48-core Intel(R) Xeon(R) @ 2.50GHz and GPUs of 6 NVIDIA GeForce RTX 3090. We report the mean accuracy with a standard deviation across five different random seeds. For traditional GNN methods, we utilize grid search to obtain optimal results. The hyperparameters of the baselines can be found in Appendix C. In our experiments, we set the patience to 2 for most datasets, achieving comparable performance while significantly reducing inference time, denoted as ENGINE (Early). The number of G-Ladders is also analyzed in Section 4.6." (A hedged sketch of this five-seed evaluation protocol also appears after the table.)
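
The Open Datasets and Dataset Splits rows name standard graph benchmarks with public loaders. As a minimal, hedged sketch (not code from the ENGINE repository), Cora and CiteSeer can be fetched with PyTorch Geometric's Planetoid loader, and OGBN-ArXiv ships with official train/validation/test node indices through the OGB package; the root paths below are illustrative. Note that these loaders provide graph structure and standard splits, while the raw node text used on the LM side comes from the textual-attributed versions of these datasets.

```python
# Hedged sketch: loading two of the seven textual-graph benchmarks listed above.
# Assumes torch_geometric and ogb are installed; root paths are illustrative.
from torch_geometric.datasets import Planetoid
from ogb.nodeproppred import PygNodePropPredDataset

# Cora and CiteSeer are available through the standard Planetoid loader.
cora = Planetoid(root="data/planetoid", name="Cora")[0]
citeseer = Planetoid(root="data/planetoid", name="CiteSeer")[0]

# OGBN-ArXiv provides official node-index splits via the OGB package,
# which is one way to obtain training and test node sets (Vtr and Vte).
arxiv_dataset = PygNodePropPredDataset(name="ogbn-arxiv", root="data/ogb")
arxiv = arxiv_dataset[0]
split_idx = arxiv_dataset.get_idx_split()  # dict with "train", "valid", "test" index tensors

print(cora.num_nodes, citeseer.num_nodes, arxiv.num_nodes)
print({name: idx.numel() for name, idx in split_idx.items()})
```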
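
The Experiment Setup row quotes a five-seed protocol reporting mean accuracy with standard deviation. The sketch below shows one generic way to script that protocol; `train_and_evaluate` is a hypothetical callable standing in for the paper's training and inference pipeline, and the seed-handling details are assumptions rather than the authors' exact procedure. The patience-2 early-exit variant, ENGINE (Early), is part of the method itself and is not reproduced here.

```python
# Hedged sketch: report mean accuracy with standard deviation over five random seeds,
# as described in the Experiment Setup row. `train_and_evaluate` is a hypothetical
# placeholder for the actual training/inference pipeline.
import random
from typing import Callable, Sequence

import numpy as np
import torch


def run_seeds(train_and_evaluate: Callable[[int], float],
              seeds: Sequence[int] = (0, 1, 2, 3, 4)) -> tuple[float, float]:
    """Run the pipeline once per seed and return (mean, std) of test accuracy."""
    accuracies = []
    for seed in seeds:
        # Fix the common random-number generators before each run.
        random.seed(seed)
        np.random.seed(seed)
        torch.manual_seed(seed)
        accuracies.append(train_and_evaluate(seed))
    acc = np.asarray(accuracies, dtype=float)
    print(f"accuracy: {acc.mean():.4f} +/- {acc.std():.4f} over {len(acc)} seeds")
    return float(acc.mean()), float(acc.std())


if __name__ == "__main__":
    # Dummy pipeline so the sketch runs end to end; replace with real training code.
    run_seeds(lambda seed: 0.75 + 0.01 * seed)
```

The grid search over GNN baseline hyperparameters mentioned in the same row would simply wrap this loop in an outer search over candidate configurations.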