Notice: The reproducibility variables underlying each score are classified using an automated LLM-based pipeline, validated against a manually labeled dataset. LLM-based classification introduces uncertainty and potential bias; scores should be interpreted as estimates. Full accuracy metrics and methodology are described in Coakley et al. (K. L. Coakley, T. Snelleman, H. Hoos, and O. E. Gundersen, "The embrace of open science: An analysis of a decade of AI research and 56 800 conference papers," under review, 2026).
LLaGA: Large Language and Graph Assistant
Authors: Runjin Chen, Tong Zhao, Ajay Kumar Jaiswal, Neil Shah, Zhangyang Wang
ICML 2024 | Venue PDF | LLM Run Details
| Reproducibility Variable | Result | LLM Response |
|---|---|---|
| Research Type | Experimental | Our extensive experiments across popular graph benchmarks show that LLaGA delivers outstanding performance across four datasets and three tasks using one single model, surpassing state-of-the-art graph models in both supervised and zero-shot scenarios. |
| Researcher Affiliation | Collaboration | 1The University of Texas at Austin 2Snap Inc. |
| Pseudocode | No | The paper does not contain any structured pseudocode or algorithm blocks. Figure 1 illustrates the LLaGA framework but is not a pseudocode representation. |
| Open Source Code | Yes | Our code is available at https://github.com/VITA-Group/LLaGA |
| Open Datasets | Yes | Datasets. We train and evaluate our model on four widely-recognized graph datasets: ogbn-Arxiv (Hu et al., 2020), ogbn-Products (Hu et al., 2020), Pubmed, and Cora (Yang et al., 2016). |
| Dataset Splits | Yes | For node-level tasks, we adhere to the standard train/validation/test splits (Hu et al., 2020) for each dataset: 6:2:3 for Arxiv, 8:2:90 for Products, and 6:2:2 for both Pubmed and Cora. |
| Hardware Specification | No | The paper does not provide specific details about the hardware used to run its experiments, such as GPU models, CPU types, or memory specifications. |
| Software Dependencies | Yes | In our model's implementation, we primarily employ Vicuna-7B-v1.5-16K (Chiang et al., 2023) as the foundational base models, and SimTeG (Duan et al., 2023) as default text-encoding model. |
| Experiment Setup | Yes | The learning rate is consistently set to 2e-5, and the batch size is maintained at 16 for all models. We train our model for one epoch. For the Neighborhood Detail Template, we sample two-hop neighbors around each node, setting the sample size to 10 for each hop. In the Hop-Field Overview Template, 4 hop embeddings are employed to encapsulate the structural information surrounding the central node. |
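
The values quoted in the Dataset Splits and Experiment Setup rows can be collected into a small configuration for anyone attempting a rerun. The following is a minimal sketch only: the class name, field names, and helper function are hypothetical and are not taken from the LLaGA repository; only the numeric values come from the rows above.

```python
from dataclasses import dataclass


@dataclass(frozen=True)
class ReportedSetup:
    """Hyperparameters as quoted in the Experiment Setup row.

    Field names are hypothetical; they do not mirror the authors' code.
    """
    learning_rate: float = 2e-5
    batch_size: int = 16
    epochs: int = 1
    nd_hops: int = 2             # Neighborhood Detail Template: two-hop sampling
    nd_sample_size: int = 10     # neighbors sampled per hop
    hf_hop_embeddings: int = 4   # Hop-Field Overview Template


# Split ratios as quoted in the Dataset Splits row (train : validation : test).
REPORTED_SPLITS = {
    "Arxiv": (6, 2, 3),
    "Products": (8, 2, 90),
    "Pubmed": (6, 2, 2),
    "Cora": (6, 2, 2),
}


def split_fractions(ratio):
    """Normalize a train:validation:test ratio into fractions that sum to 1."""
    total = sum(ratio)
    return tuple(part / total for part in ratio)


if __name__ == "__main__":
    print(ReportedSetup())
    for name, ratio in REPORTED_SPLITS.items():
        train, val, test = split_fractions(ratio)
        print(f"{name}: train={train:.3f} val={val:.3f} test={test:.3f}")
```

Normalizing the ratios makes it easier to compare them against the splits shipped with each dataset, for example to see that the quoted Products ratio leaves 90% of nodes for testing.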