Notice: The reproducibility variables underlying each score are classified using an automated LLM-based pipeline, validated against a manually labeled dataset. LLM-based classification introduces uncertainty and potential bias; scores should be interpreted as estimates. Full accuracy metrics and methodology are described in [1].
Reasoning on Graphs: Faithful and Interpretable Large Language Model Reasoning
Authors: Linhao Luo, Yuan-Fang Li, Gholamreza Haffari, Shirui Pan
ICLR 2024 | Venue PDF | LLM Run Details
| Reproducibility Variable | Result | LLM Response |
|---|---|---|
| Research Type | Experimental | Extensive experiments on two benchmark KGQA datasets demonstrate that RoG achieves state-of-the-art performance on KG reasoning tasks and generates faithful and interpretable reasoning results. |
| Researcher Affiliation | Academia | Linhao Luo, Yuan-Fang Li, Gholamreza Haffari (Monash University, Australia); Shirui Pan (Griffith University, Australia) |
| Pseudocode | Yes | Algorithm 1: Retrieve reasoning paths based on relation paths |
| Open Source Code | Yes | Code and data are available at: https://github.com/RManLuo/reasoning-on-graphs |
| Open Datasets | Yes | We evaluate the reasoning ability of RoG on two benchmark KGQA datasets: WebQuestionSP (WebQSP) (Yih et al., 2016) and Complex Web Questions (CWQ) (Talmor & Berant, 2018) |
| Dataset Splits | No | The paper states 'We follow previous works (Sun et al., 2018; Jiang et al., 2022) to use the same train and test splits for fair comparison.' and mentions 'instruction finetuned on the training split'. However, it does not specify a validation split (neither percentages nor counts), which would be needed for full reproduction. |
| Hardware Specification | Yes | The training is conducted on 2 A100-80G GPUs for 38 hours. |
| Software Dependencies | No | The paper mentions using 'LLaMA2-Chat-7B (Touvron et al., 2023) as the LLM backbone', but does not provide specific version numbers for other ancillary software components such as Python, PyTorch, or CUDA libraries. |
| Experiment Setup | Yes | The batch size is set to 4 and the learning rate is set to 2e-5. We use the cosine learning rate scheduler policy with the warmup ratio set to 0.03. |
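The pseudocode row above refers to the paper's Algorithm 1, which retrieves reasoning paths from a knowledge graph given a relation path. A minimal sketch of that idea (not the authors' code; the function name, KG representation as `(head, relation, tail)` triples, and the breadth-first expansion are assumptions for illustration):

```python
def retrieve_reasoning_paths(kg, start_entity, relation_path):
    """Follow a sequence of relations from a start entity over a KG given
    as (head, relation, tail) triples; return all matching entity paths."""
    # Index triples by (head, relation) for fast expansion.
    index = {}
    for h, r, t in kg:
        index.setdefault((h, r), []).append(t)

    paths = [[start_entity]]
    for rel in relation_path:
        next_paths = []
        for path in paths:
            # Expand each partial path by every tail reachable via `rel`.
            for tail in index.get((path[-1], rel), []):
                next_paths.append(path + [tail])
        paths = next_paths
    return paths

# Toy example: one-hop relation path on a tiny KG.
kg = {
    ("Melbourne", "located_in", "Australia"),
    ("Melbourne", "has_university", "Monash University"),
}
paths = retrieve_reasoning_paths(kg, "Melbourne", ["located_in"])
```

Each returned path is an entity sequence grounded in the KG, which is what makes the reasoning interpretable: the path itself serves as the explanation.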
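The experiment-setup row reports a learning rate of 2e-5 with a cosine scheduler and warmup ratio 0.03. A sketch of how such a schedule is commonly computed (the exact formula used in the authors' training code is an assumption; this mirrors the standard linear-warmup-then-cosine-decay shape):

```python
import math

def lr_at_step(step, total_steps, base_lr=2e-5, warmup_ratio=0.03):
    """Learning rate at a given step under linear warmup + cosine decay."""
    warmup_steps = int(total_steps * warmup_ratio)
    if step < warmup_steps:
        # Linear warmup from 0 up to base_lr.
        return base_lr * step / max(1, warmup_steps)
    # Cosine decay from base_lr down to 0 over the remaining steps.
    progress = (step - warmup_steps) / max(1, total_steps - warmup_steps)
    return 0.5 * base_lr * (1.0 + math.cos(math.pi * progress))
```

With `total_steps=1000`, the rate rises linearly over the first 30 steps (3%), peaks at 2e-5, and decays smoothly to 0 by the final step.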