Notice: The reproducibility variables underlying each score are classified using an automated LLM-based pipeline, validated against a manually labeled dataset. LLM-based classification introduces uncertainty and potential bias; scores should be interpreted as estimates. Full accuracy metrics and methodology are described in [1].

Reasoning on Graphs: Faithful and Interpretable Large Language Model Reasoning

Authors: Linhao Luo, Yuan-Fang Li, Gholamreza Haffari, Shirui Pan

ICLR 2024 | Venue PDF | LLM Run Details

Reproducibility Variable | Result | LLM Response
Research Type | Experimental | Extensive experiments on two benchmark KGQA datasets demonstrate that RoG achieves state-of-the-art performance on KG reasoning tasks and generates faithful and interpretable reasoning results.
Researcher Affiliation | Academia | Linhao Luo, Yuan-Fang Li, Gholamreza Haffari (Monash University, Australia); Shirui Pan (Griffith University, Australia)
Pseudocode | Yes | Algorithm 1: Retrieve reasoning paths based on relation paths
Open Source Code | Yes | Code and data are available at: https://github.com/RManLuo/reasoning-on-graphs
Open Datasets | Yes | We evaluate the reasoning ability of RoG on two benchmark KGQA datasets: WebQuestionSP (WebQSP) (Yih et al., 2016) and Complex Web Questions (CWQ) (Talmor & Berant, 2018)
Dataset Splits | No | The paper states 'We follow previous works (Sun et al., 2018; Jiang et al., 2022) to use the same train and test splits for fair comparison.' and mentions 'instruction finetuned on the training split', but it does not give counts or percentages for a validation split, which would be needed for reproduction.
Hardware Specification | Yes | The training is conducted on 2 A100-80G GPUs for 38 hours.
Software Dependencies | No | The paper mentions using 'LLaMA2-Chat-7B (Touvron et al., 2023) as the LLM backbone' but does not provide version numbers for ancillary software such as Python, PyTorch, or CUDA libraries.
Experiment Setup | Yes | The batch size is set to 4 and the learning rate is set to 2e-5. We use the cosine learning rate scheduler policy with the warmup ratio set to 0.03.
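The Pseudocode row refers to Algorithm 1, which retrieves reasoning paths from the knowledge graph given a relation path. The sketch below is an illustrative reimplementation of that idea on a toy adjacency-list KG, not the authors' code; the KG representation and function name are assumptions.

```python
def retrieve_reasoning_paths(kg, start_entities, relation_path):
    """Walk the KG from each start entity, following the relations in
    relation_path in order, and return every entity path that realizes it.

    kg: dict mapping an entity to a list of (relation, tail_entity) edges.
    """
    paths = [[entity] for entity in start_entities]
    for relation in relation_path:
        next_paths = []
        for path in paths:
            # Expand the last entity on the path along the required relation.
            for rel, tail in kg.get(path[-1], []):
                if rel == relation:
                    next_paths.append(path + [tail])
        paths = next_paths  # paths that died out are dropped
    return paths


# Toy example: a two-hop relation path over a three-entity KG.
toy_kg = {
    "Alice": [("born_in", "Paris")],
    "Paris": [("capital_of", "France")],
}
print(retrieve_reasoning_paths(toy_kg, ["Alice"], ["born_in", "capital_of"]))
# → [['Alice', 'Paris', 'France']]
```

Each returned list is a grounded reasoning path (entity, relation-hop, entity, ...) that can be handed to the LLM as interpretable evidence.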
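The reported setup (learning rate 2e-5, cosine scheduler, warmup ratio 0.03) can be made concrete as a standalone learning-rate schedule. This is a minimal sketch of a standard linear-warmup-plus-cosine-decay schedule, assuming decay to zero; it is not taken from the paper's code.

```python
import math

def cosine_lr_with_warmup(step, total_steps, max_lr=2e-5, warmup_ratio=0.03):
    """Linear warmup for the first warmup_ratio of training,
    then cosine decay from max_lr down to zero."""
    warmup_steps = int(total_steps * warmup_ratio)
    if step < warmup_steps:
        # Ramp linearly from 0 to max_lr over the warmup phase.
        return max_lr * step / max(1, warmup_steps)
    # Cosine decay over the remaining steps.
    progress = (step - warmup_steps) / max(1, total_steps - warmup_steps)
    return max_lr * 0.5 * (1.0 + math.cos(math.pi * progress))
```

With 1,000 total steps, the warmup phase covers the first 30 steps, the rate peaks at 2e-5 at step 30, and decays to 0 at step 1,000.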