Notice: The reproducibility variables underlying each score are classified using an automated LLM-based pipeline, validated against a manually labeled dataset. LLM-based classification introduces uncertainty and potential bias; scores should be interpreted as estimates. Full accuracy metrics and methodology are described in Coakley et al. (K. L. Coakley, T. Snelleman, H. Hoos, and O. E. Gundersen, "The embrace of open science: An analysis of a decade of AI research and 56 800 conference papers," under review, 2026).
LEGO: Latent Execution-Guided Reasoning for Multi-Hop Question Answering on Knowledge Graphs
Authors: Hongyu Ren, Hanjun Dai, Bo Dai, Xinyun Chen, Michihiro Yasunaga, Haitian Sun, Dale Schuurmans, Jure Leskovec, Denny Zhou
ICML 2021
| Reproducibility Variable | Result | LLM Response |
|---|---|---|
| Research Type | Experimental | Experimental results on several KGQA benchmarks demonstrate the effectiveness of our framework compared with previous state of the art. |
| Researcher Affiliation | Collaboration | ¹Stanford University, ²Google Brain, ³UC Berkeley, ⁴Carnegie Mellon University. |
| Pseudocode | Yes | Algorithm 1: Latent Execution-Guided Reasoning (Training); Algorithm 2: Latent Execution-Guided Reasoning (Inference) |
| Open Source Code | Yes | The implementation of LEGO can be found in http://github.com/snap-stanford/lego. |
| Open Datasets | Yes | We evaluate LEGO on three large-scale multi-hop KGQA benchmark datasets: MetaQA (Zhang et al., 2018), WebQuestionsSP (WQSP) (Yih et al., 2015) and ComplexWebQuestions (CWQ) (Talmor & Berant, 2018). |
| Dataset Splits | Yes | Table 2. Statistics of the three datasets (Train / Dev / Test). MetaQA-1hop: 96,106 / 9,992 / 9,947; MetaQA-2hop: 118,980 / 14,872 / 14,872; MetaQA-3hop: 114,196 / 14,274 / 14,274; WQSP: 2,848 / 250 / 1,639; CWQ: 27,623 / 3,518 / 3,531. |
| Hardware Specification | No | The paper does not specify the hardware used for experiments. |
| Software Dependencies | No | The paper mentions using a pretrained language model (Devlin et al., 2019; Reimers & Gurevych, 2019) and Query2box (Q2B) (Ren et al., 2020) but does not provide specific version numbers for any software dependencies. |
| Experiment Setup | No | The paper does not include specific details on hyperparameters or training settings in the main text. |