Question Decomposition Tree for Answering Complex Questions over Knowledge Bases

Authors: Xiang Huang, Sitao Cheng, Yiheng Shu, Yuheng Bao, Yuzhong Qu

AAAI 2023 | Conference PDF | Archive PDF | Plain Text | LLM Run Details

Reproducibility Variable | Result | LLM Response
Research Type | Experimental | Extensive experiments show that QDTQA outperforms previous state-of-the-art methods on the ComplexWebQuestions dataset. Besides, our decomposition method improves an existing KBQA system by 11% and sets a new state-of-the-art on LC-QuAD 1.0.
Researcher Affiliation | Academia | State Key Laboratory for Novel Software Technology, Nanjing University, China. {xianghuang, stcheng, yhshu, yhbao}@smail.nju.edu.cn, yzqu@nju.edu.cn
Pseudocode | No | The paper describes the methods textually and with a flowchart (Figure 2) but does not include any pseudocode or clearly labeled algorithm blocks.
Open Source Code | Yes | Our code and dataset are available at https://github.com/cdhx/QDTQA
Open Datasets | Yes | The questions in QDTrees are derived from two complex KBQA datasets: ComplexWebQuestions (CWQ) (Talmor and Berant 2018) and LC-QuAD 1.0 (LC) (Trivedi et al. 2017).
Dataset Splits | Yes | For CWQ, we annotate three subsets with 2,000/500/500 questions randomly sampled from the training/validation/test sets, respectively. ... Since LC does not provide an official validation set, we split the training set into a new training set (the first 3,200 questions) and a validation set (the last 800 questions).
Hardware Specification | Yes | We train our models for 100 epochs on an NVIDIA GeForce RTX 3090 GPU and save the best checkpoints on the validation set.
Software Dependencies | No | The paper mentions 'PyTorch' and 'Hugging Face' as frameworks, and 'T5-base' and 'BERT-base' as models, but does not provide specific version numbers for any of these software components.
Experiment Setup | Yes | The batch sizes for ClueNet and DecipherNet are set to 64. ... The batch size is set to 16 and the max length is set to 196. The entity disambiguation model is based on BERT-base, in which we set batch size to 16 and max length to 96.
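The LC-QuAD 1.0 re-split quoted in the Dataset Splits row is deterministic (first 3,200 training questions kept for training, last 800 used as validation), so it can be reproduced with a few lines. The sketch below is not from the paper; the file name "train-data.json" is an assumption about how the LC-QuAD 1.0 training file is stored locally.

    import json

    # Minimal sketch of the LC-QuAD 1.0 re-split described above:
    # the original training set is cut into a new training set
    # (the first 3,200 questions) and a validation set (the rest).
    # "train-data.json" is an assumed local path, not from the paper.
    with open("train-data.json", "r", encoding="utf-8") as f:
        lc_train = json.load(f)  # list of question records

    new_train = lc_train[:3200]  # first 3,200 questions -> new training set
    new_valid = lc_train[3200:]  # remaining questions -> validation set (800)

The CWQ annotation subsets, by contrast, are random samples (2,000/500/500 from train/validation/test) and the paper's quoted text does not give a seed, so they cannot be reconstructed this way.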
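Collecting the numbers from the Hardware Specification and Experiment Setup rows in one place, a configuration sketch might look like the following. The dictionary layout, the key names, and the assignment of T5-base to the decomposition models are illustrative assumptions; only the batch sizes, max lengths, epoch count, backbone names, and GPU model come from the quoted text.

    # Hedged summary of the hyperparameters quoted above; component names
    # and structure are assumptions, not the authors' configuration objects.
    QDTQA_CONFIG = {
        "decomposition": {          # ClueNet / DecipherNet; T5-base is assumed
            "backbone": "t5-base",  # from the Software Dependencies row
            "batch_size": 64,
            "epochs": 100,          # best checkpoint kept on the validation set
        },
        "kbqa_model": {             # component whose batch size / max length
            "batch_size": 16,       # are quoted without an explicit name
            "max_length": 196,
        },
        "entity_disambiguation": {  # BERT-base model
            "backbone": "bert-base",
            "batch_size": 16,
            "max_length": 96,
        },
        "hardware": "NVIDIA GeForce RTX 3090",
    }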