SQALER: Scaling Question Answering by Decoupling Multi-Hop and Logical Reasoning

Authors: Mattia Atzeni, Jasmina Bogojeska, Andreas Loukas

NeurIPS 2021

Reproducibility Variable | Result | LLM Response
Research Type | Experimental | Our experiments on knowledge-based question answering show that our approach solves the multi-hop MetaQA dataset, achieves a new state-of-the-art on the more challenging WebQuestionsSP, is orders of magnitude more scalable than competitive approaches, and can achieve compositional generalization out of the training distribution.
Researcher Affiliation | Collaboration | Mattia Atzeni (IBM Research, EPFL, Switzerland, atz@zurich.ibm.com); Jasmina Bogojeska (IBM Research, Switzerland, jbo@zurich.ibm.com); Andreas Loukas (EPFL, Switzerland, andreas.loukas@epfl.ch)
Pseudocode | No | The paper describes the model architecture and procedures in detail using text and figures, but it does not include formal pseudocode or algorithm blocks.
Open Source Code | No | The paper does not provide an explicit statement about releasing its source code or a link to a code repository for the methodology described.
Open Datasets | Yes | We evaluate the reasoning performance of our approach on MetaQA [50] and WebQuestionsSP [49]. MetaQA includes multi-hop questions over the WikiMovies KB [35]... We further assess the compositional generalization ability of SQALER on the Compositional Freebase Questions (CFQ) dataset [28].
Dataset Splits | No | The paper mentions a 'training and test distribution' for the CFQ dataset but does not explicitly provide train/validation/test splits (e.g., percentages or exact counts) for any of the datasets used in its experiments, either in the main text or in the appendices.
Hardware Specification | No | The paper discusses efficiency and mentions leveraging 'the GPU to score the edges of the graph in parallel,' but it does not specify the GPU, CPU, or other hardware models used to run the experiments.
Software Dependencies | No | The paper mentions using a 'pre-trained BERT [18] model' and states that the edge-level model uses a 'Graph Convolutional Network (GCN) with the same architecture as in [41]'. However, it does not specify version numbers for BERT, PyTorch, or any other software libraries or dependencies, which are necessary for full reproducibility.
Experiment Setup | Yes | More details about training strategies are given in Appendix D. ... Appendix D provides the specific hyperparameters: 'We use the Adam optimizer [26], with a learning rate of 10^-5, a batch size of 32, a maximum sequence length of 10, a dropout rate of 0.1, and weight decay of 0.01 [31] for regularization.' (A hedged configuration sketch follows the table.)
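As an illustration of the reported setup, the following is a minimal sketch that encodes the Appendix D hyperparameters as a training configuration. It is not the authors' code: the optimizer class (PyTorch's AdamW, i.e. decoupled weight decay in the spirit of [31]), the placeholder EdgeScorer module, and its dimensions are assumptions made here for illustration; only the hyperparameter values are taken from the quote above.

```python
# Hedged sketch of the Appendix D training configuration (not the authors' code).
# Only the hyperparameter values below are quoted from the paper; the optimizer
# class (AdamW, assumed from the cited decoupled weight decay [31]), the
# EdgeScorer module, and its dimensions are illustrative assumptions.
import torch
from torch import nn

MAX_SEQ_LEN = 10       # maximum sequence length (Appendix D)
BATCH_SIZE = 32        # batch size (Appendix D)
LEARNING_RATE = 1e-5   # learning rate (Appendix D)
DROPOUT_RATE = 0.1     # dropout rate (Appendix D)
WEIGHT_DECAY = 0.01    # weight decay (Appendix D)


class EdgeScorer(nn.Module):
    """Placeholder for the BERT-based edge-scoring model (hypothetical)."""

    def __init__(self, hidden_dim: int = 768, num_relations: int = 100):
        super().__init__()
        self.dropout = nn.Dropout(DROPOUT_RATE)
        self.classifier = nn.Linear(hidden_dim, num_relations)

    def forward(self, question_encoding: torch.Tensor) -> torch.Tensor:
        # Score candidate relations (edge labels) from a question representation.
        return self.classifier(self.dropout(question_encoding))


model = EdgeScorer()
# The paper cites Adam [26] with weight decay [31]; AdamW is assumed here.
optimizer = torch.optim.AdamW(
    model.parameters(), lr=LEARNING_RATE, weight_decay=WEIGHT_DECAY
)
```

In a fuller reproduction one would replace the EdgeScorer placeholder with a pre-trained BERT encoder (the paper does not pin a specific version, as the Software Dependencies row notes) and iterate over batches of size BATCH_SIZE with inputs truncated to MAX_SEQ_LEN.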