StepGame: A New Benchmark for Robust Multi-Hop Spatial Reasoning in Texts

Authors: Zhengxiang Shi, Qiang Zhang, Aldo Lipani

Pages: 11321-11329

AAAI 2022

| Reproducibility Variable | Result | LLM Response |
|---|---|---|
| Research Type | Experimental | Our experiments demonstrate that state-of-the-art models on the bAbI dataset struggle on the StepGame dataset. Moreover, we propose a Tensor-Product based Memory-Augmented Neural Network (TP-MANN) specialized for spatial reasoning tasks. Experimental results on both datasets show that our model outperforms all the baselines with superior generalization and robustness performance. |
| Researcher Affiliation | Academia | Zhengxiang Shi (1), Qiang Zhang (2), Aldo Lipani (1); (1) University College London, (2) Zhejiang University |
| Pseudocode | No | The paper describes the model architecture and data generation process using text, diagrams (Figure 3), and mathematical formulas, but does not include explicit pseudocode or algorithm blocks. |
| Open Source Code | Yes | The software and data are available at: https://github.com/ZhengxiangShi/StepGame |
| Open Datasets | Yes | In this paper, we present a new Question-Answering dataset called StepGame for robust multi-hop spatial reasoning in texts. ... The software and data are available at: https://github.com/ZhengxiangShi/StepGame |
| Dataset Splits | Yes | For the bAbI dataset we only focus on task 17 and task 19 and use the original train and test splits made of 10 000 samples for the training set and 1 000 for the validation and test sets. For the StepGame dataset, we generate a training set made of samples varying k from 1 to 5 at steps of 1, and a test set with k varying from 1 to 10. Moreover, the test set will also contain distracting noise. The final dataset consists of, for each k value, 10 000 training samples, 1 000 validation samples, and 10 000 test samples. |
| Hardware Specification | No | The paper does not provide specific hardware details (e.g., GPU/CPU models, memory) used for running the experiments. |
| Software Dependencies | No | The paper mentions 'The software and data are available at: https://github.com/ZhengxiangShi/StepGame' but does not list specific software dependencies with version numbers in the text. |
| Experiment Setup | No | All training details, including those for our model, are reported in the Appendix. |
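
The StepGame split sizes quoted under Dataset Splits can be tallied with a short sketch. It only restates the quoted per-k counts and k ranges; the names `TRAIN_K`, `TEST_K`, `PER_K`, and `split_sizes` are illustrative and do not come from the paper or its repository. The quoted text does not say which k values the validation set covers, so the sketch assumes it matches the training range (k = 1 to 5).

```python
# Illustrative tally of the StepGame split sizes quoted in the table.
# All names here are hypothetical; none come from the StepGame repository.

TRAIN_K = range(1, 6)    # training samples: k = 1..5
TEST_K = range(1, 11)    # test samples: k = 1..10

# Samples generated per k value, as quoted in the Dataset Splits row.
PER_K = {"train": 10_000, "valid": 1_000, "test": 10_000}


def split_sizes(valid_k=TRAIN_K):
    """Return the total number of samples in each split.

    valid_k is an ASSUMPTION: the quoted text does not state which k values
    the validation set covers, so it defaults to the training range.
    """
    return {
        "train": len(TRAIN_K) * PER_K["train"],
        "valid": len(valid_k) * PER_K["valid"],
        "test": len(TEST_K) * PER_K["test"],
    }


if __name__ == "__main__":
    for name, size in split_sizes().items():
        print(f"{name}: {size}")
```

Under that assumption the totals are 50 000 training, 5 000 validation, and 100 000 test samples. Passing a different `valid_k` changes only the validation total.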