StepGame: A New Benchmark for Robust Multi-Hop Spatial Reasoning in Texts
Authors: Zhengxiang Shi, Qiang Zhang, Aldo Lipani
Pages: 11321-11329
AAAI 2022
| Reproducibility Variable | Result | LLM Response |
|---|---|---|
| Research Type | Experimental | Our experiments demonstrate that state-of-the-art models on the bAbI dataset struggle on the StepGame dataset. Moreover, we propose a Tensor-Product based Memory-Augmented Neural Network (TP-MANN) specialized for spatial reasoning tasks. Experimental results on both datasets show that our model outperforms all the baselines with superior generalization and robustness performance. |
| Researcher Affiliation | Academia | Zhengxiang Shi (1), Qiang Zhang (2), Aldo Lipani (1); (1) University College London, (2) Zhejiang University |
| Pseudocode | No | The paper describes the model architecture and data generation process using text, diagrams (Figure 3), and mathematical formulas, but does not include explicit pseudocode or algorithm blocks. |
| Open Source Code | Yes | The software and data are available at: https://github.com/ZhengxiangShi/StepGame |
| Open Datasets | Yes | In this paper, we present a new Question-Answering dataset called StepGame for robust multi-hop spatial reasoning in texts. ... The software and data are available at: https://github.com/ZhengxiangShi/StepGame |
| Dataset Splits | Yes | For the bAbI dataset we only focus on task 17 and task 19 and use the original train and test splits made of 10 000 samples for the training set and 1 000 for the validation and test sets. For the StepGame dataset, we generate a training set made of samples varying k from 1 to 5 at steps of 1, and a test set with k varying from 1 to 10. Moreover, the test set will also contain distracting noise. The final dataset consists of, for each k value, 10 000 training samples, 1 000 validation samples, and 10 000 test samples. |
| Hardware Specification | No | The paper does not provide specific hardware details (e.g., GPU/CPU models, memory) used for running the experiments. |
| Software Dependencies | No | The paper mentions 'The software and data are available at: https://github.com/ZhengxiangShi/StepGame' but does not list specific software dependencies with version numbers in the text. |
| Experiment Setup | No | All training details, including those for our model, are reported in the Appendix. |
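
The StepGame split sizes quoted in the Dataset Splits row can be tallied with a short sketch. It is only an illustration of the arithmetic: the names `SAMPLES_PER_K`, `TRAIN_K`, `TEST_K`, and `split_total` are hypothetical and do not come from the paper or its repository, and the k range of the validation set is not stated in the quoted text, so no validation total is computed.

```python
# Per-k sample counts as quoted in the "Dataset Splits" row.
# All names here are illustrative, not taken from the StepGame code base.
SAMPLES_PER_K = {"train": 10_000, "validation": 1_000, "test": 10_000}

TRAIN_K = range(1, 6)   # training set: k from 1 to 5
TEST_K = range(1, 11)   # test set: k from 1 to 10


def split_total(split: str, k_values: range) -> int:
    """Total samples in a split, assuming the same count for every k."""
    return SAMPLES_PER_K[split] * len(k_values)


train_total = split_total("train", TRAIN_K)
test_total = split_total("test", TEST_K)

# Values of k that appear only at test time, i.e. more hops than seen in training.
unseen_test_k = [k for k in TEST_K if k not in TRAIN_K]

print(train_total, test_total, unseen_test_k)
```

Under these assumptions the test set covers k values from 6 to 10 that never occur in training, which is the generalization setting the quoted split describes.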