LogiQA: A Challenge Dataset for Machine Reading Comprehension with Logical Reasoning
Authors: Jian Liu, Leyang Cui, Hanmeng Liu, Dandan Huang, Yile Wang, Yue Zhang
IJCAI 2020
| Reproducibility Variable | Result | LLM Response |
|---|---|---|
| Research Type | Experimental | The paper reports experiments showing that state-of-the-art neural models perform far worse than the human ceiling: 35.31% machine accuracy versus 95.00% human ceiling performance. The authors also propose the dataset as a benchmark for reinvestigating logical AI under the deep learning NLP setting. |
| Researcher Affiliation | Academia | (1) School of Computer Science, Fudan University; (2) School of Engineering, Westlake University; (3) Institute of Advanced Technology, Westlake Institute for Advanced Study |
| Pseudocode | No | The paper does not contain structured pseudocode or algorithm blocks for its methods. |
| Open Source Code | No | The dataset is freely available at https://github.com/lgw863/LogiQA-dataset. However, the paper does not explicitly state that the source code for the described methodology (e.g., data cleaning, translation, or evaluation scripts beyond existing model implementations) is released. |
| Open Datasets | Yes | The dataset is freely available at https://github.com/lgw863/LogiQA-dataset. |
| Dataset Splits | Yes | We randomly split the dataset, using 80% for training, 10% for development and the remaining 10% for testing. (A minimal split sketch appears below the table.) |
| Hardware Specification | No | The paper does not explicitly describe the hardware used to run its experiments, such as specific GPU or CPU models. |
| Software Dependencies | No | The paper mentions '100-dimensional GloVe word embeddings are used as embedding initialization' and 'For pre-trained methods, we follow the Hugging Face implementation [Wolf et al., 2019]' (see the inference sketch below the table). However, it does not provide specific version numbers for these or other software components. |
| Experiment Setup | No | The paper mentions '100-dimensional GloVe word embeddings are used as embedding initialization' and 'All hyper-parameters are decided by the model performance on the development sets.' However, it does not provide specific hyperparameter values (e.g., learning rate, batch size, number of epochs) or detailed training configurations. |
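
The 80/10/10 random split quoted above is straightforward to reproduce in principle. Below is a minimal sketch, assuming the examples have already been loaded into a list; the paper does not report a shuffling seed, so the `seed` parameter here is purely illustrative.

```python
import random

def split_dataset(examples, seed=42):
    """Shuffle and split examples into 80% train / 10% dev / 10% test.

    The seed is an assumption: the paper does not state one, so exact
    reproduction of the released split may not be possible this way.
    """
    rng = random.Random(seed)  # fixed seed so the split is repeatable
    examples = list(examples)
    rng.shuffle(examples)
    n = len(examples)
    n_train = int(0.8 * n)
    n_dev = int(0.1 * n)
    train = examples[:n_train]
    dev = examples[n_train:n_train + n_dev]
    test = examples[n_train + n_dev:]
    return train, dev, test
```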
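Since the paper says it follows the Hugging Face implementation for pre-trained methods but pins no versions or hyper-parameters, the following sketch shows one plausible way to score a single four-option LogiQA item with the `transformers` multiple-choice head. The checkpoint name (`roberta-base`) and the placeholder fields are assumptions, not details from the paper.

```python
import torch
from transformers import AutoTokenizer, AutoModelForMultipleChoice

# Checkpoint is illustrative; the paper does not name a specific model size.
tokenizer = AutoTokenizer.from_pretrained("roberta-base")
model = AutoModelForMultipleChoice.from_pretrained("roberta-base")

context = "..."   # passage text from a LogiQA example (placeholder)
question = "..."  # question text (placeholder)
options = ["A ...", "B ...", "C ...", "D ..."]  # four answer candidates

# Pair the (context + question) string with each of the four options.
encodings = tokenizer(
    [context + " " + question] * len(options),
    options,
    return_tensors="pt",
    padding=True,
    truncation=True,
)
# The multiple-choice head expects shape (batch, num_choices, seq_len).
inputs = {k: v.unsqueeze(0) for k, v in encodings.items()}

with torch.no_grad():
    logits = model(**inputs).logits  # shape (1, 4)
prediction = logits.argmax(dim=-1).item()  # index of the chosen option
```

Note that a freshly loaded multiple-choice head is untrained; the paper fine-tunes on the training split before evaluation, with hyper-parameters chosen on the development set.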