LogiQA: A Challenge Dataset for Machine Reading Comprehension with Logical Reasoning
Authors: Jian Liu, Leyang Cui, Hanmeng Liu, Dandan Huang, Yile Wang, Yue Zhang
IJCAI 2020
| Reproducibility Variable | Result | LLM Response |
|---|---|---|
| Research Type | Experimental | The paper reports experiments showing that state-of-the-art neural models perform far worse than the human ceiling: 35.31% machine accuracy versus 95.00% human ceiling performance. The authors also propose the dataset as a benchmark for reinvestigating logical AI under the deep learning NLP setting. |
| Researcher Affiliation | Academia | (1) School of Computer Science, Fudan University; (2) School of Engineering, Westlake University; (3) Institute of Advanced Technology, Westlake Institute for Advanced Study |
| Pseudocode | No | The paper does not contain structured pseudocode or algorithm blocks for its methods. |
| Open Source Code | No | The dataset is freely available at https://github.com/lgw863/LogiQA-dataset. However, the paper does not explicitly state that the source code for the described methodology (e.g., data cleaning, translation, or evaluation scripts beyond existing model implementations) is released. |
| Open Datasets | Yes | The dataset is freely available at https://github.com/lgw863/LogiQA-dataset. |
| Dataset Splits | Yes | We randomly split the dataset, using 80% for training, 10% for development and the remaining 10% for testing. (A minimal split sketch appears below the table.) |
| Hardware Specification | No | The paper does not explicitly describe the hardware used to run its experiments, such as specific GPU or CPU models. |
| Software Dependencies | No | The paper mentions '100-dimensional GloVe word embeddings are used as embedding initialization' and 'For pre-trained methods, we follow the Hugging Face implementation [Wolf et al., 2019]' (see the inference sketch below the table). However, it does not provide specific version numbers for these or other software components. |
| Experiment Setup | No | The paper mentions '100-dimensional GloVe word embeddings are used as embedding initialization' and 'All hyper-parameters are decided by the model performance on the development sets.' However, it does not provide specific hyperparameter values (e.g., learning rate, batch size, number of epochs) or detailed training configurations. |
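
The 80/10/10 random split quoted above is straightforward to reproduce in principle. Below is a minimal sketch, assuming the examples have already been loaded into a list; the paper does not report a shuffling seed, so the `seed` parameter here is purely illustrative.

```python
import random

def split_dataset(examples, seed=42):
    """Shuffle and split examples into 80% train / 10% dev / 10% test.

    The seed is an assumption: the paper does not state one, so exact
    reproduction of the released split may not be possible this way.
    """
    rng = random.Random(seed)  # fixed seed so the split is repeatable
    examples = list(examples)
    rng.shuffle(examples)
    n = len(examples)
    n_train = int(0.8 * n)
    n_dev = int(0.1 * n)
    train = examples[:n_train]
    dev = examples[n_train:n_train + n_dev]
    test = examples[n_train + n_dev:]
    return train, dev, test
```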
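Since the paper says it follows the Hugging Face implementation for pre-trained methods but pins no versions or hyper-parameters, the following sketch shows one plausible way to score a single four-option LogiQA item with the `transformers` multiple-choice head. The checkpoint name (`roberta-base`) and the placeholder fields are assumptions, not details from the paper.

```python
import torch
from transformers import AutoTokenizer, AutoModelForMultipleChoice

# Checkpoint is illustrative; the paper does not name a specific model size.
tokenizer = AutoTokenizer.from_pretrained("roberta-base")
model = AutoModelForMultipleChoice.from_pretrained("roberta-base")

context = "..."   # passage text from a LogiQA example (placeholder)
question = "..."  # question text (placeholder)
options = ["A ...", "B ...", "C ...", "D ..."]  # four answer candidates

# Pair the (context + question) string with each of the four options.
encodings = tokenizer(
    [context + " " + question] * len(options),
    options,
    return_tensors="pt",
    padding=True,
    truncation=True,
)
# The multiple-choice head expects shape (batch, num_choices, seq_len).
inputs = {k: v.unsqueeze(0) for k, v in encodings.items()}

with torch.no_grad():
    logits = model(**inputs).logits  # shape (1, 4)
prediction = logits.argmax(dim=-1).item()  # index of the chosen option
```

Note that a freshly loaded multiple-choice head is untrained; the paper fine-tunes on the training split before evaluation, with hyper-parameters chosen on the development set.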