ReClor: A Reading Comprehension Dataset Requiring Logical Reasoning
Authors: Weihao Yu, Zihang Jiang, Yanfei Dong, Jiashi Feng
ICLR 2020
| Reproducibility Variable | Result | LLM Response |
|---|---|---|
| Research Type | Experimental | Empirical results show that state-of-the-art models have an outstanding ability to capture biases contained in the dataset, achieving high accuracy on the EASY set. However, they struggle on the HARD set, with performance close to that of random guessing, indicating that more research is needed to substantially enhance the logical reasoning ability of current models. |
| Researcher Affiliation | Academia | Weihao Yu, Zihang Jiang, Yanfei Dong & Jiashi Feng, National University of Singapore. weihaoyu6@gmail.com, {jzihang, dyanfei}@u.nus.edu, elefjia@nus.edu.sg |
| Pseudocode | No | The paper describes algorithms and models, but it does not contain structured pseudocode or algorithm blocks. |
| Open Source Code | No | The paper provides a 'Project page' link (http://whyu.me/reclor/) but does not explicitly state that the source code for the described methodology is available there, nor does it provide a direct repository link. |
| Open Datasets | Yes | ReClor is available for non-commercial research purposes only. We are also hosting a public evaluation server on EvalAI (Yadav et al., 2019) to benchmark progress on ReClor. |
| Dataset Splits | Yes | They are divided into training set, validation set and testing set with 4,638, 500 and 1,000 data points respectively. (A loading sketch follows the table.) |
| Hardware Specification | No | The paper mentions 'computational resources' supported by a program, but it does not provide specific hardware details such as GPU or CPU models, or memory specifications. |
| Software Dependencies | No | The paper mentions software such as the fastText Python library, Bi-LSTM, GloVe word embeddings, Hugging Face's Transformers library, and the Adam optimizer, but it does not provide version numbers for these components. |
| Experiment Setup | Yes | We use a batch size of 24 and fine-tune for 10 epochs. The maximum input sequence length for all models is 256. The detailed hyperparameters are shown in Table 9. (A fine-tuning sketch follows the table.) |
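
To make the reported split sizes concrete, below is a minimal sketch that loads the three splits and checks their sizes. The file names (train.json, val.json, test.json) and the assumption that each file is a JSON list of question records are not stated in the quoted text and are illustrative only.

```python
# Minimal sketch (assumed file names and format, not confirmed by the paper):
# load the ReClor splits and verify the sizes quoted in the table above.
import json

expected_sizes = {"train.json": 4638, "val.json": 500, "test.json": 1000}
for file_name, expected in expected_sizes.items():
    with open(file_name, encoding="utf-8") as f:
        records = json.load(f)  # assumed: each file is a list of question records
    assert len(records) == expected, f"{file_name}: got {len(records)}, expected {expected}"
    print(f"{file_name}: {len(records)} examples")
```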
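
The experiment-setup row can likewise be illustrated. The sketch below shows how the quoted hyperparameters (maximum sequence length 256; batch size 24 and 10 epochs for the omitted training loop) would be applied to a multiple-choice model with Hugging Face Transformers. The choice of roberta-base, the way each option is paired with the context and question, and the example fields are assumptions for illustration, not the authors' released code.

```python
# Minimal sketch, not the authors' implementation: applying the reported
# hyperparameters to a multiple-choice transformer with Hugging Face Transformers.
import torch
from transformers import RobertaForMultipleChoice, RobertaTokenizer

MAX_SEQ_LEN = 256   # maximum input sequence length reported in the paper
BATCH_SIZE = 24     # reported batch size (would drive the omitted training loop)
NUM_EPOCHS = 10     # reported number of fine-tuning epochs (omitted training loop)

tokenizer = RobertaTokenizer.from_pretrained("roberta-base")  # model choice is an assumption
model = RobertaForMultipleChoice.from_pretrained("roberta-base")

# One hypothetical ReClor-style example: a context, a question and four options.
context = "All of the managers attended the meeting. Some of the attendees left early."
question = "Which one of the following can be properly inferred from the statements above?"
options = [
    "Some of the managers left early.",
    "None of the managers left early.",
    "It is possible that some managers left early.",
    "All of the attendees were managers.",
]
label = torch.tensor([2])  # index of the assumed correct option

# Encode each (context, question + option) pair, truncated/padded to 256 tokens.
encoded = tokenizer(
    [context] * len(options),
    [question + " " + option for option in options],
    max_length=MAX_SEQ_LEN,
    padding="max_length",
    truncation=True,
    return_tensors="pt",
)
# Multiple-choice models expect inputs of shape (batch, num_choices, seq_len).
inputs = {name: tensor.unsqueeze(0) for name, tensor in encoded.items()}

outputs = model(**inputs, labels=label)
print(outputs.loss.item(), outputs.logits.shape)  # logits: (1, 4), one score per option
```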