Notice: The reproducibility variables underlying each score are classified using an automated LLM-based pipeline, validated against a manually labeled dataset. LLM-based classification introduces uncertainty and potential bias; scores should be interpreted as estimates. Full accuracy metrics and methodology are described in Coakley et al.: K. L. Coakley, T. Snelleman, H. Hoos, and O. E. Gundersen, "The embrace of open science: An analysis of a decade of AI research and 56 800 conference papers," Under Review, 2026.
ReCO: A Large Scale Chinese Reading Comprehension Dataset on Opinion
Authors: Bingning Wang, Ting Yao, Qi Zhang, Jingfang Xu, Xiaochuan Wang
Pages: 9146-9153
AAAI 2020 | Venue PDF | LLM Run Details
| Reproducibility Variable | Result | LLM Response |
|---|---|---|
| Research Type | Experimental | Current QA models that perform very well on many question answering problems, such as BERT (Devlin et al. 2018), only achieves 77% accuracy on this dataset, a large margin behind humans nearly 92% performance, indicating ReCO present a good challenge for machine reading comprehension. |
| Researcher Affiliation | Industry | Bingning Wang, Ting Yao, Qi Zhang, Jingfang Xu, Xiaochuan Wang; Sogou Inc., Beijing, 100084, China |
| Pseudocode | No | The paper does not contain any pseudocode or algorithm blocks. |
| Open Source Code | Yes | The codes, dataset and leaderboard will be freely available at https://github.com/benywon/ReCO. |
| Open Datasets | Yes | The codes, dataset and leaderboard will be freely available at https://github.com/benywon/ReCO. |
| Dataset Splits | No | Finally, we obtain 280,000 training data and 20,000 testing data. The paper specifies training and testing data but does not explicitly mention a separate validation set split. |
| Hardware Specification | Yes | In all experiments we set the batch size to 48 and run on 8 Nvidia V100 GPUs. |
| Software Dependencies | No | The paper mentions software like 'sentencepiece', 'BiDAF', 'BERT', and 'ELMO' but does not provide specific version numbers for these components or any other ancillary software dependencies. |
| Experiment Setup | Yes | In all experiments we set the batch size to 48 and run on 8 Nvidia V100 GPUs. |