WinoGrande: An Adversarial Winograd Schema Challenge at Scale
Authors: Keisuke Sakaguchi, Ronan Le Bras, Chandra Bhagavatula, Yejin Choi (pp. 8732-8740)
AAAI 2020
| Reproducibility Variable | Result | LLM Response |
|---|---|---|
| Research Type | Experimental | The best state-of-the-art methods on WINOGRANDE achieve 59.4-79.1%, which are 15-35% (absolute) below human performance of 94.0%, depending on the amount of the training data allowed (2%-100%, respectively). Furthermore, we establish new state-of-the-art results on five related benchmarks: WSC (90.1%), DPR (93.1%), COPA (90.6%), KnowRef (85.6%), and Winogender (97.1%). |
| Researcher Affiliation | Collaboration | Allen Institute for Artificial Intelligence, University of Washington {keisukes, ronanlb, chandrab, yejinc}@allenai.org |
| Pseudocode | Yes | Algorithm 1: AFLITE |
| Open Source Code | Yes | Our datasets, crowdsourcing interface, and models are available at http://winogrande.allenai.org. |
| Open Datasets | Yes | To investigate this question, we introduce WINOGRANDE, a large-scale dataset of 44k problems... Our datasets, crowdsourcing interface, and models are available at http://winogrande.allenai.org. |
| Dataset Splits | Yes | Concretely, we use 6k instances (5k for training and 1k for validation) from the dataset (containing 53k instances in total) to fine-tune RoBERTa (referred to as RoBERTa_embed). |
| Hardware Specification | No | The paper does not provide specific hardware details (e.g., GPU/CPU models, memory) used for running its experiments. |
| Software Dependencies | No | The paper mentions using RoBERTa and BERT, and fairseq in a footnote, but does not provide specific version numbers for these or other software dependencies. |
| Experiment Setup | Yes | Concretely, we use 6k instances (5k for training and 1k for validation) from the dataset (containing 53k instances in total) to fine-tune RoBERTa (referred to as RoBERTa_embed). ... When applying AFLITE to WINOGRANDE, we set m = 10,000, n = 64, k = 500, and τ = 0.75. |
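
The Experiment Setup row lists the AFLITE hyperparameters without saying what each one controls. The sketch below shows one plausible reading, not the paper's Algorithm 1 verbatim. It assumes that m is the training-set size per round, n is the number of random train/held-out partitions (the ensemble size), k is the maximum number of instances removed per round, and τ is the minimum held-out accuracy at which an instance counts as "too easy". The function names `aflite_sketch` and `centroid_classifier` are hypothetical, and the nearest-centroid classifier is a dependency-free stand-in for whatever linear classifier is trained on the precomputed embeddings.

```python
import random


def centroid_classifier(train):
    """Fit a nearest-centroid classifier (a simple stand-in for a linear model)."""
    sums, counts = {}, {}
    for x, y in train:
        acc = sums.setdefault(y, [0.0] * len(x))
        for j, v in enumerate(x):
            acc[j] += v
        counts[y] = counts.get(y, 0) + 1
    centroids = {y: [s / counts[y] for s in acc] for y, acc in sums.items()}

    def predict(x):
        # Return the label whose centroid is closest in squared Euclidean distance.
        return min(
            centroids,
            key=lambda y: sum((a - b) ** 2 for a, b in zip(x, centroids[y])),
        )

    return predict


def aflite_sketch(data, m, n, k, tau, seed=0):
    """Iteratively drop instances that held-out classifiers predict too reliably.

    data: list of (embedding, label) pairs. All parameter roles are assumptions
    (see the lead-in); k is expected to be at least 1.
    """
    rng = random.Random(seed)
    remaining = list(data)
    while len(remaining) > m:
        correct = [0] * len(remaining)
        seen = [0] * len(remaining)
        for _ in range(n):
            # Random partition: m instances for training, the rest held out.
            idx = list(range(len(remaining)))
            rng.shuffle(idx)
            predict = centroid_classifier([remaining[i] for i in idx[:m]])
            for i in idx[m:]:
                x, y = remaining[i]
                seen[i] += 1
                correct[i] += predict(x) == y
        # Score = fraction of held-out predictions that were correct.
        scores = {i: correct[i] / seen[i] for i in range(len(remaining)) if seen[i]}
        flagged = sorted(
            (i for i, s in scores.items() if s >= tau), key=lambda i: -scores[i]
        )[:k]
        drop = set(flagged)
        remaining = [e for i, e in enumerate(remaining) if i not in drop]
        if len(flagged) < k:
            break  # fewer than k easy instances left: stop filtering
    return remaining
```

Under these assumptions, the reported setting would correspond to a call such as `aflite_sketch(data, m=10_000, n=64, k=500, tau=0.75)` on the RoBERTa_embed embeddings. This pure-Python stand-in is meant for reading, not for running at that scale.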