Notice: The reproducibility variables underlying each score are classified using an automated LLM-based pipeline, validated against a manually labeled dataset. LLM-based classification introduces uncertainty and potential bias; scores should be interpreted as estimates. Full accuracy metrics and methodology are described in [1].
QUAREL: A Dataset and Models for Answering Questions about Qualitative Relationships
Authors: Oyvind Tafjord, Peter Clark, Matt Gardner, Wen-tau Yih, Ashish Sabharwal
Pages: 7063-7071
AAAI 2019
| Reproducibility Variable | Result | LLM Response |
|---|---|---|
| Research Type | Experimental | We present QUAREL, a dataset of diverse story questions involving qualitative relationships that characterize these challenges, and techniques that begin to address them. ... We contribute ... (3) two novel models for this task, built as extensions of type-constrained semantic parsing. The first of these models (called QUASP+) significantly outperforms off-the-shelf tools on QUAREL. The second (QUASP+ZERO) demonstrates zero-shot capability... The dataset and models are available at http://data.allenai.org/quarel. |
| Researcher Affiliation | Industry | Oyvind Tafjord, Peter Clark, Matt Gardner, Wen-tau Yih, Ashish Sabharwal Allen Institute for AI, Seattle, WA EMAIL |
| Pseudocode | No | The paper describes model architectures and processes, but it does not include a dedicated pseudocode block or algorithm labeled as such. |
| Open Source Code | Yes | The dataset and models are available at http://data.allenai.org/quarel. |
| Open Datasets | Yes | We present QUAREL, a dataset of diverse story questions involving qualitative relationships that characterize these challenges, and techniques that begin to address them. ... The dataset and models are available at http://data.allenai.org/quarel. |
| Dataset Splits | Yes | Table 1: Summary statistics for the QUAREL dataset. ... # questions train/dev/test 1941/278/552 |
| Hardware Specification | No | The paper describes the models and training process but does not specify any particular hardware used for running the experiments. |
| Software Dependencies | No | The paper mentions software components such as AllenNLP, LSTMs, GloVe, and ELMo, but does not provide version numbers for any of these dependencies. |
| Experiment Setup | No | The paper describes the training objective and general setup (e.g., using beam search and specific embeddings) but does not provide concrete hyperparameters such as learning rate, batch size, or number of epochs. |