Reproducibility Index

Notice: The reproducibility variables underlying each score are classified using an automated LLM-based pipeline, validated against a manually labeled dataset. LLM-based classification introduces uncertainty and potential bias; scores should be interpreted as estimates. Full accuracy metrics and methodology are described in Coakley et al. (K. L. Coakley, T. Snelleman, H. Hoos, and O. E. Gundersen, "The embrace of open science: An analysis of a decade of AI research and 56 800 conference papers," Under Review, 2026).

QUAREL: A Dataset and Models for Answering Questions about Qualitative Relationships

Authors: Oyvind Tafjord, Peter Clark, Matt Gardner, Wen-tau Yih, Ashish Sabharwal | Pages: 7063-7071

AAAI 2019 | Venue PDF | LLM Run Details

Reproducibility Variable	Result	LLM Response
Research Type	Experimental	We present QUAREL, a dataset of diverse story questions involving qualitative relationships that characterize these challenges, and techniques that begin to address them. ... We contribute ... (3) two novel models for this task, built as extensions of type-constrained semantic parsing. The first of these models (called QUASP+) significantly outperforms off-the-shelf tools on QUAREL. The second (QUASP+ZERO) demonstrates zero-shot capability... The dataset and models are available at http://data.allenai.org/quarel.
Researcher Affiliation	Industry	Oyvind Tafjord, Peter Clark, Matt Gardner, Wen-tau Yih, Ashish Sabharwal Allen Institute for AI, Seattle, WA EMAIL
Pseudocode	No	The paper describes model architectures and processes, but it does not include a dedicated pseudocode block or algorithm labeled as such.
Open Source Code	Yes	The dataset and models are available at http://data.allenai.org/quarel.
Open Datasets	Yes	We present QUAREL, a dataset of diverse story questions involving qualitative relationships that characterize these challenges, and techniques that begin to address them. ... The dataset and models are available at http://data.allenai.org/quarel.
Dataset Splits	Yes	Table 1: Summary statistics for the QUAREL dataset. ... # questions train/dev/test 1941/278/552
Hardware Specification	No	The paper describes the models and training process but does not specify any particular hardware used for running the experiments.
Software Dependencies	No	The paper mentions software components such as AllenNLP, LSTMs, GloVe, and ELMo, but does not provide specific version numbers for any of these dependencies.
Experiment Setup	No	The paper describes the training objective and general setup (e.g., using beam search and specific embeddings) but does not provide concrete hyperparameters such as learning rate, batch size, or number of epochs.
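
The Dataset Splits row reports only raw counts (train/dev/test 1941/278/552). As a small illustration, the sketch below turns those counts into a total and per-split proportions. The variable names are illustrative and are not taken from the paper or its released code.

```python
# Split sizes as reported in the Dataset Splits row (train/dev/test).
splits = {"train": 1941, "dev": 278, "test": 552}

# Total number of questions across all three splits.
total = sum(splits.values())

# Fraction of the full dataset that falls in each split.
fractions = {name: count / total for name, count in splits.items()}

for name, count in splits.items():
    print(f"{name}: {count} questions ({fractions[name]:.1%})")
print(f"total: {total} questions")
```

The counts sum to 2771 questions, which corresponds to a split of roughly 70/10/20.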