reproducibilityindex.ai

Diagnostics-Guided Explanation Generation

Authors: Pepa Atanasova, Jakob Grue Simonsen, Christina Lioma, Isabelle Augenstein
Pages: 10445-10453

AAAI 2022

Reproducibility Variable	Result	LLM Response
Research Type	Experimental	We perform experiments on three datasets from the ERASER benchmark (DeYoung et al. 2020a) (FEVER, MultiRC, Movies)...
Researcher Affiliation	Academia	Pepa Atanasova, Jakob Grue Simonsen, Christina Lioma, Isabelle Augenstein. Department of Computer Science, University of Copenhagen, Denmark. {pepa, simonsen, c.lioma, augenstein}@di.ku.dk
Pseudocode	No	The paper describes its methods in prose, detailing steps and components, but it does not include formal pseudocode blocks or algorithm listings.
Open Source Code	Yes	We make an extended version of the manuscript and code available on https://github.com/copenlu/diagnostic-guidedexplanations.
Open Datasets	Yes	We perform experiments on three datasets from the ERASER benchmark (DeYoung et al. 2020a) (FEVER, MultiRC, Movies), all of which require complex reasoning and have sentence-level rationales.
Dataset Splits	No	The paper uses standard benchmark datasets but does not state split percentages or sample counts, and it cites no source for how the training, validation, and test splits were made.
Hardware Specification	No	The paper mentions using 'BERT (Devlin et al. 2019) base-uncased as our base architecture' but does not specify any hardware details (e.g., GPU/CPU models, memory, cloud instance types) used for running the experiments.
Software Dependencies	No	The paper mentions key software components like 'Transformer' and 'BERT base-uncased', but it does not provide version numbers for these or for any other software dependencies.
Experiment Setup	No	The paper describes the model and training objectives and names hyperparameters such as λ (sparsity penalty) and K (word masking), but it does not report numerical values for these or for other standard setup details such as learning rate, batch size, or number of epochs.