Notice: The reproducibility variables underlying each score are classified using an automated LLM-based pipeline, validated against a manually labeled dataset. LLM-based classification introduces uncertainty and potential bias; scores should be interpreted as estimates. Full accuracy metrics and methodology are described in [1].
On the Trade-off between Redundancy and Cohesiveness in Extractive Summarization
Authors: Ronald Cardenas, Matthias Gallé, Shay B. Cohen
JAIR 2024
| Reproducibility Variable | Result | LLM Response |
|---|---|---|
| Research Type | Experimental | Extensive automatic and human evaluations reveal that systems optimizing for, among other properties, cohesion are capable of better organizing content in summaries compared to systems that optimize only for redundancy, while maintaining comparable informativeness. |
| Researcher Affiliation | Collaboration | Ronald Cardenas (EMAIL), Institute for Language, Cognition and Computation, School of Informatics, University of Edinburgh, 10 Crichton Street, Edinburgh, UK; Matthias Gallé (EMAIL), Cohere, 51 Great Marlborough St, London, UK; Shay B. Cohen (EMAIL), Institute for Language, Cognition and Computation, School of Informatics, University of Edinburgh, 10 Crichton Street, Edinburgh, UK |
| Pseudocode | Yes | Algorithm 1: KvD reading simulation. Subroutines getPropositionTree, attachPropositions, memorySelect, and updateScore are instantiated by TreeKvD and GraphKvD. |
| Open Source Code | Yes | Code available at https://github.com/ronaldahmed/trade-off-kvd/ |
| Open Datasets | Yes | We used the PubMed and arXiv datasets (Cohan et al., 2018), consisting of scientific articles in English in the biomedical and computer science/physics domains, respectively. |
| Dataset Splits | No | The paper mentions using the "training set of each dataset" (footnote 8), "validation sets" (section 5.2), and the "test set of PubMed" (section 5.4), implying dataset splits. However, it does not explicitly state the specific percentages or absolute counts for these splits, nor does it refer to specific predefined splits from a cited source with exact train/validation/test partitions needed for reproduction. |
| Hardware Specification | No | The paper does not provide specific details about the hardware (e.g., GPU models, CPU models, memory specifications) used for running the experiments. It only vaguely mentions "computing resources provided by the University of Birmingham and EPCC at the University of Edinburgh" in the acknowledgements, without any technical specifications. |
| Software Dependencies | No | The paper mentions several software components and libraries, such as the "Adam optimizer (Loshchilov & Hutter, 2019)", "RoBERTa model (Y. Liu et al., 2019)", "Gensim library (Rehurek & Sojka, 2010)", "SciBERT (Beltagy, Lo, & Cohan, 2019)", and "NetworkX Python library". However, it does not provide specific version numbers for any of these software dependencies, which is required for reproducibility. |
| Experiment Setup | Yes | For the redundancy-oriented model E.LG-MMRSel+, ...set λR = 0.6, γR = 0.99. For the local coherence-oriented model E.LG-CCL, ...set it to λLC = 0.2. Both models were trained using the Adam optimizer ..., batch size of 32, learning rate of 10⁻⁷, and trained for 20 epochs... For the proposed KvD systems, we ...set the maximum recall path length R = 5, maximum tree persistence Ψ = 8, and working memory capacity WM = 100 for both TreeKvD and GraphKvD. For proposition scoring in GraphKvD, the decay factor is set to β = 0.01. |
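To make the reported structure concrete, the sketch below wires the subroutine names listed for Algorithm 1 (getPropositionTree, attachPropositions, memorySelect, updateScore) into a reading-simulation loop, using the hyperparameters from the experiment-setup row as defaults. Every function body here is a hypothetical stand-in for illustration only; the authors' actual TreeKvD/GraphKvD instantiations are in the linked repository.

```python
# Illustrative skeleton of the "KvD reading simulation" loop (Algorithm 1).
# Subroutine bodies are invented stand-ins, NOT the paper's implementation;
# only the subroutine names and hyperparameter values come from the report.

R_MAX_RECALL_PATH = 5   # maximum recall path length R
TREE_PERSISTENCE = 8    # maximum tree persistence Psi
WM_CAPACITY = 100       # working memory capacity WM
DECAY_BETA = 0.01       # decay factor beta (GraphKvD proposition scoring)

def get_proposition_tree(sentence):
    """Stand-in: split a sentence into toy 'propositions' (here, tokens)."""
    return sentence.split()

def attach_propositions(memory, propositions):
    """Stand-in: attach the new propositions to working memory."""
    memory.extend(propositions)

def memory_select(memory, capacity=WM_CAPACITY):
    """Stand-in: keep only the most recent `capacity` propositions."""
    return memory[-capacity:]

def update_score(scores, memory, beta=DECAY_BETA):
    """Stand-in: decay all scores, then reinforce propositions in memory."""
    for p in scores:
        scores[p] *= (1.0 - beta)
    for p in memory:
        scores[p] = scores.get(p, 0.0) + 1.0
    return scores

def kvd_reading_simulation(document):
    """Process a document sentence by sentence, as in Algorithm 1."""
    memory, scores = [], {}
    for sentence in document:
        props = get_proposition_tree(sentence)
        attach_propositions(memory, props)
        memory = memory_select(memory)
        scores = update_score(scores, memory)
    return scores
```

Running it on a two-sentence toy document ranks propositions that recur (and so stay in working memory) above ones seen only once, which is the intuition behind scoring propositions by memory persistence.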