Notice: The reproducibility variables underlying each score are classified using an automated LLM-based pipeline, validated against a manually labeled dataset. LLM-based classification introduces uncertainty and potential bias; scores should be interpreted as estimates. Full accuracy metrics and methodology are described in [1].
On the Trade-off between Redundancy and Cohesiveness in Extractive Summarization
Authors: Ronald Cardenas, Matthias Gallé, Shay B. Cohen
JAIR 2024
| Reproducibility Variable | Result | LLM Response |
|---|---|---|
| Research Type | Experimental | Extensive automatic and human evaluations reveal that systems optimizing for, among other properties, cohesion are capable of better organizing content in summaries compared to systems that optimize only for redundancy, while maintaining comparable informativeness. |
| Researcher Affiliation | Collaboration | Ronald Cardenas (EMAIL), Institute for Language, Cognition and Computation, School of Informatics, University of Edinburgh, 10 Crichton Street, Edinburgh, UK; Matthias Gallé (EMAIL), Cohere, 51 Great Marlborough St, London, UK; Shay B. Cohen (EMAIL), Institute for Language, Cognition and Computation, School of Informatics, University of Edinburgh, 10 Crichton Street, Edinburgh, UK |
| Pseudocode | Yes | Algorithm 1: KvD reading simulation. Subroutines getPropositionTree, attachPropositions, memorySelect, and updateScore are instantiated by TreeKvD and GraphKvD. |
| Open Source Code | Yes | Code available at https://github.com/ronaldahmed/trade-off-kvd/ |
| Open Datasets | Yes | We used the PubMed and arXiv datasets (Cohan et al., 2018), consisting of scientific articles in English in the biomedical and computer science/physics domains, respectively. |
| Dataset Splits | No | The paper mentions using the "training set of each dataset" (footnote 8), "validation sets" (section 5.2), and the "test set of PubMed" (section 5.4), implying dataset splits. However, it does not explicitly state the specific percentages or absolute counts for these splits, nor does it refer to specific predefined splits from a cited source with exact train/validation/test partitions needed for reproduction. |
| Hardware Specification | No | The paper does not provide specific details about the hardware (e.g., GPU models, CPU models, memory specifications) used for running the experiments. It only vaguely mentions "computing resources provided by the University of Birmingham and EPCC at the University of Edinburgh" in the acknowledgements, without any technical specifications. |
| Software Dependencies | No | The paper mentions several software components and libraries, such as the "Adam optimizer (Loshchilov & Hutter, 2019)", "RoBERTa model (Y. Liu et al., 2019)", "Gensim library (Rehurek & Sojka, 2010)", "SciBERT (Beltagy, Lo, & Cohan, 2019)", and "NetworkX Python library". However, it does not provide specific version numbers for any of these software dependencies, which is required for reproducibility. |
| Experiment Setup | Yes | For the redundancy-oriented model E.LG-MMRSel+, ...set λR = 0.6, γR = 0.99. For the local coherence-oriented model E.LG-CCL, ...set it to λLC = 0.2. Both models were trained using the Adam optimizer ..., batch size of 32, learning rate of 10⁻⁷, and trained for 20 epochs... For the proposed KvD systems, we ...set the maximum recall path length R = 5, maximum tree persistence Ψ = 8, and working memory capacity WM = 100 for both TreeKvD and GraphKvD. For proposition scoring in GraphKvD, the decay factor is set to β = 0.01. |
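To make the reported structure concrete, the sketch below wires the subroutine names listed for Algorithm 1 (getPropositionTree, attachPropositions, memorySelect, updateScore) into a reading-simulation loop, using the hyperparameters from the experiment-setup row as defaults. Every function body here is a hypothetical stand-in for illustration only; the authors' actual TreeKvD/GraphKvD instantiations are in the linked repository.

```python
# Illustrative skeleton of the "KvD reading simulation" loop (Algorithm 1).
# Subroutine bodies are invented stand-ins, NOT the paper's implementation;
# only the subroutine names and hyperparameter values come from the report.

R_MAX_RECALL_PATH = 5   # maximum recall path length R
TREE_PERSISTENCE = 8    # maximum tree persistence Psi
WM_CAPACITY = 100       # working memory capacity WM
DECAY_BETA = 0.01       # decay factor beta (GraphKvD proposition scoring)

def get_proposition_tree(sentence):
    """Stand-in: split a sentence into toy 'propositions' (here, tokens)."""
    return sentence.split()

def attach_propositions(memory, propositions):
    """Stand-in: attach the new propositions to working memory."""
    memory.extend(propositions)

def memory_select(memory, capacity=WM_CAPACITY):
    """Stand-in: keep only the most recent `capacity` propositions."""
    return memory[-capacity:]

def update_score(scores, memory, beta=DECAY_BETA):
    """Stand-in: decay all scores, then reinforce propositions in memory."""
    for p in scores:
        scores[p] *= (1.0 - beta)
    for p in memory:
        scores[p] = scores.get(p, 0.0) + 1.0
    return scores

def kvd_reading_simulation(document):
    """Process a document sentence by sentence, as in Algorithm 1."""
    memory, scores = [], {}
    for sentence in document:
        props = get_proposition_tree(sentence)
        attach_propositions(memory, props)
        memory = memory_select(memory)
        scores = update_score(scores, memory)
    return scores
```

Running it on a two-sentence toy document ranks propositions that recur (and so stay in working memory) above ones seen only once, which is the intuition behind scoring propositions by memory persistence.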