Notice: The reproducibility variables underlying each score are classified using an automated LLM-based pipeline, validated against a manually labeled dataset. LLM-based classification introduces uncertainty and potential bias; scores should be interpreted as estimates. Full accuracy metrics and methodology are described in Coakley et al. (K. L. Coakley, T. Snelleman, H. Hoos, and O. E. Gundersen, "The embrace of open science: An analysis of a decade of AI research and 56 800 conference papers," Under Review, 2026).
Compressed Heterogeneous Graph for Abstractive Multi-Document Summarization
Authors: Miao Li, Jianzhong Qi, Jey Han Lau
AAAI 2023
| Reproducibility Variable | Result | LLM Response |
|---|---|---|
| Research Type | Experimental | Experimental results over MULTI-NEWS, WCEP-100, and ARXIV show that HGSUM outperforms state-of-the-art MDS models. We test our proposed model HGSUM and compare it against state-of-the-art abstractive MDS models over several datasets. We also report the results of an ablation study to show the effectiveness of the components of HGSUM. |
| Researcher Affiliation | Academia | Miao Li, Jianzhong Qi, Jey Han Lau School of Computing and Information Systems, The University of Melbourne EMAIL, EMAIL |
| Pseudocode | No | The paper does not contain any pseudocode or clearly labeled algorithm blocks. |
| Open Source Code | Yes | The code for our model and experiments is available at: https://github.com/oaimli/HGSum. |
| Open Datasets | Yes | We use MULTI-NEWS (Fabbri et al. 2019), WCEP-100 (Ghalandari et al. 2020), and ARXIV (Cohan et al. 2018) as benchmark English datasets. |
| Dataset Splits | No | The paper mentions tuning hyperparameters based on a "development set" but does not specify the train/validation/test dataset splits (e.g., percentages or sample counts) needed for reproduction. For example: "All other hyper-parameters are tuned based on the development set." |
| Hardware Specification | Yes | All experiments are run on Intel(R) Xeon(R) Gold 6326 CPU @ 2.90GHz with NVIDIA Tesla A100 GPU (40G). |
| Software Dependencies | No | The paper mentions using the "Hugging Face library" but does not provide specific version numbers for any software dependencies, libraries, or frameworks used (e.g., Python, PyTorch, TensorFlow, etc.). |
| Experiment Setup | Yes | The hyper-parameter β is set to 0.5 to balance two loss functions. All other hyper-parameters are tuned based on the development set. We use beam search decoding with beam width 5 to generate the summary. To alleviate overfitting, we apply label smoothing during training with a smoothing factor of 0.1. |
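
The Experiment Setup row can be made concrete with a minimal sketch. Only the three values (β = 0.5, beam width 5, smoothing factor 0.1) come from the quoted text. Everything else is an assumption for illustration: the convex form `beta * loss_a + (1 - beta) * loss_b`, the uniform-smoothing formulation of label smoothing, and the names `label_smoothed_nll`, `combined_loss`, `loss_a`, and `loss_b` are hypothetical and are not taken from the paper or its repository.

```python
import math

# Values quoted in the Experiment Setup row.
BETA = 0.5             # balances the two loss functions
LABEL_SMOOTHING = 0.1  # smoothing factor applied during training
BEAM_WIDTH = 5         # beam search width used at decoding time (not used below)


def label_smoothed_nll(logits, target, smoothing=LABEL_SMOOTHING):
    """Label-smoothed negative log-likelihood for one token.

    Assumed formulation: the target distribution puts (1 - smoothing) on the
    gold token and spreads `smoothing` uniformly over all vocabulary entries.
    """
    # Numerically stable log-softmax over the logits.
    m = max(logits)
    log_z = m + math.log(sum(math.exp(x - m) for x in logits))
    log_probs = [x - log_z for x in logits]

    nll = -log_probs[target]                    # loss against the gold token
    uniform = -sum(log_probs) / len(log_probs)  # loss against a uniform target
    return (1.0 - smoothing) * nll + smoothing * uniform


def combined_loss(loss_a, loss_b, beta=BETA):
    """Assumed convex combination of two losses; the paper's exact form may differ."""
    return beta * loss_a + (1.0 - beta) * loss_b


if __name__ == "__main__":
    token_loss = label_smoothed_nll([2.0, 0.5, -1.0], target=0)
    print(combined_loss(token_loss, 1.0))
```

With β = 0.5 the assumed convex form reduces to the plain average of the two losses, so this sketch weights both objectives equally.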