Global-aware Beam Search for Neural Abstractive Summarization

Authors: Ye Ma, Zixun Lan, Lu Zong, Kaizhu Huang

NeurIPS 2021

Reproducibility Variable | Result | LLM Response
Research Type | Experimental | Extensive experiments on a total of nine datasets show that the global (attention)-aware inference significantly improves state-of-the-art summarization models even when using empirical hyper-parameters. The algorithm is also shown to be robust, as it continues to generate meaningful texts with corrupted attention distributions.
Researcher Affiliation | Academia | Ye Ma (1,4), Zixun Lan (2,4), Lu Zong (1,4), Kaizhu Huang (3). 1 Department of Financial and Actuarial Mathematics, School of Science; 2 Department of Applied Mathematics, School of Science; 3 Department of Intelligent Science, School of Advanced Technology; 4 Laboratory for Intelligent Computing and Financial Technology. Xi'an Jiaotong-Liverpool University, SIP, 215123 Suzhou, China.
Pseudocode | No | The paper does not contain structured pseudocode or algorithm blocks clearly labeled as such.
Open Source Code | Yes | The code and a comprehensive set of examples are available at https://github.com/yema2018/global_aware.
Open Datasets | Yes | We evaluate the performance on a total of 9 summarization datasets: 2 datasets (CNN/DM [10], XSUM [5]) with BART [17] and 8 datasets (XSUM [5], BillSum [15], Multi-News [6], Newsroom [9], WikiHow [16], Reddit TIFU [34], arXiv and PubMed [2]) with PEGASUS [39].
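All of the listed corpora are publicly distributed. As a minimal sketch, two of them can be loaded with the Hugging Face `datasets` library; the hub identifiers ("cnn_dailymail", "xsum") are assumptions on our part, since the paper does not state how the data were obtained.

```python
# Sketch: loading two of the evaluation corpora via the Hugging Face
# `datasets` library. Dataset identifiers are assumed, not taken from the paper.
from datasets import load_dataset

cnn_dm = load_dataset("cnn_dailymail", "3.0.0")  # train / validation / test splits
xsum = load_dataset("xsum")

print(cnn_dm["train"][0]["article"][:200])   # source document
print(xsum["validation"][0]["summary"])      # reference summary
```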
Dataset Splits | No | The attention-prediction model is trained on the training set for about 50,000 steps, and checkpoints are saved every 10,000 steps to select the best checkpoint on the development set. ... According to the numerical tests on the development set, we finally choose β = 12, γ = 1.5 for CNN/DM and β = 4, γ = 0 for XSUM.
Hardware Specification | Yes | All experiments are conducted on 3 RTX 6000 GPUs.
Software Dependencies | No | The paper mentions software such as the AdaBelief optimizer, Hugging Face transformers, and the ROUGE scoring code, but does not specify version numbers. For example, it states "The optimizer is the Adabelief-optimizer [41] with eps 1e-16, betas (0.9, 0.999), weight decay 1e-4 and learning rate 2e-5," which are hyper-parameters, not version numbers.
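For reference, the reported optimizer settings can be reproduced with the `adabelief-pytorch` package; this is an assumption, as the paper names neither the package nor its version (which is why the dependency check is marked "No").

```python
# Sketch: instantiating the reported optimizer settings, assuming the
# `adabelief-pytorch` package. The model here is a placeholder, not the
# authors' attention-prediction model.
import torch
from adabelief_pytorch import AdaBelief

model = torch.nn.Linear(1024, 1024)  # placeholder module
optimizer = AdaBelief(
    model.parameters(),
    lr=2e-5,              # learning rate 2e-5
    eps=1e-16,            # eps 1e-16
    betas=(0.9, 0.999),   # betas (0.9, 0.999)
    weight_decay=1e-4,    # weight decay 1e-4
)
```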
Experiment Setup | Yes | We adopt a randomly initialized 2-layer transformer-encoder in the attention-prediction model, where the structure of each layer is the same as the BART-encoder layer. The optimizer is the AdaBelief optimizer [41] with eps 1e-16, betas (0.9, 0.999), weight decay 1e-4 and learning rate 2e-5. The attention-prediction model is trained on the training set for about 50,000 steps, and checkpoints are saved every 10,000 steps to select the best checkpoint on the development set. ... The search scopes of β and γ are {2, 4, 6, 10, 12, 15, 18, 20} and {0, 0.5, 1, 1.5, 2}, respectively. According to the numerical tests on the development set, we finally choose β = 12, γ = 1.5 for CNN/DM and β = 4, γ = 0 for XSUM. As limited improvement could be observed from larger values of γ, we recommend γ = 1 for normal or longer targets. When testing the global-aware inference with PEGASUS [39], we directly use empirical hyper-parameters for each dataset, namely β = 4, γ = 0 for XSUM and β = 12, γ = 1 for the other 7 datasets. The beam size K follows the setups in BART [17] and PEGASUS [39].
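The β/γ selection described above is a plain grid search on the development set. The sketch below illustrates that procedure only; `generate_global_aware` and `rouge_l` are hypothetical placeholders, not the authors' actual API, and the scoring metric is assumed to be a ROUGE variant.

```python
# Hypothetical sketch of the dev-set grid search over the global-aware
# hyper-parameters beta and gamma, using the search scopes reported above.
from itertools import product

BETAS = [2, 4, 6, 10, 12, 15, 18, 20]
GAMMAS = [0, 0.5, 1, 1.5, 2]

def grid_search(dev_articles, dev_references, generate_global_aware, rouge_l):
    """Return the (beta, gamma) pair with the best average dev-set score."""
    best_beta, best_gamma, best_score = None, None, float("-inf")
    for beta, gamma in product(BETAS, GAMMAS):
        summaries = [
            generate_global_aware(article, beta=beta, gamma=gamma, num_beams=4)
            for article in dev_articles
        ]
        score = sum(
            rouge_l(summary, reference)
            for summary, reference in zip(summaries, dev_references)
        ) / len(summaries)
        if score > best_score:
            best_beta, best_gamma, best_score = beta, gamma, score
    return best_beta, best_gamma, best_score
```

Under this procedure the paper reports (β, γ) = (12, 1.5) for CNN/DM and (4, 0) for XSUM; with PEGASUS, the empirical settings are used directly instead of re-running the search.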