Adaptive Beam Search Decoding for Discrete Keyphrase Generation

Authors: Xiaoli Huang, Tongge Xu, Lvan Jiao, Yueran Zu, Youmin Zhang (pp. 13082-13089)

AAAI 2021 | Conference PDF | Archive PDF | Plain Text | LLM Run Details

Reproducibility Variable | Result | LLM Response
Research Type | Experimental | Experiments on five public datasets demonstrate the proposed model can generate marginally less duplicated and more accurate keyphrases.
Researcher Affiliation | Academia | Xiaoli Huang (1), Tongge Xu (2), Lvan Jiao (1), Yueran Zu (1), Youmin Zhang (3); (1) School of Computer Science and Engineering, Beihang University; (2) School of Cyber Science and Technology, Beihang University; (3) Jiangxi Research Institute of Beihang University
Pseudocode | No | The paper describes the model architecture and methods in text and with diagrams, but does not include any pseudocode or algorithm blocks.
Open Source Code | Yes | The codes of AdaGM are available at: https://github.com/huangxiaolist/adaGM
Open Datasets | Yes | Experiments are carried out on five scientific publication datasets, including KP20k (Meng et al. 2017), Inspec (Hulth 2003), Krapivin (Krapivin and Marchese 2009), NUS (Nguyen and Kan 2007), and SemEval (Kim et al. 2010).
Dataset Splits | Yes | After the two operations, the training, validation, and testing samples of the KP20k dataset are 509,818, 20,000, and 20,000, respectively.
Hardware Specification | Yes | For a fair comparison, we use the same device (i.e., GTX-1080Ti).
Software Dependencies | No | The paper mentions using the Adam optimization algorithm and provides various model hyperparameters, but does not specify software dependencies with version numbers (e.g., Python, PyTorch, TensorFlow versions).
Experiment Setup | Yes | In the preprocessing stage, following (Yuan et al. 2018; Chan et al. 2019), for each document, we lowercase all characters, replace digits with a specific token <digit>, sort all the present keyphrase labels according to where they first appear in the document, and append absent keyphrases. We set the vocabulary as the most frequent 50,002 words and share it between the encoder and decoder. We set the dimension of word embedding to 100 and the hidden size of the encoder and decoder to 300. The word embedding is initialized using a uniform distribution within [-0.1, 0.1]. The initial state of the decoder is initialized as the encoder's last time-step's hidden state. Dropout with a rate of 0.1 is applied to both the encoder and decoder states. During the training stage, we use the Adam optimization algorithm (Kingma and Ba 2014) with an initial learning rate of 0.001. The learning rate will be halved if the validation loss stops dropping. Early stopping is applied when validation loss stops decreasing for three contiguous checkpoints. We also set gradient clipping of 1.0, batch size of 32, and train our model for three epochs. During the test stage, we set beam-size as 20 and threshold α as 0.015. Moreover, we calculate F1@5 and F1@M after removing all the duplicated keyphrases.
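
The preprocessing and evaluation protocol quoted above can be made concrete with a short sketch. This is a minimal illustration based only on the quoted setup, not the released AdaGM code: the helper names (`preprocess`, `f1_at_k`), the plain substring test for present keyphrases, and the regex-based digit replacement are assumptions, and the authors' implementation may differ (e.g., it may stem tokens before matching).

```python
import re

# Hyperparameters quoted in the experiment setup above.
HPARAMS = dict(vocab_size=50_002, emb_dim=100, hidden_size=300, dropout=0.1,
               lr=1e-3, grad_clip=1.0, batch_size=32, epochs=3,
               beam_size=20, alpha=0.015)

DIGIT_TOKEN = "<digit>"


def preprocess(document, keyphrases):
    """Lowercase, replace digits with <digit>, order present keyphrases by
    first appearance in the document, and append absent keyphrases."""
    doc = re.sub(r"\d+", DIGIT_TOKEN, document.lower())
    cleaned = [re.sub(r"\d+", DIGIT_TOKEN, kp.lower()) for kp in keyphrases]

    present, absent = [], []
    for kp in cleaned:
        pos = doc.find(kp)  # simplification: plain substring match
        (present if pos >= 0 else absent).append((pos, kp))
    targets = [kp for _, kp in sorted(present)] + [kp for _, kp in absent]
    return doc, targets


def f1_at_k(predictions, references, k=None):
    """F1@5 when k=5; F1@M when k=None (all predictions kept).
    Duplicated predictions are removed before scoring, as stated above."""
    deduped = list(dict.fromkeys(predictions))  # drop duplicates, keep order
    topk = deduped if k is None else deduped[:k]
    ref_set = set(references)
    correct = sum(kp in ref_set for kp in topk)
    precision = correct / len(topk) if topk else 0.0
    recall = correct / len(ref_set) if ref_set else 0.0
    if precision + recall == 0:
        return 0.0
    return 2 * precision * recall / (precision + recall)
```

For example, `f1_at_k(["neural network", "neural network", "beam search"], ["beam search", "keyphrase generation"])` deduplicates to two predictions, of which one is correct, giving precision 0.5, recall 0.5, and F1@M of 0.5.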