Adaptive Beam Search Decoding for Discrete Keyphrase Generation
Authors: Xiaoli Huang, Tongge Xu, Lvan Jiao, Yueran Zu, Youmin Zhang
AAAI 2021, pp. 13082-13089 | Conference PDF | Archive PDF | Plain Text | LLM Run Details
| Reproducibility Variable | Result | LLM Response |
|---|---|---|
| Research Type | Experimental | Experiments on five public datasets demonstrate the proposed model can generate marginally less duplicated and more accurate keyphrases. |
| Researcher Affiliation | Academia | Xiaoli Huang (1), Tongge Xu (2), Lvan Jiao (1), Yueran Zu (1), Youmin Zhang (3); 1: School of Computer Science and Engineering, Beihang University; 2: School of Cyber Science and Technology, Beihang University; 3: Jiangxi Research Institute of Beihang University |
| Pseudocode | No | The paper describes the model architecture and methods in text and with diagrams, but does not include any pseudocode or algorithm blocks. |
| Open Source Code | Yes | The code of AdaGM is available at: https://github.com/huangxiaolist/adaGM |
| Open Datasets | Yes | Experiments are carried out on five scientific publication datasets, including KP20k (Meng et al. 2017), Inspec (Hulth 2003), Krapivin (Krapivin and Marchese 2009), NUS (Nguyen and Kan 2007), and SemEval (Kim et al. 2010). |
| Dataset Splits | Yes | After the two operations, the training, validation, and testing samples of the KP20k dataset are 509,818, 20,000, 20,000, respectively. |
| Hardware Specification | Yes | For a fair comparison, we use the same device (i.e., GTX-1080Ti) |
| Software Dependencies | No | The paper mentions using the Adam optimization algorithm and provides various model hyperparameters, but does not specify software dependencies with version numbers (e.g., Python, PyTorch, TensorFlow versions). |
| Experiment Setup | Yes | In the preprocessing stage, following (Yuan et al. 2018; Chan et al. 2019), for each document, we lowercase all characters, replace digits with a specific token <digit>, sort all the present keyphrase labels according to where they first appear in the document, and append absent keyphrases. We set the vocabulary as the most frequent 50,002 words and share it between the encoder and decoder. We set the dimension of word embedding to 100 and the hidden size of the encoder and decoder to 300. The word embedding is initialized using a uniform distribution within [-0.1, 0.1]. The initial state of the decoder is initialized as the encoder's last time-step's hidden state. Dropout with a rate of 0.1 is applied to both the encoder and decoder states. During the training stage, we use the Adam optimization algorithm (Kingma and Ba 2014) with an initial learning rate of 0.001. The learning rate will be halved if the validation loss stops dropping. Early stopping is applied when the validation loss stops decreasing for three contiguous checkpoints. We also set gradient clipping of 1.0, batch size of 32, and train our model for three epochs. During the test stage, we set beam-size as 20 and threshold α as 0.015. Moreover, we calculate F1@5 and F1@M after removing all the duplicated keyphrases. |
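
The quoted setup can be made concrete with a small sketch. The snippet below is a minimal, hypothetical Python illustration of two of the described steps: the preprocessing (lowercasing, replacing digits with `<digit>`, ordering present keyphrases by first appearance and appending absent ones) and the duplicate-free F1@5 / F1@M scoring. The function names and details are assumptions for illustration only, not the released AdaGM implementation.

```python
import re

# Illustrative sketch of the preprocessing and evaluation described above.
# All names are hypothetical; they do not come from the AdaGM codebase.

DIGIT_TOKEN = "<digit>"

def preprocess(document, keyphrases):
    """Lowercase text, replace digit runs with <digit>, then order keyphrases:
    present ones by first appearance in the document, absent ones appended."""
    doc = re.sub(r"\d+", DIGIT_TOKEN, document.lower())
    kps = [re.sub(r"\d+", DIGIT_TOKEN, kp.lower()) for kp in keyphrases]
    present = sorted((kp for kp in kps if kp in doc), key=doc.find)
    absent = [kp for kp in kps if kp not in doc]
    return doc, present + absent

def f1_at_k(predictions, references, k=None):
    """F1 over the top-k predictions after removing duplicates.
    k=None corresponds to F1@M (all distinct predictions are kept)."""
    preds = list(dict.fromkeys(predictions))      # drop duplicates, keep order
    preds = preds[:k] if k is not None else preds
    refs = set(references)
    tp = sum(1 for p in preds if p in refs)
    precision = tp / len(preds) if preds else 0.0
    recall = tp / len(refs) if refs else 0.0
    if precision + recall == 0.0:
        return 0.0
    return 2 * precision * recall / (precision + recall)

# Example: F1@5 and F1@M for one document.
gold = ["beam search", "keyphrase generation"]
pred = ["beam search", "beam search", "decoder", "keyphrase generation"]
print(f1_at_k(pred, gold, k=5), f1_at_k(pred, gold))
```

In this sketch, F1@5 scores at most the top five distinct predictions while F1@M uses all of them, matching the paper's convention of computing both metrics only after removing duplicated keyphrases.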