Title-Guided Encoding for Keyphrase Generation
Authors: Wang Chen, Yifan Gao, Jiani Zhang, Irwin King, Michael R. Lyu
AAAI 2019 | Conference PDF | Archive PDF | Plain Text | LLM Run Details
| Reproducibility Variable | Result | LLM Response |
|---|---|---|
| Research Type | Experimental | Experiments on a range of KG datasets demonstrate that our model outperforms the state-of-the-art models with a large margin, especially for documents with either very low or very high title length ratios. The overall empirical results on five real-world benchmarks show that our model outperforms the state-of-the-art models significantly on both present and absent keyphrase prediction, especially for documents with either very low or very high title length ratios. |
| Researcher Affiliation | Academia | ¹Department of Computer Science and Engineering, The Chinese University of Hong Kong, Shatin, N.T., Hong Kong; ²Shenzhen Key Laboratory of Rich Media Big Data Analytics and Application, Shenzhen Research Institute, The Chinese University of Hong Kong, Shenzhen, China |
| Pseudocode | No | The paper describes the model architecture and components using equations and text, but does not include any explicitly labeled 'Pseudocode' or 'Algorithm' blocks. |
| Open Source Code | No | The paper states, 'We implement the models using PyTorch (Paszke et al. 2017) on the basis of the OpenNMT-py system (Klein et al. 2017),' which refers to existing open-source frameworks, but no explicit statement or link is provided for the authors' own implementation of the described methodology. |
| Open Datasets | Yes | For all the generative models (i.e. our TG-Net model as well as all the encoder-decoder baselines), we choose the largest publicly available keyphrase generation dataset KP20k constructed by Meng et al. (2017) as the training dataset. Besides KP20k, we also adopt other four widely-used scientific datasets for comprehensive testing, including Inspec (Hulth 2003), Krapivin (Krapivin, Autaeu, and Marchese 2009), NUS (Nguyen and Kan 2007), and SemEval-2010 (Kim et al. 2010). |
| Dataset Splits | Yes | Totally 567,830 articles are collected in this dataset, where 527,830 for training, 20,000 for validation, and 20,000 for testing. Table 2: The statistics of testing datasets. The Training means the training part for the traditional supervised extractive baseline. The FFCV represents five-fold cross validation. The Testing means the testing part for all models. |
| Hardware Specification | No | The paper does not provide specific hardware details such as exact GPU/CPU models, processor types, or memory amounts used for running the experiments; only the software frameworks it builds on are mentioned. |
| Software Dependencies | No | The paper mentions software like 'PyTorch (Paszke et al. 2017)', 'OpenNMT-py system (Klein et al. 2017)', and 'CoreNLP (Manning et al. 2014)' but does not provide specific version numbers for these dependencies, which are required for reproducibility. |
| Experiment Setup | Yes | We set the embedding dimension d_e to 100, the hidden size d to 256, and λ to 0.5. All the initial states of GRU cells are set as zero vectors except that h_0 is initialized as [m_{L_x}; m_1]. We share the embedding matrix among the context words, the title words, and the target keyphrase words. All the trainable variables including the embedding matrix are initialized randomly with uniform distribution in [-0.1, 0.1]. The model is optimized by Adam (Kingma and Ba 2015) with batch size = 64, initial learning rate = 0.001, gradient clipping = 1, and dropout rate = 0.1. We decay the learning rate into the half when the evaluation perplexity stops dropping. Early stopping is applied when the validation perplexity stops dropping for three continuous evaluations. During testing, we set the maximum depth of beam search as 6 and the beam size as 200. |
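
For the Dataset Splits row above, the sketch below simply checks that local copies of the KP20k splits match the reported counts (527,830 / 20,000 / 20,000, 567,830 in total). The JSON-lines file names are assumptions for illustration, not the dataset's documented layout.

```python
# Minimal sketch: verify that local KP20k split files match the counts quoted
# in the Dataset Splits row. The file names are assumptions; the released
# dataset may use a different layout.
EXPECTED = {
    "kp20k_training.json": 527_830,    # training articles
    "kp20k_validation.json": 20_000,   # validation articles
    "kp20k_testing.json": 20_000,      # testing articles
}

def count_lines(path):
    """Count one-article-per-line records in a JSON-lines file."""
    with open(path, encoding="utf-8") as f:
        return sum(1 for line in f if line.strip())

if __name__ == "__main__":
    total = 0
    for path, expected in EXPECTED.items():
        n = count_lines(path)
        total += n
        status = "OK" if n == expected else f"expected {expected}"
        print(f"{path}: {n} articles ({status})")
    print(f"total: {total} articles (paper reports 567,830)")
```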
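
The Experiment Setup row quotes concrete hyperparameters. Below is a minimal sketch of how that training configuration might look in PyTorch; only the stated values (dimensions, λ, Adam settings, clipping, dropout, learning-rate halving, early stopping, beam settings) come from the paper. `PlaceholderSeq2Seq`, `VOCAB_SIZE`, and the helper functions are assumptions for illustration and do not reproduce TG-Net itself (the title-guided encoder and copy mechanism are omitted).

```python
# Sketch of the quoted training setup, assuming PyTorch. Not the authors' code.
import torch
from torch import nn, optim

EMB_DIM, HIDDEN_DIM, LAMBDA = 100, 256, 0.5    # d_e, d, and λ from the paper
BATCH_SIZE, LR, GRAD_CLIP, DROPOUT = 64, 1e-3, 1.0, 0.1
BEAM_SIZE, MAX_BEAM_DEPTH = 200, 6             # decoding settings at test time
VOCAB_SIZE = 50_000                            # assumption, not stated in the row

class PlaceholderSeq2Seq(nn.Module):
    """Generic stand-in encoder-decoder sharing one embedding matrix."""
    def __init__(self):
        super().__init__()
        self.embedding = nn.Embedding(VOCAB_SIZE, EMB_DIM)   # shared embedding
        self.encoder = nn.GRU(EMB_DIM, HIDDEN_DIM, batch_first=True,
                              bidirectional=True)
        self.decoder = nn.GRU(EMB_DIM, 2 * HIDDEN_DIM, batch_first=True)
        self.proj = nn.Linear(2 * HIDDEN_DIM, VOCAB_SIZE)
        self.dropout = nn.Dropout(DROPOUT)

    def forward(self, src, tgt):
        _, h = self.encoder(self.dropout(self.embedding(src)))
        # Concatenate both directions' final states to initialise the decoder.
        h0 = torch.cat([h[0], h[1]], dim=-1).unsqueeze(0)
        out, _ = self.decoder(self.dropout(self.embedding(tgt)), h0)
        return self.proj(out)

model = PlaceholderSeq2Seq()
# Uniform initialisation in [-0.1, 0.1] for all trainable parameters.
for p in model.parameters():
    nn.init.uniform_(p, -0.1, 0.1)

optimizer = optim.Adam(model.parameters(), lr=LR)
criterion = nn.CrossEntropyLoss()

def train_step(src, tgt):
    """One optimisation step with gradient clipping = 1."""
    optimizer.zero_grad()
    logits = model(src, tgt[:, :-1])
    loss = criterion(logits.reshape(-1, VOCAB_SIZE), tgt[:, 1:].reshape(-1))
    loss.backward()
    nn.utils.clip_grad_norm_(model.parameters(), GRAD_CLIP)
    optimizer.step()
    return loss.item()

# Halve the learning rate when validation perplexity stops dropping, and stop
# after three consecutive evaluations without improvement (early stopping).
best_ppl, bad_evals = float("inf"), 0
def on_evaluation(val_ppl):
    global best_ppl, bad_evals
    if val_ppl < best_ppl:
        best_ppl, bad_evals = val_ppl, 0
    else:
        bad_evals += 1
        for group in optimizer.param_groups:
            group["lr"] *= 0.5
    return bad_evals >= 3   # True means stop training
```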