Title-Guided Encoding for Keyphrase Generation

Authors: Wang Chen, Yifan Gao, Jiani Zhang, Irwin King, Michael R. Lyu

AAAI 2019, pp. 6268-6275 | Conference PDF | Archive PDF | Plain Text | LLM Run Details

Reproducibility Variable | Result | LLM Response
Research Type | Experimental | 'Experiments on a range of KG datasets demonstrate that our model outperforms the state-of-the-art models with a large margin, especially for documents with either very low or very high title length ratios.' 'The overall empirical results on five real-world benchmarks show that our model outperforms the state-of-the-art models significantly on both present and absent keyphrase prediction, especially for documents with either very low or very high title length ratios.'
Researcher Affiliation | Academia | 1) Department of Computer Science and Engineering, The Chinese University of Hong Kong, Shatin, N.T., Hong Kong; 2) Shenzhen Key Laboratory of Rich Media Big Data Analytics and Application, Shenzhen Research Institute, The Chinese University of Hong Kong, Shenzhen, China
Pseudocode | No | The paper describes the model architecture and components using equations and text, but does not include any explicitly labeled 'Pseudocode' or 'Algorithm' blocks.
Open Source Code | No | The paper states, 'We implement the models using PyTorch (Paszke et al. 2017) on the basis of the OpenNMT-py system (Klein et al. 2017),' which refers to using existing open-source frameworks, but no explicit statement or link is provided for the authors' own implementation code for the described methodology.
Open Datasets | Yes | 'For all the generative models (i.e. our TG-Net model as well as all the encoder-decoder baselines), we choose the largest publicly available keyphrase generation dataset KP20k constructed by Meng et al. (2017) as the training dataset. Besides KP20k, we also adopt four other widely-used scientific datasets for comprehensive testing, including Inspec (Hulth 2003), Krapivin (Krapivin, Autaeu, and Marchese 2009), NUS (Nguyen and Kan 2007), and SemEval-2010 (Kim et al. 2010).'
Dataset Splits | Yes | 'Totally 567,830 articles are collected in this dataset, where 527,830 for training, 20,000 for validation, and 20,000 for testing.' 'Table 2: The statistics of testing datasets. The Training means the training part for the traditional supervised extractive baseline. The FFCV represents five-fold cross validation. The Testing means the testing part for all models.' (A split-reconstruction sketch follows this table.)
Hardware Specification | No | The paper does not provide specific hardware details such as exact GPU/CPU models, processor types, or memory amounts used for running the experiments; only the software stack is described.
Software Dependencies | No | The paper mentions software such as 'PyTorch (Paszke et al. 2017)', the 'OpenNMT-py system (Klein et al. 2017)', and 'CoreNLP (Manning et al. 2014)', but does not provide version numbers for these dependencies, which are required for reproducibility. (Hypothetical version pins are sketched after this table.)
Experiment Setup | Yes | 'We set the embedding dimension d_e to 100, the hidden size d to 256, and λ to 0.5. All the initial states of GRU cells are set as zero vectors except that h_0 is initialized as [m_{L_x}; m_1]. We share the embedding matrix among the context words, the title words, and the target keyphrase words. All the trainable variables including the embedding matrix are initialized randomly with uniform distribution in [-0.1, 0.1]. The model is optimized by Adam (Kingma and Ba 2015) with batch size = 64, initial learning rate = 0.001, gradient clipping = 1, and dropout rate = 0.1. We decay the learning rate to half when the evaluation perplexity stops dropping. Early stopping is applied when the validation perplexity stops dropping for three continuous evaluations. During testing, we set the maximum depth of beam search as 6 and the beam size as 200.' (A training-setup sketch follows this table.)
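
The KP20k numbers in the Dataset Splits row can be sanity-checked mechanically. Below is a minimal sketch, assuming KP20k is available locally as one JSON record per line; the file name kp20k.jsonl, the fixed seed, and the shuffle itself are illustrative assumptions, since the paper does not describe how the split was drawn.

import json
import random

# Reproduce the reported KP20k split sizes:
# 527,830 train + 20,000 validation + 20,000 test = 567,830 articles.
random.seed(42)  # arbitrary seed; the paper does not specify one

with open("kp20k.jsonl") as f:  # hypothetical file layout
    articles = [json.loads(line) for line in f]

assert len(articles) == 567_830
random.shuffle(articles)  # assumption: a random split

train = articles[:527_830]
valid = articles[527_830:547_830]
test = articles[547_830:]

print(len(train), len(valid), len(test))  # 527830 20000 20000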
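
Because no versions are reported in the Software Dependencies row, a reproduction attempt has to pin its own environment. The pins below are hypothetical placeholders roughly contemporary with the paper, not the authors' actual setup:

# requirements.txt -- hypothetical version pins for a reproduction attempt;
# the paper names the tools but reports no versions, so every pin here is
# a guess matching the paper's timeframe, not the authors' environment.
torch==0.4.1       # 'PyTorch (Paszke et al. 2017)'
OpenNMT-py==0.2.1  # 'OpenNMT-py system (Klein et al. 2017)'
# Stanford CoreNLP ('Manning et al. 2014') is a Java toolkit installed
# separately; a 3.9.x release would match the paper's publication date.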
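
The hyperparameters in the Experiment Setup row map directly onto a standard PyTorch training loop. The sketch below wires them together around a stand-in seq2seq module; the Seq2SeqStub class, vocabulary size, and loss wiring are illustrative assumptions, not the TG-Net architecture itself.

from torch import nn
from torch.optim import Adam
from torch.optim.lr_scheduler import ReduceLROnPlateau

# Hyperparameters exactly as reported in the Experiment Setup row.
EMB_DIM, HIDDEN = 100, 256          # d_e and d
LAMBDA = 0.5                        # lambda from the paper's setup
BATCH_SIZE, LR, CLIP, DROPOUT = 64, 1e-3, 1.0, 0.1
BEAM_SIZE, MAX_BEAM_DEPTH = 200, 6  # decoding-time settings
VOCAB = 50_000                      # illustrative; not reported in this row

class Seq2SeqStub(nn.Module):
    """Stand-in for TG-Net: shared embedding + GRU encoder/decoder."""
    def __init__(self):
        super().__init__()
        # One embedding matrix shared by context, title, and keyphrases.
        self.embed = nn.Embedding(VOCAB, EMB_DIM)
        self.encoder = nn.GRU(EMB_DIM, HIDDEN, batch_first=True)
        self.decoder = nn.GRU(EMB_DIM, HIDDEN, batch_first=True)
        self.out = nn.Linear(HIDDEN, VOCAB)
        self.drop = nn.Dropout(DROPOUT)

    def forward(self, src, tgt):
        _, h = self.encoder(self.drop(self.embed(src)))
        dec, _ = self.decoder(self.drop(self.embed(tgt)), h)
        return self.out(dec)

model = Seq2SeqStub()
# All trainable variables initialized uniformly in [-0.1, 0.1], as stated.
for p in model.parameters():
    nn.init.uniform_(p, -0.1, 0.1)

optimizer = Adam(model.parameters(), lr=LR)
# Halve the learning rate when validation perplexity stops dropping.
scheduler = ReduceLROnPlateau(optimizer, mode="min", factor=0.5)
criterion = nn.CrossEntropyLoss()

def train_step(src, tgt):
    """One optimization step with gradient clipping = 1."""
    optimizer.zero_grad()
    logits = model(src, tgt[:, :-1])  # teacher forcing
    loss = criterion(logits.reshape(-1, VOCAB), tgt[:, 1:].reshape(-1))
    loss.backward()
    nn.utils.clip_grad_norm_(model.parameters(), CLIP)
    optimizer.step()
    return loss.item()

In an outer loop, calling scheduler.step(val_ppl) after each validation pass would implement the reported plateau-based decay, and stopping after three consecutive evaluations without improvement would implement the early-stopping rule.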