Structure Learning for Headline Generation
Authors: Ruqing Zhang, Jiafeng Guo, Yixing Fan, Yanyan Lan, Xueqi Cheng
AAAI 2020, pp. 9555-9562 | Conference PDF | Archive PDF | Plain Text | LLM Run Details
| Reproducibility Variable | Result | LLM Response |
|---|---|---|
| Research Type | Experimental | Empirical studies show that our model can significantly outperform the state-of-the-art headline generation models. |
| Researcher Affiliation | Academia | CAS Key Lab of Network Data Science and Technology, Institute of Computing Technology, Chinese Academy of Sciences, Beijing, China; University of Chinese Academy of Sciences, Beijing, China |
| Pseudocode | No | The paper does not contain structured pseudocode or algorithm blocks. |
| Open Source Code | No | The paper does not provide a repository link or an explicit statement about releasing the source code for its methodology. It mentions using TensorFlow but does not provide its own implementation code. |
| Open Datasets | Yes | We evaluate our model on a public benchmark collection, i.e., the New York Times (NYT) Annotated corpus. The corpus contains over 1.8 million documents written and published by the New York Times between January 1, 1987 and June 19, 2007. |
| Dataset Splits | Yes | We randomly sample 2000 pairs to form the development and test sets, respectively, and the remaining pairs are used as the training data. |
| Hardware Specification | Yes | We run our model on a Tesla K80 GPU card. |
| Software Dependencies | No | The paper states 'We implement our model in TensorFlow' but does not specify a version number for TensorFlow or any other software dependencies with their versions. |
| Experiment Setup | Yes | The dimension of word embeddings is 300, while the dimension of position embeddings is 200. We use one layer of bi-directional GRU for the word encoder and another uni-directional GRU for the decoder. We use three GCN hidden layers. The hidden unit size in the word encoder, word decoder and GCN is 300. The pooling parameter k is set as 12. The learning rate of Adam (Kingma and Ba 2015) is set as 0.0005. All trainable parameters are initialized in the range [-0.1, 0.1]. For training, we use a mini-batch size of 64, and documents with similar length (in terms of the number of sentences) are organized into a batch. Dropout with probability 0.2 is applied between vertical GRU stacks, and gradient clipping is adopted by scaling gradients when the norm exceeds a threshold of 5. (See the configuration sketch below the table.) |
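Since the authors did not release code, the following is a minimal, hypothetical sketch that collects the hyperparameters reported in the Experiment Setup row into one place; the names and structure are assumptions for illustration, not the authors' implementation.

```python
import tensorflow as tf

# Hypothetical grouping of the hyperparameters reported in the paper.
# All keys are illustrative names, not identifiers from the authors' code.
HPARAMS = {
    "word_embedding_dim": 300,
    "position_embedding_dim": 200,
    "encoder": "1-layer bi-directional GRU",
    "decoder": "uni-directional GRU",
    "num_gcn_layers": 3,
    "hidden_size": 300,            # word encoder, word decoder, and GCN
    "pooling_k": 12,
    "learning_rate": 5e-4,         # Adam
    "init_range": (-0.1, 0.1),     # uniform init of trainable parameters
    "batch_size": 64,              # documents of similar length batched together
    "dropout": 0.2,                # between vertical GRU stacks
    "grad_clip_norm": 5.0,         # scale gradients when norm exceeds 5
}

# Optimizer with the reported learning rate and gradient-norm clipping.
optimizer = tf.keras.optimizers.Adam(
    learning_rate=HPARAMS["learning_rate"],
    clipnorm=HPARAMS["grad_clip_norm"],
)
```

This sketch only reflects values stated in the paper; anything the paper leaves unspecified (e.g., the exact GCN formulation or batching code) is deliberately omitted.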