Sentence Ordering and Coherence Modeling using Recurrent Neural Networks
Authors: Lajanugen Logeswaran, Honglak Lee, Dragomir Radev
AAAI 2018
| Reproducibility Variable | Result | LLM Response |
|---|---|---|
| Research Type | Experimental | We propose an end-to-end unsupervised deep learning approach based on the set-to-sequence framework to address this problem. Our model strongly outperforms prior methods in the order discrimination task and a novel task of ordering abstracts from scientific articles. Furthermore, our work shows that useful text representations can be obtained by learning to order sentences. Visualizing the learned sentence representations shows that the model captures high-level logical structure in paragraphs. Our representations perform comparably to state-of-the-art pre-training methods on sentence similarity and paraphrase detection tasks. |
| Researcher Affiliation | Academia | Lajanugen Logeswaran,1 Honglak Lee,1 Dragomir Radev2 1Department of Computer Science & Engineering, University of Michigan 2Department of Computer Science, Yale University llajan@umich.edu, honglak@eecs.umich.edu, dragomir.radev@yale.edu |
| Pseudocode | No | The paper describes the model architecture and equations but does not provide pseudocode or a clearly labeled algorithm block. |
| Open Source Code | No | The paper does not contain an unambiguous statement that the authors are releasing the source code for their described methodology. |
| Open Datasets | Yes | The datasets widely used for this task in previous work are the Accidents and Earthquakes news reports. In each of these datasets the training and test sets include 100 articles and approximately 20 permutations of each article. ... We use the following sources of abstracts for this task. NIPS Abstracts. ... ACL Abstracts. A second source of abstracts are papers from the ACL Anthology Network (AAN) corpus (Radev et al. 2009). ... NSF Abstracts. We also used the NSF Research Award Abstracts dataset (Lichman 2013). |
| Dataset Splits | Yes | The dataset was split into years 2005-2013 for training and 2014, 2015 respectively for validation, testing. ... We use all extracts of papers published up to year 2010 for training, year 2011 for validation and years 2012-2013 for testing. ... Years 1990-1999 were used for training, 2000 for validation and 2001-2003 for testing. |
| Hardware Specification | No | The paper does not provide specific details about the hardware used for its experiments, such as GPU or CPU models. |
| Software Dependencies | No | The paper mentions using 'pretrained 300 dimensional GloVe word embeddings (Pennington, Socher, and Manning 2014)' and the 'Adam optimizer (Kingma and Ba 2014)'. It also references the 'Stanford parser (Klein and Manning 2003)' and 'Brown Coherence Toolkit'. However, it does not provide specific version numbers for any of these software dependencies or other key libraries/frameworks. |
| Experiment Setup | Yes | We used pretrained 300 dimensional GloVe word embeddings...All LSTMs use a hidden layer size of 1000 and the MLP in Eq. 8 has a hidden layer size of 500. The number of read cycles in the encoder is set to 10. ... We used the Adam optimizer... with batch size 10 and learning rate 5e-4 for learning. The model is regularized using early stopping. Hyperparameters were chosen using the validation set. ... All RNNs use a hidden layer size of 1000. For the window network we used a window size of 3 and a hidden layer size of 2000. |
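Since the authors did not release code, a reproduction attempt would have to reconstruct the setup from the reported hyperparameters alone. The sketch below collects those values into Python config dicts; the key names (e.g. `read_cycles`, `window_size`) are illustrative assumptions, not identifiers from the authors' implementation.

```python
# Hyperparameters as reported in the paper's experiment setup.
# Key names are hypothetical; only the values come from the paper.
ENCODER_CONFIG = {
    "word_embeddings": "GloVe, 300-dim, pretrained",
    "lstm_hidden_size": 1000,   # all LSTMs
    "mlp_hidden_size": 500,     # MLP in Eq. 8 of the paper
    "read_cycles": 10,          # encoder read cycles
}

WINDOW_NETWORK_CONFIG = {
    "rnn_hidden_size": 1000,    # all RNNs
    "window_size": 3,
    "hidden_size": 2000,
}

TRAINING_CONFIG = {
    "optimizer": "Adam",
    "batch_size": 10,
    "learning_rate": 5e-4,
    "regularization": "early stopping",
    "hyperparameter_selection": "validation set",
}
```

Note that the paper reports no hardware details or library versions, so runtime behavior of any reimplementation built on these values remains unverified.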