Structured Neural Summarization

Authors: Patrick Fernandes, Miltiadis Allamanis, Marc Brockschmidt

ICLR 2019

Reproducibility Variable | Result | LLM Response
Research Type | Experimental | In an extensive evaluation, we show that the resulting hybrid sequence-graph models outperform both pure sequence models as well as pure graph models on a range of summarization tasks.
Researcher Affiliation | Industry | Patrick Fernandes, Miltiadis Allamanis & Marc Brockschmidt, Microsoft Research, Cambridge, United Kingdom, {t-pafern,miallama,mabrocks}@microsoft.com
Pseudocode | No | The paper describes the model architecture and mathematical formulations but does not include any explicit pseudocode or algorithm blocks.
Open Source Code | Yes | We release all used code and data at https://github.com/CoderPat/structured-neural-summarization.
Open Datasets | Yes | We consider the Java (small) dataset of Alon et al. (2018a), re-using the train-validation-test splits they have picked. We additionally generated a new dataset from 23 open-source C# projects mined from GitHub... We use the CNN/DM dataset (Hermann et al., 2015) using the exact data and split provided by See et al. (2017).
Dataset Splits | Yes | First, we consider the Java (small) dataset of Alon et al. (2018a), re-using the train-validation-test splits they have picked. The C# dataset is split 85-5-10%. (See the split sketch below.)
Hardware Specification | No | The paper mentions 'efficient computation' and 'TensorFlow's unsorted_segment_* operations' but does not provide specific hardware details such as GPU/CPU models, memory, or cloud instance types used for experiments. (See the message-aggregation sketch below.)
Software Dependencies | Yes | We use Stanford CoreNLP (Manning et al., 2014) (version 3.9.1) to tokenize the text and provide the resulting tokens to the encoder. (See the tokenization sketch below.)
Experiment Setup | Yes | Concretely, we combine two encoders (a bidirectional LSTM encoder with 1 layer and 256 hidden units, and its sequence GNN extension with 128 hidden units unrolled over 8 timesteps) with two decoders (an LSTM decoder with 1 layer and 256 hidden units with attention over the input sequence, and an extension using a pointer network-style copying mechanism (Vinyals et al., 2015a)). (See the encoder sketch below.)
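The C# dataset's 85-5-10% split is stated only as proportions; the paper excerpt does not say whether the split is drawn per file or per project. A minimal sketch, assuming a seeded random split over a flat list of examples:

```python
import random

def split_85_5_10(examples, seed=0):
    """Shuffle and split examples into 85% train, 5% validation, 10% test."""
    rng = random.Random(seed)
    examples = list(examples)
    rng.shuffle(examples)
    n = len(examples)
    n_train = int(0.85 * n)
    n_valid = int(0.05 * n)
    train = examples[:n_train]
    valid = examples[n_train:n_train + n_valid]
    test = examples[n_train + n_valid:]
    return train, valid, test

if __name__ == "__main__":
    train, valid, test = split_85_5_10(range(1000))
    print(len(train), len(valid), len(test))  # 850 50 100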
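The Hardware Specification row notes that the paper only points to TensorFlow's unsorted_segment_* operations for efficient GNN computation, with no hardware details. The sketch below shows how such an operation sums per-edge messages into per-node states in one vectorized call; the edge layout and the single dense message function are illustrative assumptions, not the authors' implementation:

```python
import tensorflow as tf

# Sketch: GNN message aggregation with tf.math.unsorted_segment_sum.
num_nodes = 5
hidden_dim = 8

node_states = tf.random.normal([num_nodes, hidden_dim])

# Edges as (source, target) index pairs (illustrative toy graph).
edge_sources = tf.constant([0, 1, 2, 3])
edge_targets = tf.constant([1, 2, 3, 4])

message_layer = tf.keras.layers.Dense(hidden_dim)

# Compute one message per edge from the source node's state ...
messages = message_layer(tf.gather(node_states, edge_sources))

# ... and sum all incoming messages per target node.
aggregated = tf.math.unsorted_segment_sum(
    data=messages,
    segment_ids=edge_targets,
    num_segments=num_nodes,
)
print(aggregated.shape)  # (5, 8)
```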
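For the Stanford CoreNLP dependency, one way to reproduce the tokenization step is to drive CoreNLP through the stanza client; this is a sketch assuming a local CoreNLP install (e.g. version 3.9.x) is available via CORENLP_HOME, not the authors' exact pipeline:

```python
# Sketch: tokenize text with Stanford CoreNLP via the stanza client.
# Assumes CoreNLP is downloaded and the CORENLP_HOME environment variable is set.
from stanza.server import CoreNLPClient

text = "Structured neural summarization combines sequence and graph encoders."

with CoreNLPClient(annotators=["tokenize", "ssplit"], be_quiet=True) as client:
    annotation = client.annotate(text)
    tokens = [token.word
              for sentence in annotation.sentence
              for token in sentence.token]

print(tokens)
```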
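The Experiment Setup row quotes the key hyperparameters. Below is a minimal sketch of how they might be organized, with only the bidirectional LSTM half of the hybrid encoder built out; the embedding size and the per-direction reading of "256 hidden units" are assumptions, and the sequence-GNN extension and pointer-network decoder are noted only in the config. The authors' released implementation is at https://github.com/CoderPat/structured-neural-summarization.

```python
import tensorflow as tf

# Hyperparameters as quoted in the Experiment Setup row.
CONFIG = {
    "encoder": {
        "type": "bilstm + sequence-gnn",
        "lstm_layers": 1,
        "lstm_hidden_units": 256,   # assumed per direction; the paper does not say
        "gnn_hidden_units": 128,
        "gnn_timesteps": 8,
    },
    "decoder": {
        "type": "lstm + attention (+ pointer-network copying)",
        "layers": 1,
        "hidden_units": 256,
    },
}

def build_bilstm_encoder(vocab_size, embedding_dim=128):
    """Bidirectional LSTM encoder; embedding_dim is an illustrative choice."""
    return tf.keras.Sequential([
        tf.keras.layers.Embedding(vocab_size, embedding_dim, mask_zero=True),
        tf.keras.layers.Bidirectional(
            tf.keras.layers.LSTM(CONFIG["encoder"]["lstm_hidden_units"],
                                 return_sequences=True)),
    ])

if __name__ == "__main__":
    encoder = build_bilstm_encoder(vocab_size=10_000)
    token_ids = tf.constant([[4, 8, 15, 16, 23, 42]])
    states = encoder(token_ids)
    print(states.shape)  # (1, 6, 512): 256 units per direction, concatenated
```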