MeanSum: A Neural Model for Unsupervised Multi-Document Abstractive Summarization

Authors: Eric Chu, Peter Liu

ICML 2019

Reproducibility Variable | Result | LLM Response
Research Type | Experimental | "We show through automated metrics and human evaluation that the generated summaries are highly abstractive, fluent, relevant, and representative of the average sentiment of the input reviews. Finally, we collect a reference evaluation dataset and show that our model outperforms a strong extractive baseline."
Researcher Affiliation | Collaboration | MIT Media Lab; Google Brain.
Pseudocode | No | The paper does not contain structured pseudocode or algorithm blocks.
Open Source Code | Yes | "The code is available online." https://github.com/sosuperic/MeanSum
Open Datasets | Yes | "We tuned our models primarily on a dataset of customer reviews provided in the Yelp Dataset Challenge, where each review is accompanied by a 5-star rating." https://www.yelp.com/dataset/challenge
Dataset Splits | Yes | "The final training, validation, and test splits consist of 10695, 1337, and 1337 businesses, and 1038184, 129856, and 129840 reviews, respectively."
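As a quick sanity check on the reported counts, the splits correspond to roughly an 80/10/10 partition at both the business and review level. A small illustrative calculation (not from the paper):

```python
# Split sizes reported in the paper (train, validation, test).
businesses = [10695, 1337, 1337]
reviews = [1038184, 129856, 129840]

def proportions(counts):
    """Return each split's share of the total, rounded to 3 decimals."""
    total = sum(counts)
    return [round(c / total, 3) for c in counts]

print(proportions(businesses))  # [0.8, 0.1, 0.1]
print(proportions(reviews))     # [0.8, 0.1, 0.1]
```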
Hardware Specification | No | The paper does not provide specific hardware details (e.g., GPU/CPU models, memory) used for running its experiments. It only describes software architecture and training parameters.
Software Dependencies | No | The paper mentions general algorithms and models like "multiplicative LSTM" and "Adam" but does not provide specific software dependencies with version numbers (e.g., Python, PyTorch, TensorFlow versions).
Experiment Setup | Yes | "The language model, encoders, and decoders were multiplicative LSTMs (Krause et al., 2016) with 512 hidden units, a 0.1 dropout rate, a word embedding size of 256, and layer normalization (Ba et al., 2016). We trained with Adam (Kingma & Ba, 2014), using a learning rate of 0.001 for the language model, 0.0001 for the classifier, and 0.0005 for the summarization model, with β1 = 0.9 and β2 = 0.999. The initial temperature for the Gumbel-softmax was set to 2.0. One input item to the language model was k = 8 reviews from the same business or product concatenated together with end-of-review delimiters, with each update step operating on a subsequence of 256 subtokens. The review-rating classifier was a multi-channel text convolutional neural network similar to Kim (2014) with filter widths of 3, 4, and 5, 128 feature maps per filter, and a 0.5 dropout rate."
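The Gumbel-softmax sampling mentioned in the setup (initial temperature 2.0) can be sketched as follows. This is a generic NumPy illustration of the trick, not the authors' implementation; the function name and interface are assumptions:

```python
import numpy as np

def gumbel_softmax(logits, temperature=2.0, rng=None):
    """Draw a relaxed (continuous) one-hot sample via the Gumbel-softmax trick.

    The paper starts the temperature at 2.0; higher temperatures give
    smoother, more uniform samples, while lower ones approach one-hot.
    """
    rng = np.random.default_rng() if rng is None else rng
    # Gumbel(0, 1) noise: -log(-log(U)) with U ~ Uniform(0, 1).
    gumbel = -np.log(-np.log(rng.uniform(size=np.shape(logits))))
    y = (np.asarray(logits) + gumbel) / temperature
    # Numerically stable softmax over the last (vocabulary) axis.
    e = np.exp(y - y.max(axis=-1, keepdims=True))
    return e / e.sum(axis=-1, keepdims=True)
```

In the summarization model this kind of relaxation lets gradients flow through the discrete token choices of the generated summary during training.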