reproducibilityindex.ai

MeanSum: A Neural Model for Unsupervised Multi-Document Abstractive Summarization

Authors: Eric Chu, Peter Liu

ICML 2019

Reproducibility Variable	Result	LLM Response
Research Type	Experimental	We show through automated metrics and human evaluation that the generated summaries are highly abstractive, fluent, relevant, and representative of the average sentiment of the input reviews. Finally, we collect a reference evaluation dataset and show that our model outperforms a strong extractive baseline.
Researcher Affiliation	Collaboration	¹MIT Media Lab ²Google Brain.
Pseudocode	No	The paper does not contain structured pseudocode or algorithm blocks.
Open Source Code	Yes	The code is available online. https://github.com/sosuperic/MeanSum
Open Datasets	Yes	We tuned our models primarily on a dataset of customer reviews provided in the Yelp Dataset Challenge, where each review is accompanied by a 5-star rating. https://www.yelp.com/dataset/challenge
Dataset Splits	Yes	The final training, validation, and test splits consist of 10695, 1337, and 1337 businesses, and 1038184, 129856, and 129840 reviews, respectively.
Hardware Specification	No	The paper does not provide specific hardware details (e.g., GPU/CPU models, memory) used for running its experiments. It only describes software architecture and training parameters.
Software Dependencies	No	The paper mentions general algorithms and models like "multiplicative LSTM" and "Adam" but does not provide specific software dependencies with version numbers (e.g., Python, PyTorch, TensorFlow versions).
Experiment Setup	Yes	The language model, encoders, and decoders were multiplicative LSTMs (Krause et al., 2016) with 512 hidden units, a 0.1 dropout rate, a word embedding size of 256, and layer normalization (Ba et al., 2016). We used Adam (Kingma & Ba, 2014) to train, a learning rate of 0.001 for the language model, a learning rate of 0.0001 for the classifier, and a learning rate of 0.0005 for the summarization model, with β1 = 0.9 and β2 = 0.999. The initial temperature for the Gumbel-softmax was set to 2.0. One input item to the language model was k = 8 reviews from the same business or product concatenated together with end-of-review delimiters, with each update step operating on a subsequence of 256 subtokens. The review-rating classifier was a multi-channel text convolutional neural network similar to Kim (2014) with 3, 4, 5 width filters, 128 feature maps per filter, and a 0.5 dropout rate.
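
The values quoted in the Dataset Splits and Experiment Setup rows can be gathered into a single configuration object for anyone attempting a reproduction. The following is a minimal sketch, not the authors' code: the class name, field names, and helper functions are hypothetical, and the derived classifier feature size assumes max-over-time pooling of each feature map, as in Kim (2014). Only the numeric values come from the table above.

```python
from dataclasses import dataclass
from typing import Dict, Tuple


@dataclass(frozen=True)
class MeanSumSetup:
    """Hyperparameters quoted in the Experiment Setup row (field names are hypothetical)."""

    # Shared by the language model, encoders, and decoders (multiplicative LSTMs)
    hidden_size: int = 512
    embedding_size: int = 256
    lstm_dropout: float = 0.1
    layer_norm: bool = True

    # Adam learning rates per component, plus the shared beta values
    lm_lr: float = 1e-3
    classifier_lr: float = 1e-4
    summarizer_lr: float = 5e-4
    adam_betas: Tuple[float, float] = (0.9, 0.999)

    # Gumbel-softmax and language-model input construction
    gumbel_initial_temperature: float = 2.0
    reviews_per_item: int = 8          # k reviews concatenated per input item
    lm_subsequence_length: int = 256   # subtokens per update step

    # Review-rating classifier (text CNN similar to Kim, 2014)
    cnn_filter_widths: Tuple[int, ...] = (3, 4, 5)
    cnn_feature_maps: int = 128
    cnn_dropout: float = 0.5

    def cnn_feature_dim(self) -> int:
        # Assumption: one pooled value per feature map per filter width.
        return len(self.cnn_filter_widths) * self.cnn_feature_maps


# (businesses, reviews) per split, as quoted in the Dataset Splits row
SPLITS: Dict[str, Tuple[int, int]] = {
    "train": (10695, 1038184),
    "val": (1337, 129856),
    "test": (1337, 129840),
}


def split_fractions(splits: Dict[str, Tuple[int, int]]) -> Dict[str, float]:
    """Return each split's share of the total number of businesses."""
    total = sum(businesses for businesses, _ in splits.values())
    return {name: businesses / total for name, (businesses, _) in splits.items()}


if __name__ == "__main__":
    setup = MeanSumSetup()
    print(setup.cnn_feature_dim())
    print(split_fractions(SPLITS))
```

Computed this way, the business counts correspond to roughly an 80/10/10 split. Hardware and software versions are not part of the sketch because the paper does not report them.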