Towards Personalized Review Summarization via User-Aware Sequence Network

Authors: Junjie Li, Haoran Li, Chengqing Zong (pp. 6690-6697)

AAAI 2019

Reproducibility Variable | Result | LLM Response
Research Type | Experimental | To validate our model, we collected a new dataset Trip, comprising 536,255 reviews from 19,400 users. With quantitative and human evaluation, we show that USN achieves state-of-the-art performance on personalized review summarization.
Researcher Affiliation | Academia | Junjie Li (1,2), Haoran Li (1,2), Chengqing Zong (1,2,3); 1: National Laboratory of Pattern Recognition, Institute of Automation, CAS, Beijing, China; 2: University of Chinese Academy of Sciences, Beijing, China; 3: CAS Center for Excellence in Brain Science and Intelligence Technology, Beijing, China; {junjie.li, haoran.li, cqzong}@nlpr.ia.ac.cn
Pseudocode | No | The paper provides mathematical equations for its model but does not include any pseudocode or algorithm blocks.
Open Source Code | No | The paper states: 'For evaluation of personalized review summarization, we introduce a novel dataset named Trip, which is available at https://github.com/Junjieli0704/USN.' This link is explicitly for the dataset, not the source code for the method described in the paper.
Open Datasets | Yes | To validate our approach, we collect a new personalized review summarization dataset named Trip from Tripadvisor website, which contains 536,255 review-summary pairs with 19,400 users. ... For evaluation of personalized review summarization, we introduce a novel dataset named Trip, which is available at https://github.com/Junjieli0704/USN.
Dataset Splits | Yes | We randomly split the dataset into 5,000 reviews for test, 5,000 reviews for validation and the rest for training. (A minimal split sketch in Python follows the table.)
Hardware Specification | No | The paper does not provide any specific details about the hardware (e.g., GPU models, CPU specifications) used for running the experiments.
Software Dependencies | No | The paper mentions 'Adam as our optimizing algorithm' and the 'Pyrouge package' but does not specify version numbers for other key software components or libraries (e.g., deep learning frameworks such as TensorFlow or PyTorch, or the Python version) used to implement the model. (A pyrouge usage sketch follows the table.)
Experiment Setup | Yes | For all experiments, we set the word embedding size and user embedding size to 128, and all LSTM hidden state sizes to 256. We use dropout (Srivastava et al. 2014) with probability p = 0.2. During training, we use loss on the validation set to implement early stopping and also apply gradient clipping (Pascanu, Mikolov, and Bengio 2013) with range [-5, 5]. At test time, our summaries are produced using beam search with beam size 5. We use Adam as our optimizing algorithm. We set the batch size to 128. We use a vocabulary of 30,000 words for both source and target. We truncate the review to 200 tokens... We use the development set to choose the size of the user-specific vocabulary and set it to 200. (A configuration sketch collecting these values follows the table.)
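As referenced in the Dataset Splits row, the reported split is straightforward to reproduce. The sketch below is a minimal Python illustration under stated assumptions: the 536,255 review-summary pairs are already loaded into a list, and the random seed is our own arbitrary choice, since the paper does not report one.

```python
import random

# Minimal sketch of the split quoted in the table: 5,000 reviews for test,
# 5,000 for validation, and the remainder for training.
# The seed is an assumption; the paper does not report one.
def split_trip_dataset(pairs, seed=42):
    rng = random.Random(seed)
    shuffled = list(pairs)      # copy so the caller's list is untouched
    rng.shuffle(shuffled)
    test = shuffled[:5000]
    valid = shuffled[5000:10000]
    train = shuffled[10000:]    # 526,255 pairs at the reported corpus size
    return train, valid, test
```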
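The Software Dependencies row notes that the paper evaluates with the Pyrouge package. For context, a typical pyrouge call looks like the following; the directory names and filename patterns are assumptions for illustration, not taken from the paper.

```python
from pyrouge import Rouge155

# Typical pyrouge evaluation setup; the directories and filename
# patterns below are assumed for illustration only.
rouge = Rouge155()
rouge.system_dir = "decoded"                       # model-generated summaries
rouge.model_dir = "reference"                      # gold summaries
rouge.system_filename_pattern = r"(\d+)_decoded.txt"
rouge.model_filename_pattern = "#ID#_reference.txt"
output = rouge.convert_and_evaluate()              # wraps the ROUGE-1.5.5 Perl script
print(rouge.output_to_dict(output)["rouge_1_f_score"])
```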
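Finally, the hyperparameters quoted in the Experiment Setup row can be gathered into one configuration object for quick reference. This is a reading aid, not code from the paper; the field names are our own, and only the values come from the quoted setup.

```python
from dataclasses import dataclass

# Hyperparameters quoted from the paper's experiment setup. Field names
# are our own; only the values are taken from the paper.
@dataclass(frozen=True)
class USNConfig:
    word_embedding_size: int = 128
    user_embedding_size: int = 128
    lstm_hidden_size: int = 256
    dropout: float = 0.2
    grad_clip_min: float = -5.0   # gradient clipping range [-5, 5]
    grad_clip_max: float = 5.0
    optimizer: str = "adam"
    batch_size: int = 128
    vocab_size: int = 30000       # shared source/target vocabulary
    max_review_tokens: int = 200  # reviews truncated to this length
    beam_size: int = 5            # beam search at test time
    user_vocab_size: int = 200    # chosen on the development set
```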