Towards Personalized Review Summarization via User-Aware Sequence Network
Authors: Junjie Li, Haoran Li, Chengqing Zong (pp. 6690-6697)
AAAI 2019
| Reproducibility Variable | Result | LLM Response |
|---|---|---|
| Research Type | Experimental | To validate our model, we collected a new dataset Trip, comprising 536,255 reviews from 19,400 users. With quantitative and human evaluation, we show that USN achieves state-of-the-art performance on personalized review summarization. |
| Researcher Affiliation | Academia | Junjie Li (1,2), Haoran Li (1,2), Chengqing Zong (1,2,3). (1) National Laboratory of Pattern Recognition, Institute of Automation, CAS, Beijing, China; (2) University of Chinese Academy of Sciences, Beijing, China; (3) CAS Center for Excellence in Brain Science and Intelligence Technology, Beijing, China. {junjie.li, haoran.li, cqzong}@nlpr.ia.ac.cn |
| Pseudocode | No | The paper provides mathematical equations for its model but does not include any pseudocode or algorithm blocks. |
| Open Source Code | No | The paper states: 'For evaluation of personalized review summarization, we introduce a novel dataset named Trip, which is available at https://github.com/Junjieli0704/USN.' This link is explicitly for the dataset, not the source code for the methodology described in the paper. |
| Open Datasets | Yes | To validate our approach, we collect a new personalized review summarization dataset named Trip from Tripadvisor website, which contains 536,255 review-summary pairs with 19,400 users. ... For evaluation of personalized review summarization, we introduce a novel dataset named Trip, which is available at https://github.com/Junjieli0704/USN. |
| Dataset Splits | Yes | We randomly split the dataset into 5,000 reviews for test, 5,000 reviews for validation and the rest for training. |
| Hardware Specification | No | The paper does not provide any specific details about the hardware (e.g., GPU models, CPU specifications) used for running the experiments. |
| Software Dependencies | No | The paper mentions 'Adam as our optimizing algorithm' and the 'Pyrouge package', but does not specify version numbers for these or for other key software components (e.g., a deep learning framework such as TensorFlow or PyTorch, or the Python version) used to implement the model. |
| Experiment Setup | Yes | For all experiments, we set the word embedding size and user embedding size to 128, and all LSTM hidden state sizes to 256. We use dropout (Srivastava et al. 2014) with probability p = 0.2. During training, we use loss on the validation set to implement early stopping and also apply gradient clipping (Pascanu, Mikolov, and Bengio 2013) with range [-5, 5]. At test time, our summaries are produced using beam search with beam size 5. We use Adam as our optimizing algorithm. We set the batch size to 128. We use a vocabulary of 30,000 words for both source and target. We truncate the review to 200 tokens... We use the development set to choose the size of user-specific vocabulary and set it to 200. |
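The reported dataset split (5,000 test, 5,000 validation, remainder for training out of 536,255 reviews) can be reproduced mechanically. The sketch below is illustrative only: the function name `split_dataset` and the random seed are assumptions, since the paper does not publish its splitting code or seed.

```python
import random

def split_dataset(n_reviews, n_test=5000, n_val=5000, seed=42):
    """Randomly partition review indices into train/validation/test,
    mirroring the paper's reported split sizes (seed is an assumption)."""
    indices = list(range(n_reviews))
    random.Random(seed).shuffle(indices)
    test = indices[:n_test]
    val = indices[n_test:n_test + n_val]
    train = indices[n_test + n_val:]
    return train, val, test

train, val, test = split_dataset(536255)
print(len(train), len(val), len(test))  # 526255 5000 5000
```

Note that without the authors' seed, any such re-split yields different membership even though the sizes match, which matters when comparing ROUGE scores against the paper.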