Multi-Objective Intrinsic Reward Learning for Conversational Recommender Systems
Authors: Zhendong Chu, Nan Wang, Hongning Wang
NeurIPS 2023 | Conference PDF | Archive PDF | Plain Text | LLM Run Details
| Reproducibility Variable | Result | LLM Response |
|---|---|---|
| Research Type | Experimental | To evaluate the effectiveness of our approach, we conduct extensive experiments on three public CRS benchmarks. The results show that our algorithm significantly improves CRS performance by exploiting informative learned intrinsic rewards. |
| Researcher Affiliation | Collaboration | Zhendong Chu, University of Virginia, zc9uy@virginia.edu, Charlottesville, VA, USA; Nan Wang, Netflix Inc., nanw@netflix.com, Los Gatos, CA, USA; Hongning Wang, University of Virginia, hw5x@virginia.edu, Charlottesville, VA, USA |
| Pseudocode | Yes | Algorithm 1: Optimization algorithm of CRSIRL |
| Open Source Code | No | No statement about open-sourcing the code or a link to a code repository was found. |
| Open Datasets | Yes | We evaluate CRSIRL on three multi-round CRS benchmarks [Lei et al., 2020a, Deng et al., 2021]. The LastFM dataset is for music artist recommendation; Lei et al. [2020a] manually grouped its original attributes into 33 coarse-grained attributes. The LastFM* dataset is the version where attributes are not grouped. The Yelp* dataset is for local business recommendation. We summarize their statistics in Table 1. |
| Dataset Splits | Yes | All datasets are split by a 7:1.5:1.5 ratio for training, validation and testing. (A minimal split sketch follows the table.) |
| Hardware Specification | Yes | All experiments are run on an NVIDIA GeForce RTX 3080Ti GPU with 12 GB memory. |
| Software Dependencies | No | The paper mentions 'Adam optimizer' and 'Transformer-based state encoder' but does not specify versions for any programming languages, libraries, or frameworks (e.g., Python, PyTorch, etc.). |
| Experiment Setup | Yes | The learning rates in the inner and outer loops are searched from {1e-5, 5e-5, 1e-4} with the Adam optimizer. The coefficient of the intrinsic reward λ is searched from {0.05, 0.1, 0.5, 1.0}. The discount factor γ is set to 0.999. All experiments are run on an NVIDIA GeForce RTX 3080Ti GPU with 12 GB memory. Since RL-based baselines rely on handcrafted rewards, we follow Lei et al. [2020a] and set (1) r_rec_suc = 1 for a successful recommendation; (2) r_rec_fail = -0.1 for a failed recommendation; (3) r_ask_suc = 0.1 when the inquired attribute is confirmed by the user; (4) r_ask_fail = -0.1 when the inquired attribute is dismissed by the user; (5) r_quit = -0.3 when the user quits the conversation without a successful recommendation. We set the maximum turn T to 15 and the size K of the recommendation list to 10. (A sketch of these handcrafted rewards follows the table.) |
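
The 7:1.5:1.5 split quoted in the Dataset Splits row is a plain 70/15/15 partition of the interaction data. Below is a minimal sketch of such a split; it is not the authors' code, and the random seed, the shuffling strategy, and the `split_interactions` helper are illustrative assumptions.

```python
# Hypothetical sketch of a 7:1.5:1.5 (70/15/15) train/validation/test split.
# Not the authors' code; the seed and shuffling strategy are assumptions.
import random

def split_interactions(interactions, seed=0):
    """Shuffle interactions and cut them into 70% / 15% / 15% portions."""
    rng = random.Random(seed)
    data = list(interactions)
    rng.shuffle(data)
    n_train = int(0.70 * len(data))
    n_valid = int(0.15 * len(data))
    train = data[:n_train]
    valid = data[n_train:n_train + n_valid]
    test = data[n_train + n_valid:]   # remaining ~15%
    return train, valid, test
```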
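
The handcrafted baseline rewards listed in the Experiment Setup row amount to a small lookup table over conversation events. The sketch below only restates those values; the event labels and the `handcrafted_reward` helper are hypothetical, and the negative signs follow the failure/quit conventions of Lei et al. [2020a].

```python
# Sketch of the handcrafted per-turn rewards used by the RL-based baselines,
# as quoted above (event names are hypothetical labels, not the paper's API).
HANDCRAFTED_REWARDS = {
    "rec_suc": 1.0,    # successful recommendation
    "rec_fail": -0.1,  # failed recommendation
    "ask_suc": 0.1,    # inquired attribute confirmed by the user
    "ask_fail": -0.1,  # inquired attribute dismissed by the user
    "quit": -0.3,      # user quits without a successful recommendation
}

def handcrafted_reward(event: str) -> float:
    """Return the handcrafted reward for a conversation event."""
    return HANDCRAFTED_REWARDS[event]
```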