Text-Based Interactive Recommendation via Offline Reinforcement Learning

Authors: Ruiyi Zhang, Tong Yu, Yilin Shen, Hongxia Jin (pp. 11694-11702)

AAAI 2022 | Conference PDF | Archive PDF | Plain Text | LLM Run Details

Reproducibility Variable | Result | LLM Response
Research Type | Experimental | Empirical results on the simulator derived from real-world datasets demonstrate the effectiveness of our proposed offline training framework.
Researcher Affiliation | Collaboration | Ruiyi Zhang (1), Tong Yu (2), Yilin Shen (2), Hongxia Jin (2); (1) Duke University, (2) Samsung Research America; ryzhang.cs@gmail.com
Pseudocode | Yes | Algorithm 1: Offline Interactive Recommendation
Open Source Code | No | The paper does not provide any explicit statement about making the source code for its methodology publicly available or accessible.
Open Datasets | Yes | We compare our method with various baseline approaches on UT-Zappos50K (Yu and Grauman 2014a,b).
Dataset Splits | No | In the evaluation, we randomly select 40,020 shoes to form a training set and the rest of the shoes to form a test set. The paper does not mention a validation set split. (A split sketch appears after the table.)
Hardware Specification | Yes | All experiments are conducted on a single Tesla V100 GPU.
Software Dependencies | No | The paper mentions the 'Adam optimizer' but does not specify its version or provide version numbers for other key software components or libraries.
Experiment Setup | Yes | In the textual encoder, the dimension of the word embedding layer is 32, the dimension of the LSTM is 128, and the dimension of the linear mapping layer is 32. The textual encoder is optimized by the Adam optimizer with an initial learning rate of 0.001. The reward correction model is a two-layer MLP with a dimension of 32, which takes the state-action pair as input and outputs the estimated Q-value. In the recommender policy network, the dimension of the two-layer MLP is 128. For the Adam optimizer, the optimal learning rate was found to be 5e-4, chosen via a hyperparameter search over {1e-3, 5e-4, 1e-4, 5e-5}. The discount factor of reinforcement learning is 0.99.
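As a rough illustration of the reported experiment setup, the sketch below wires together the stated dimensions and optimizer settings. The use of PyTorch, the vocabulary size, the action dimension, and all class and variable names are assumptions for illustration only and are not confirmed by the paper.

import torch
import torch.nn as nn

VOCAB_SIZE = 10000   # assumption; the paper does not report the vocabulary size
ACTION_DIM = 32      # assumption; dimension of the action (item) representation

class TextualEncoder(nn.Module):
    """Word embedding (dim 32) -> LSTM (dim 128) -> linear mapping (dim 32)."""
    def __init__(self):
        super().__init__()
        self.embed = nn.Embedding(VOCAB_SIZE, 32)
        self.lstm = nn.LSTM(32, 128, batch_first=True)
        self.proj = nn.Linear(128, 32)

    def forward(self, tokens):
        emb = self.embed(tokens)             # (batch, seq_len, 32)
        _, (h, _) = self.lstm(emb)           # final hidden state, (1, batch, 128)
        return self.proj(h[-1])              # state representation of dimension 32

# Reward correction model: two-layer MLP (dim 32) over the state-action pair,
# producing an estimated Q-value.
reward_correction = nn.Sequential(
    nn.Linear(32 + ACTION_DIM, 32),
    nn.ReLU(),
    nn.Linear(32, 1),
)

# Recommender policy network: two-layer MLP with dimension 128.
policy = nn.Sequential(
    nn.Linear(32, 128),
    nn.ReLU(),
    nn.Linear(128, ACTION_DIM),
)

# Optimizers and RL discount factor, matching the reported hyperparameters.
encoder = TextualEncoder()
encoder_opt = torch.optim.Adam(encoder.parameters(), lr=1e-3)
policy_opt = torch.optim.Adam(policy.parameters(), lr=5e-4)   # selected from {1e-3, 5e-4, 1e-4, 5e-5}
GAMMA = 0.99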
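The 40,020-item split noted under Dataset Splits amounts to a simple random partition of the UT-Zappos50K items. The sketch below assumes the item list is already available; the loader name and the fixed seed are hypothetical and not stated in the paper.

import random

random.seed(0)  # fixed seed for repeatability; an assumption, the paper does not report one

# load_zappos50k_items is a hypothetical helper returning the full list of shoe items
all_shoes = load_zappos50k_items()
random.shuffle(all_shoes)

train_shoes = all_shoes[:40020]   # 40,020 shoes form the training set
test_shoes = all_shoes[40020:]    # the remaining shoes form the test set; no validation split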