Text-Based Interactive Recommendation via Offline Reinforcement Learning

Authors: Ruiyi Zhang, Tong Yu, Yilin Shen, Hongxia Jin (pp. 11694-11702)

AAAI 2022 | Conference PDF | Archive PDF | Plain Text | LLM Run Details

Reproducibility Variable | Result | LLM Response
Research Type | Experimental | Empirical results on the simulator derived from real-world datasets demonstrate the effectiveness of our proposed offline training framework.
Researcher Affiliation | Collaboration | Ruiyi Zhang (1), Tong Yu (2), Yilin Shen (2), Hongxia Jin (2); (1) Duke University, (2) Samsung Research America; ryzhang.cs@gmail.com
Pseudocode | Yes | Algorithm 1: Offline Interactive Recommendation
Open Source Code | No | The paper does not provide any explicit statement about making the source code for its methodology publicly available or accessible.
Open Datasets | Yes | We compare our method with various baseline approaches on UT-Zappos50K (Yu and Grauman 2014a,b).
Dataset Splits | No | In the evaluation, we randomly select 40,020 shoes to form a training set and the rest of the shoes to form a test set. The paper does not mention a validation set split. (A split sketch appears after the table.)
Hardware Specification | Yes | All experiments are conducted on a single Tesla V100 GPU.
Software Dependencies | No | The paper mentions the 'Adam optimizer' but does not specify its version or provide version numbers for other key software components or libraries.
Experiment Setup | Yes | In the textual encoder, the dimension of the word embedding layer is 32, the dimension of the LSTM is 128, and the dimension of the linear mapping layer is 32. The textual encoder is optimized by the Adam optimizer with an initial learning rate of 0.001. The reward correction model is a two-layer MLP with a dimension of 32, which takes the state-action pair as input and outputs the estimated Q-value. In the recommender policy network, the dimension of the two-layer MLP is 128. For the Adam optimizer, the optimal learning rate was found to be 5e-4, chosen via a hyperparameter search over {1e-3, 5e-4, 1e-4, 5e-5}. The discount factor of reinforcement learning is 0.99.
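As a rough illustration of the reported experiment setup, the sketch below wires together the stated dimensions and optimizer settings. The use of PyTorch, the vocabulary size, the action dimension, and all class and variable names are assumptions for illustration only and are not confirmed by the paper.

import torch
import torch.nn as nn

VOCAB_SIZE = 10000   # assumption; the paper does not report the vocabulary size
ACTION_DIM = 32      # assumption; dimension of the action (item) representation

class TextualEncoder(nn.Module):
    """Word embedding (dim 32) -> LSTM (dim 128) -> linear mapping (dim 32)."""
    def __init__(self):
        super().__init__()
        self.embed = nn.Embedding(VOCAB_SIZE, 32)
        self.lstm = nn.LSTM(32, 128, batch_first=True)
        self.proj = nn.Linear(128, 32)

    def forward(self, tokens):
        emb = self.embed(tokens)             # (batch, seq_len, 32)
        _, (h, _) = self.lstm(emb)           # final hidden state, (1, batch, 128)
        return self.proj(h[-1])              # state representation of dimension 32

# Reward correction model: two-layer MLP (dim 32) over the state-action pair,
# producing an estimated Q-value.
reward_correction = nn.Sequential(
    nn.Linear(32 + ACTION_DIM, 32),
    nn.ReLU(),
    nn.Linear(32, 1),
)

# Recommender policy network: two-layer MLP with dimension 128.
policy = nn.Sequential(
    nn.Linear(32, 128),
    nn.ReLU(),
    nn.Linear(128, ACTION_DIM),
)

# Optimizers and RL discount factor, matching the reported hyperparameters.
encoder = TextualEncoder()
encoder_opt = torch.optim.Adam(encoder.parameters(), lr=1e-3)
policy_opt = torch.optim.Adam(policy.parameters(), lr=5e-4)   # selected from {1e-3, 5e-4, 1e-4, 5e-5}
GAMMA = 0.99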
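The 40,020-item split noted under Dataset Splits amounts to a simple random partition of the UT-Zappos50K items. The sketch below assumes the item list is already available; the loader name and the fixed seed are hypothetical and not stated in the paper.

import random

random.seed(0)  # fixed seed for repeatability; an assumption, the paper does not report one

# load_zappos50k_items is a hypothetical helper returning the full list of shoe items
all_shoes = load_zappos50k_items()
random.shuffle(all_shoes)

train_shoes = all_shoes[:40020]   # 40,020 shoes form the training set
test_shoes = all_shoes[40020:]    # the remaining shoes form the test set; no validation split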