A Model-Based Reinforcement Learning with Adversarial Training for Online Recommendation
Authors: Xueying Bai, Jian Guan, Hongning Wang
NeurIPS 2019
| Reproducibility Variable | Result | LLM Response |
|---|---|---|
| Research Type | Experimental | Our theoretical analysis and empirical evaluations demonstrate the effectiveness of our solution in learning policies from the offline and generated data. The proposed model is verified through theoretical analysis and extensive empirical evaluations. Experiment results demonstrate our solution's better sample efficiency over the state-of-the-art baselines. |
| Researcher Affiliation | Academia | Xueying Bai (Department of Computer Science, Stony Brook University), Jian Guan (Department of Computer Science and Technology, Tsinghua University), Hongning Wang (Department of Computer Science, University of Virginia) |
| Pseudocode | No | The paper does not contain structured pseudocode or algorithm blocks. |
| Open Source Code | Yes | Our implementation is available at https://github.com/JianGuanTHU/IRecGAN. |
| Open Datasets | Yes | We use a large-scale real-world recommendation dataset from CIKM Cup 2016 to evaluate the effectiveness of our proposed solution for offline reranking. |
| Dataset Splits | Yes | We selected the top 40,000 most popular items into the recommendation candidate set, and randomly selected 65,284/1,718/1,720 sessions for training/validation/testing. |
| Hardware Specification | No | The paper does not explicitly describe the specific hardware used to run its experiments, such as GPU or CPU models. |
| Software Dependencies | No | The paper mentions using components like “2-layer LSTM units” and an “RNN based discriminator” but does not provide specific version numbers for any software dependencies (e.g., Python, PyTorch, TensorFlow versions). |
| Experiment Setup | Yes | The hyper-parameters in all models are set as follows: the item embedding dimension is set to 50, the discount factor γ in value calculation is set to 0.9, the scale factors λr and λp are set to 3 and 1. We use 2-layer LSTM units with 512-dimension hidden states. The ratio of generated training samples and offline data for each training epoch is set to 1:10. |
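The reported hyperparameters can be collected into a small configuration sketch. This is illustrative only: the field names and the use of a dataclass are assumptions, as the paper does not specify an implementation API.

```python
from dataclasses import dataclass


@dataclass(frozen=True)
class ExperimentConfig:
    """Hyperparameters as reported in the paper (field names are assumptions)."""
    item_embedding_dim: int = 50          # item embedding dimension
    discount_gamma: float = 0.9           # discount factor in value calculation
    lambda_r: float = 3.0                 # scale factor lambda_r
    lambda_p: float = 1.0                 # scale factor lambda_p
    lstm_layers: int = 2                  # 2-layer LSTM units
    lstm_hidden_dim: int = 512            # 512-dimension hidden states
    generated_to_offline_ratio: tuple = (1, 10)  # generated : offline samples per epoch


cfg = ExperimentConfig()
```

A frozen dataclass keeps the reported values immutable and self-documenting when re-implementing the experiments.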