Model-Based Reinforcement Learning with Adversarial Training for Online Recommendation

Authors: Xueying Bai, Jian Guan, Hongning Wang

NeurIPS 2019

Reproducibility Variable | Result | LLM Response
Research Type | Experimental | "Our theoretical analysis and empirical evaluations demonstrate the effectiveness of our solution in learning policies from the offline and generated data." "The proposed model is verified through theoretical analysis and extensive empirical evaluations. Experiment results demonstrate our solution's better sample efficiency over the state-of-the-art baselines."
Researcher Affiliation | Academia | Xueying Bai, Jian Guan, Hongning Wang — Department of Computer Science, Stony Brook University; Department of Computer Science and Technology, Tsinghua University; Department of Computer Science, University of Virginia
Pseudocode | No | The paper does not contain structured pseudocode or algorithm blocks.
Open Source Code | Yes | "Our implementation is available at https://github.com/JianGuanTHU/IRecGAN."
Open Datasets | Yes | "We use a large-scale real-world recommendation dataset from CIKM Cup 2016 to evaluate the effectiveness of our proposed solution for offline reranking."
Dataset Splits | Yes | "We selected the top 40,000 most popular items into the recommendation candidate set, and randomly selected 65,284/1,718/1,720 sessions for training/validation/testing."
Hardware Specification | No | The paper does not explicitly describe the hardware used to run its experiments, such as GPU or CPU models.
Software Dependencies | No | The paper mentions components such as "2-layer LSTM units" and an "RNN-based discriminator" but does not provide version numbers for any software dependencies (e.g., Python, PyTorch, TensorFlow).
Experiment Setup | Yes | "The hyper-parameters in all models are set as follows: the item embedding dimension is set to 50, the discount factor γ in value calculation is set to 0.9, the scale factors λ_r and λ_p are set to 3 and 1. We use 2-layer LSTM units with 512-dimension hidden states. The ratio of generated training samples and offline data for each training epoch is set to 1:10."
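The reported dataset split (65,284/1,718/1,720 sessions for training/validation/testing, drawn at random) could be reproduced with a simple random partition. The sketch below is illustrative: the paper does not specify a random seed or the session storage format, so both the seed and the list representation are assumptions.

```python
import random

def split_sessions(sessions, n_train=65284, n_val=1718, n_test=1720, seed=0):
    """Randomly partition sessions into train/validation/test subsets.

    The counts match the paper's reported split; the seed is illustrative,
    since the paper does not report one.
    """
    assert len(sessions) >= n_train + n_val + n_test
    shuffled = list(sessions)
    random.Random(seed).shuffle(shuffled)
    train = shuffled[:n_train]
    val = shuffled[n_train:n_train + n_val]
    test = shuffled[n_train + n_val:n_train + n_val + n_test]
    return train, val, test
```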
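The reported setup (item embeddings of dimension 50, 2-layer LSTM with 512-dimensional hidden states, a 40,000-item candidate set, discount factor γ = 0.9) can be sketched in PyTorch. This is a minimal reconstruction under those stated hyper-parameters, not the authors' released implementation; the class and function names are hypothetical.

```python
import torch
import torch.nn as nn

N_ITEMS = 40000   # recommendation candidate set size (from the paper)
EMB_DIM = 50      # item embedding dimension (from the paper)
HIDDEN = 512      # LSTM hidden-state dimension (from the paper)
GAMMA = 0.9       # discount factor gamma in value calculation (from the paper)

class SessionEncoder(nn.Module):
    """Embeds an item-click sequence and encodes it with a 2-layer LSTM,
    matching the hyper-parameters reported in the paper; the scoring head
    over candidate items is an assumed detail."""
    def __init__(self):
        super().__init__()
        self.embed = nn.Embedding(N_ITEMS, EMB_DIM)
        self.lstm = nn.LSTM(EMB_DIM, HIDDEN, num_layers=2, batch_first=True)
        self.head = nn.Linear(HIDDEN, N_ITEMS)  # scores over candidate items

    def forward(self, item_ids):
        # item_ids: (batch, seq_len) integer tensor of clicked-item indices
        h, _ = self.lstm(self.embed(item_ids))
        return self.head(h[:, -1])  # score items from the last hidden state

def discounted_return(rewards, gamma=GAMMA):
    """Discounted return sum_t gamma^t * r_t, as used in value calculation."""
    return sum(gamma ** t * r for t, r in enumerate(rewards))
```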