A Model-Based Reinforcement Learning with Adversarial Training for Online Recommendation
Authors: Xueying Bai, Jian Guan, Hongning Wang
NeurIPS 2019
| Reproducibility Variable | Result | LLM Response |
|---|---|---|
| Research Type | Experimental | Our theoretical analysis and empirical evaluations demonstrate the effectiveness of our solution in learning policies from the offline and generated data. The proposed model is verified through theoretical analysis and extensive empirical evaluations. Experiment results demonstrate our solution's better sample efficiency over the state-of-the-art baselines. |
| Researcher Affiliation | Academia | Xueying Bai (Department of Computer Science, Stony Brook University), Jian Guan (Department of Computer Science and Technology, Tsinghua University), Hongning Wang (Department of Computer Science, University of Virginia) |
| Pseudocode | No | The paper does not contain structured pseudocode or algorithm blocks. |
| Open Source Code | Yes | Our implementation is available at https://github.com/JianGuanTHU/IRecGAN. |
| Open Datasets | Yes | We use a large-scale real-world recommendation dataset from CIKM Cup 2016 to evaluate the effectiveness of our proposed solution for offline reranking. |
| Dataset Splits | Yes | We selected the top 40,000 most popular items into the recommendation candidate set, and randomly selected 65,284/1,718/1,720 sessions for training/validation/testing. |
| Hardware Specification | No | The paper does not explicitly describe the specific hardware used to run its experiments, such as GPU or CPU models. |
| Software Dependencies | No | The paper mentions using components like “2-layer LSTM units” and an “RNN based discriminator” but does not provide specific version numbers for any software dependencies (e.g., Python, PyTorch, TensorFlow versions). |
| Experiment Setup | Yes | The hyper-parameters in all models are set as follows: the item embedding dimension is set to 50, the discount factor γ in value calculation is set to 0.9, the scale factors λr and λp are set to 3 and 1. We use 2-layer LSTM units with 512-dimension hidden states. The ratio of generated training samples and offline data for each training epoch is set to 1:10. |
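The reported hyperparameters can be collected into a small configuration sketch. This is illustrative only: the field names and the use of a dataclass are assumptions, as the paper does not specify an implementation API.

```python
from dataclasses import dataclass


@dataclass(frozen=True)
class ExperimentConfig:
    """Hyperparameters as reported in the paper (field names are assumptions)."""
    item_embedding_dim: int = 50          # item embedding dimension
    discount_gamma: float = 0.9           # discount factor in value calculation
    lambda_r: float = 3.0                 # scale factor lambda_r
    lambda_p: float = 1.0                 # scale factor lambda_p
    lstm_layers: int = 2                  # 2-layer LSTM units
    lstm_hidden_dim: int = 512            # 512-dimension hidden states
    generated_to_offline_ratio: tuple = (1, 10)  # generated : offline samples per epoch


cfg = ExperimentConfig()
```

A frozen dataclass keeps the reported values immutable and self-documenting when re-implementing the experiments.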