Virtual-Taobao: Virtualizing Real-World Online Retail Environment for Reinforcement Learning

Authors: Jing-Cheng Shi, Yang Yu, Qing Da, Shi-Yong Chen, An-Xiang Zeng (pp. 4902-4909)

Venue: AAAI 2019

Reproducibility Variable | Result | LLM Response

Research Type | Experimental
  "In experiments, Virtual-Taobao is trained from hundreds of millions of real Taobao customers records. Compared with the real Taobao, Virtual-Taobao faithfully recovers important properties of the real environment. We further show that the policies trained purely in Virtual-Taobao, which has zero physical sampling cost, can have significantly superior real-world performance to the traditional supervised approaches, through online A/B tests."

Researcher Affiliation | Collaboration
  Jing-Cheng Shi (1, 2), Yang Yu (1), Qing Da (2), Shi-Yong Chen (1), An-Xiang Zeng (2)
  1. National Key Laboratory for Novel Software Technology, Nanjing University, Nanjing, China ({shijc, yuy, chensy}@lamda.nju.edu.cn)
  2. Alibaba Group ({jingcheng.sjc, daqing.dq}@alibaba-inc.com, renzhong@taobao.com)

Pseudocode | Yes
  The paper provides "Algorithm 1 GAN-SD". A hedged sketch of the underlying GAN-style distribution-matching idea appears after this table.

Open Source Code | No
  No explicit statement or link providing access to source code for the described methodology was found.

Open Datasets | No
  The evidence ("In experiments, Virtual-Taobao is trained from hundreds of millions of real Taobao customers records.") shows the training data comes from real Taobao customer records; no public release of these records is stated.

Dataset Splits | No
  No explicit train/validation/test dataset splits (percentages, counts, or references to predefined standard splits) were found.

Hardware Specification | No
  No specific hardware details (e.g., GPU/CPU models, memory specifications) used for running the experiments were mentioned.

Software Dependencies | No
  No specific software dependencies with version numbers (e.g., PyTorch 1.9, Python 3.8) were mentioned.

Experiment Setup | Yes
  "We then retrain the TRPO agent with the ANC strategy in which ρ = 1 and µ = 0.01, and R2P is decreased to 0.115 in Virtual-Taobao which is more acceptable." A hedged sketch of this kind of action-norm reward shaping appears after this table.
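
The Pseudocode row points to Algorithm 1 GAN-SD, the paper's GAN-based method for generating virtual customers whose feature distribution matches the real one. For orientation only, here is a minimal PyTorch sketch of ordinary adversarial distribution matching; every dimension, layer size, and training constant is an invented placeholder, and GAN-SD itself adds distribution-level constraints beyond this vanilla objective that are not reproduced here.

    # Minimal vanilla-GAN sketch of distribution matching (NOT the paper's
    # exact GAN-SD; all sizes and constants are invented placeholders).
    import torch
    import torch.nn as nn

    FEATURE_DIM, NOISE_DIM, BATCH = 16, 8, 128  # assumed dimensions

    generator = nn.Sequential(
        nn.Linear(NOISE_DIM, 64), nn.ReLU(),
        nn.Linear(64, FEATURE_DIM), nn.Sigmoid(),  # features scaled to [0, 1]
    )
    discriminator = nn.Sequential(
        nn.Linear(FEATURE_DIM, 64), nn.ReLU(),
        nn.Linear(64, 1),  # logit: real vs. generated
    )

    g_opt = torch.optim.Adam(generator.parameters(), lr=1e-4)
    d_opt = torch.optim.Adam(discriminator.parameters(), lr=1e-4)
    bce = nn.BCEWithLogitsLoss()

    real_records = torch.rand(4096, FEATURE_DIM)  # stand-in for real customer features

    for step in range(500):
        real = real_records[torch.randint(0, len(real_records), (BATCH,))]
        fake = generator(torch.randn(BATCH, NOISE_DIM))

        # Discriminator update: label real samples 1, generated samples 0.
        d_loss = (bce(discriminator(real), torch.ones(BATCH, 1))
                  + bce(discriminator(fake.detach()), torch.zeros(BATCH, 1)))
        d_opt.zero_grad()
        d_loss.backward()
        d_opt.step()

        # Generator update: produce samples the discriminator labels as real.
        g_loss = bce(discriminator(fake), torch.ones(BATCH, 1))
        g_opt.zero_grad()
        g_loss.backward()
        g_opt.step()

At convergence the generator's samples are (ideally) indistinguishable from the real records, which is the sense in which the paper's virtual customers "faithfully recover" the real distribution.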
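
The Experiment Setup row quotes two hyperparameters of the ANC (action norm constraint) strategy, ρ = 1 and µ = 0.01, used when retraining the TRPO agent. The extracted evidence does not give the exact ANC formula, so the sketch below shows one plausible reading, assuming a shaped reward of the form r' = ρ·r - µ·‖a‖: the environment reward is scaled by ρ and penalized by the norm of the action. The function name anc_reward and the action dimensionality are hypothetical.

    # One plausible reading of ANC-style reward shaping; the paper's exact
    # formula may differ. The rho and mu defaults follow the quoted evidence.
    import numpy as np

    def anc_reward(env_reward: float, action: np.ndarray,
                   rho: float = 1.0, mu: float = 0.01) -> float:
        # Scale the environment reward by rho and subtract a penalty
        # proportional to the action's norm, discouraging extreme actions
        # that exploit inaccuracies of the learned simulator.
        return rho * env_reward - mu * float(np.linalg.norm(action))

    # Example: a zero action keeps the raw reward, while a large-norm action
    # is penalized. (The action dimension of 8 is invented.)
    print(anc_reward(1.0, np.zeros(8)))      # 1.0
    print(anc_reward(1.0, np.full(8, 3.0)))  # 1.0 - 0.01 * ||a|| ~= 0.915

In training, the TRPO agent would receive this shaped reward from the virtual environment in place of the raw reward; per the quoted evidence, retraining with ANC brought R2P down to 0.115, which the authors deem more acceptable.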