Notice: The reproducibility variables underlying each score are classified using an automated LLM-based pipeline, validated against a manually labeled dataset. LLM-based classification introduces uncertainty and potential bias; scores should be interpreted as estimates. Full accuracy metrics and methodology are described in Coakley et al. (K. L. Coakley, T. Snelleman, H. Hoos, and O. E. Gundersen, "The embrace of open science: An analysis of a decade of AI research and 56 800 conference papers," Under Review, 2026).
Virtual-Taobao: Virtualizing Real-World Online Retail Environment for Reinforcement Learning
Authors: Jing-Cheng Shi, Yang Yu, Qing Da, Shi-Yong Chen, An-Xiang Zeng. Pages 4902-4909.
AAAI 2019 | Venue PDF | LLM Run Details
| Reproducibility Variable | Result | LLM Response |
|---|---|---|
| Research Type | Experimental | In experiments, Virtual-Taobao is trained from hundreds of millions of real Taobao customers records. Compared with the real Taobao, Virtual-Taobao faithfully recovers important properties of the real environment. We further show that the policies trained purely in Virtual-Taobao, which has zero physical sampling cost, can have significantly superior real-world performance to the traditional supervised approaches, through online A/B tests. |
| Researcher Affiliation | Collaboration | Jing-Cheng Shi (1, 2), Yang Yu (1), Qing Da (2), Shi-Yong Chen (1), An-Xiang Zeng (2). (1) National Key Laboratory for Novel Software Technology, Nanjing University, Nanjing, China, EMAIL; (2) Alibaba Group, EMAIL, EMAIL |
| Pseudocode | Yes | Algorithm 1 GAN-SD |
| Open Source Code | No | No explicit statement or link providing access to source code for the described methodology was found. |
| Open Datasets | No | In experiments, Virtual-Taobao is trained from hundreds of millions of real Taobao customers records. |
| Dataset Splits | No | No explicit train/validation/test dataset splits with percentages, counts, or references to predefined standard splits were found. |
| Hardware Specification | No | No specific hardware details (e.g., GPU/CPU models, memory specifications) used for running the experiments were mentioned. |
| Software Dependencies | No | No specific software dependencies with version numbers (e.g., 'PyTorch 1.9', 'Python 3.8') were mentioned. |
| Experiment Setup | Yes | We then retrain the TRPO agent with the ANC strategy in which ρ = 1 and µ = 0.01, and R2P is decreased to 0.115 in Virtual-Taobao which is more acceptable. |