reproducibilityindex.ai

PlayVirtual: Augmenting Cycle-Consistent Virtual Trajectories for Reinforcement Learning

Authors: Tao Yu, Cuiling Lan, Wenjun Zeng, Mingxiao Feng, Zhizheng Zhang, Zhibo Chen

NeurIPS 2021

Reproducibility Variable	Result	LLM Response
Research Type	Experimental	We validate the effectiveness of our designs on the Atari and DeepMind Control Suite benchmarks. Our method achieves the state-of-the-art performance on both benchmarks.
Researcher Affiliation	Collaboration	Tao Yu¹, Cuiling Lan², Wenjun Zeng², Mingxiao Feng¹, Zhizheng Zhang², Zhibo Chen¹ (¹University of Science and Technology of China; ²Microsoft Research Asia). yutao666@mail.ustc.edu.cn, {culan,wezeng}@microsoft.com, fmxustc@mail.ustc.edu.cn, zhizzhang@microsoft.com, chenzhibo@ustc.edu.cn
Pseudocode	No	The paper describes the methodology in text and equations but does not include any explicit pseudocode or algorithm blocks.
Open Source Code	Yes	Our code is available at https://github.com/microsoft/Playvirtual.
Open Datasets	Yes	We evaluate our method on the commonly used discrete control benchmark of Atari [2], and the continuous control benchmark of DMControl [43].
Dataset Splits	No	The paper describes training and evaluation protocols (e.g., '100k interaction steps', '500k environment steps') and mentions using established benchmarks (Atari, DMControl), but it does not specify explicit numerical training, validation, or test dataset splits (e.g., percentages or sample counts for data partitions).
Hardware Specification	No	The paper does not explicitly provide details about the specific hardware (e.g., GPU or CPU models, memory) used for running its experiments.
Software Dependencies	No	All our models are implemented via PyTorch [39].
Experiment Setup	Yes	We set the number of prediction steps K to 9 by default... We simply set the number of action sets, i.e., the number of virtual trajectories M to 2|A|... We set K to 6, and set M to a fixed number 10... We set λpred = 1 and λcyc = 1. For d_M, we use the distance metric as in SPR [40]... We follow the training settings in CURL except the batch size (reduced from 512 to 128 to save memory cost) and learning rate.
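The hyperparameters quoted in the Experiment Setup row can be collected into a small configuration object. This is a hypothetical sketch, not code from the microsoft/Playvirtual repository: the class and function names are illustrative. It assumes K = 9 and M = 2|A| are the Atari (discrete-action) defaults, because |A| only makes sense for a discrete action set, and that K = 6 and M = 10 are the DMControl settings. It also assumes the prediction and cycle-consistency losses are added to the RL loss as a weighted sum using λpred and λcyc. The learning rate is left out because the quote does not give its value.

```python
from dataclasses import dataclass
from typing import Optional


@dataclass(frozen=True)
class PlayVirtualConfig:
    """Hypothetical container for the hyperparameters quoted above."""
    pred_steps: int                   # K: number of prediction steps
    num_virtual_traj: int             # M: number of virtual trajectories (action sets)
    lambda_pred: float = 1.0          # weight of the prediction loss (lambda_pred = 1)
    lambda_cyc: float = 1.0           # weight of the cycle-consistency loss (lambda_cyc = 1)
    batch_size: Optional[int] = None  # None means "keep the base method's default"


def atari_config(num_actions: int) -> PlayVirtualConfig:
    # Assumption: K = 9 and M = 2|A| are the Atari (discrete-action) defaults.
    return PlayVirtualConfig(pred_steps=9, num_virtual_traj=2 * num_actions)


def dmcontrol_config() -> PlayVirtualConfig:
    # Assumption: K = 6 and M = 10 are the DMControl settings; the batch size
    # is reduced from CURL's 512 to 128, as stated in the quote.
    return PlayVirtualConfig(pred_steps=6, num_virtual_traj=10, batch_size=128)


def total_loss(rl_loss: float, pred_loss: float, cyc_loss: float,
               cfg: PlayVirtualConfig) -> float:
    # Assumption: the auxiliary losses are combined with the RL loss as a weighted sum.
    return rl_loss + cfg.lambda_pred * pred_loss + cfg.lambda_cyc * cyc_loss
```

For example, atari_config(6) for a game with six actions yields M = 12 virtual trajectories, while dmcontrol_config() fixes M at 10 regardless of the continuous action dimension.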