Mask-based Latent Reconstruction for Reinforcement Learning

Authors: Tao Yu, Zhizheng Zhang, Cuiling Lan, Yan Lu, Zhibo Chen

NeurIPS 2022

Reproducibility Variable | Result | LLM Response
Research Type | Experimental | Extensive experiments show that our MLR significantly improves the sample efficiency in RL and outperforms the state-of-the-art sample-efficient RL methods on multiple continuous and discrete control benchmarks.
Researcher Affiliation | Collaboration | Tao Yu (1), Zhizheng Zhang (2), Cuiling Lan (2), Yan Lu (2), Zhibo Chen (1); (1) University of Science and Technology of China, (2) Microsoft Research Asia; yutao666@mail.ustc.edu.cn, {zhizzhang,culan,yanlu}@microsoft.com, chenzhibo@ustc.edu.cn
Pseudocode | No | The paper includes figures illustrating the framework (Figure 1) and the predictive latent decoder (Figure 3), but provides no formal pseudocode or algorithm blocks. (A hedged training-loop sketch follows this table.)
Open Source Code | Yes | Our code is available at https://github.com/microsoft/Mask-based-Latent-Reconstruction.
Open Datasets | Yes | We evaluate the sample efficiency of our MLR on both the continuous control benchmark DeepMind Control Suite (DMControl) [43] and the discrete control benchmark Atari [5]. (An environment-loading sketch follows this table.)
Dataset Splits | No | The paper mentions evaluating performance at '100k and 500k environment steps' and training for '100k interaction steps' on Atari, but does not provide specific train/validation/test *dataset* splits (e.g., percentages or sample counts) for reproducibility.
Hardware Specification | No | The paper's checklist states 'The total amount of compute and the type of resources used (e.g., type of GPUs, internal cluster, or cloud provider)? [Yes] See Appendix B.', but the content of Appendix B is not included in the provided text, so no specific hardware details are available.
Software Dependencies | No | The paper points to Appendix B for training details, which would typically include software dependencies, but that appendix is not included in the provided text, so specific software versions are not listed.
Experiment Setup | Yes | In MLR, by default, we set the length of a sampled trajectory K to 16 and the mask ratio η to 50%. We set the size of the masked cube (k × h × w) to 8 × 10 × 10 on most DMControl tasks and 8 × 12 × 12 on the Atari games. ...we set a weight λ to balance L_rl and L_mlr so that the gradients of these two loss terms lie in a similar range and empirically find λ = 1 works well in most environments. (A configuration sketch follows this table.)
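
Since the paper provides figures but no formal algorithm block, the following is a minimal, hypothetical Python sketch of the MLR auxiliary objective as the paper describes it: mask space-time cubes in a sampled trajectory, encode the masked observations, predict the latents of the original observations, and score the prediction against a momentum-encoder target. All names here (`cube_mask`, `mlr_loss`, `encoder`, `momentum_encoder`, `predictive_decoder`) and the cosine-similarity scoring are assumptions for illustration, not the authors' API; the actual implementation is in the linked repository.

```python
import torch
import torch.nn.functional as F


def cube_mask(shape, ratio=0.5, cube=(8, 10, 10)):
    """Zero out random space-time cubes until roughly `ratio` of the
    (time, height, width) cells are masked. A simplified stand-in for
    the paper's cube-masking scheme, not the authors' exact procedure."""
    B, K, H, W = shape
    k, h, w = cube
    mask = torch.ones(shape)
    target = ratio * K * H * W
    while (mask[0] == 0).sum() < target:
        t0 = torch.randint(0, max(K - k, 0) + 1, (1,)).item()
        y0 = torch.randint(0, max(H - h, 0) + 1, (1,)).item()
        x0 = torch.randint(0, max(W - w, 0) + 1, (1,)).item()
        mask[:, t0:t0 + k, y0:y0 + h, x0:x0 + w] = 0.0
    return mask


def mlr_loss(obs_seq, actions, encoder, momentum_encoder,
             predictive_decoder, mask_ratio=0.5, cube=(8, 10, 10)):
    """Hypothetical MLR auxiliary loss for a batch of trajectories.

    obs_seq: (B, K, C, H, W) tensor of K consecutive observations.
    The callables are placeholders for the paper's networks.
    """
    B, K, C, H, W = obs_seq.shape
    mask = cube_mask((B, K, H, W), mask_ratio, cube)
    masked_obs = obs_seq * mask.unsqueeze(2)  # broadcast over channels

    # Predict latents of the original frames from the masked trajectory.
    z_pred = predictive_decoder(encoder(masked_obs), actions)
    with torch.no_grad():
        z_target = momentum_encoder(obs_seq)  # EMA target network

    # Latent reconstruction scored by cosine similarity.
    return -F.cosine_similarity(z_pred, z_target, dim=-1).mean()


# Joint training objective, with lambda = 1 per the paper:
#   total_loss = rl_loss + 1.0 * mlr_loss(...)
```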
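Both benchmarks are publicly available. Below is a minimal loading sketch, assuming the standard `dm_control` and `gym` packages; the task and game names are examples, not the paper's full evaluation list, and the Gym reset/step API differs across versions.

```python
# Continuous control: DeepMind Control Suite (pip install dm_control).
from dm_control import suite

env = suite.load(domain_name="cheetah", task_name="run")  # example task
timestep = env.reset()
print(env.action_spec())

# Discrete control: Atari via Gym (pip install "gym[atari]").
import gym

atari = gym.make("BreakoutNoFrameskip-v4")  # example Atari game
obs = atari.reset()  # newer Gym versions return (obs, info)
```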
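For the Experiment Setup row, the quoted defaults can be collected into a single configuration object. This is a sketch; the field names are illustrative, not the repository's actual configuration keys.

```python
from dataclasses import dataclass
from typing import Tuple


@dataclass
class MLRConfig:
    """Default MLR hyperparameters as quoted in the paper."""
    traj_len_K: int = 16                             # length of a sampled trajectory
    mask_ratio: float = 0.5                          # mask ratio eta = 50%
    cube_dmc: Tuple[int, int, int] = (8, 10, 10)     # (k, h, w) on most DMControl tasks
    cube_atari: Tuple[int, int, int] = (8, 12, 12)   # (k, h, w) on the Atari games
    loss_weight: float = 1.0                         # lambda balancing L_rl and L_mlr
```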