Mask-based Latent Reconstruction for Reinforcement Learning

Authors: Tao Yu, Zhizheng Zhang, Cuiling Lan, Yan Lu, Zhibo Chen

NeurIPS 2022

Reproducibility Variable | Result | LLM Response
Research Type | Experimental | Extensive experiments show that our MLR significantly improves the sample efficiency in RL and outperforms the state-of-the-art sample-efficient RL methods on multiple continuous and discrete control benchmarks.
Researcher Affiliation | Collaboration | Tao Yu (1), Zhizheng Zhang (2), Cuiling Lan (2), Yan Lu (2), Zhibo Chen (1); (1) University of Science and Technology of China, (2) Microsoft Research Asia; yutao666@mail.ustc.edu.cn, {zhizzhang,culan,yanlu}@microsoft.com, chenzhibo@ustc.edu.cn
Pseudocode | No | The paper includes figures illustrating the framework (Figure 1) and the predictive latent decoder (Figure 3), but provides no formal pseudocode or algorithm blocks. (A hedged training-loop sketch follows this table.)
Open Source Code | Yes | Our code is available at https://github.com/microsoft/Mask-based-Latent-Reconstruction.
Open Datasets | Yes | We evaluate the sample efficiency of our MLR on both the continuous control benchmark DeepMind Control Suite (DMControl) [43] and the discrete control benchmark Atari [5]. (An environment-loading sketch follows this table.)
Dataset Splits | No | The paper mentions evaluating performance at '100k and 500k environment steps' and training for '100k interaction steps' on Atari, but does not provide specific train/validation/test *dataset* splits (e.g., percentages or sample counts) for reproducibility.
Hardware Specification | No | The paper's checklist states 'The total amount of compute and the type of resources used (e.g., type of GPUs, internal cluster, or cloud provider)? [Yes] See Appendix B.', but the content of Appendix B is not included in the provided text, so no specific hardware details are available.
Software Dependencies | No | The paper points to Appendix B for training details, which would typically include software dependencies, but that appendix is not included in the provided text, so specific software versions are not listed.
Experiment Setup | Yes | In MLR, by default, we set the length of a sampled trajectory K to 16 and the mask ratio η to 50%. We set the size of the masked cube (k × h × w) to 8 × 10 × 10 on most DMControl tasks and 8 × 12 × 12 on the Atari games. ...we set a weight λ to balance L_rl and L_mlr so that the gradients of these two loss terms lie in a similar range and empirically find λ = 1 works well in most environments. (A configuration sketch follows this table.)
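
Since the paper provides figures but no formal algorithm block, the following is a minimal, hypothetical Python sketch of the MLR auxiliary objective as the paper describes it: mask space-time cubes in a sampled trajectory, encode the masked observations, predict the latents of the original observations, and score the prediction against a momentum-encoder target. All names here (`cube_mask`, `mlr_loss`, `encoder`, `momentum_encoder`, `predictive_decoder`) and the cosine-similarity scoring are assumptions for illustration, not the authors' API; the actual implementation is in the linked repository.

```python
import torch
import torch.nn.functional as F


def cube_mask(shape, ratio=0.5, cube=(8, 10, 10)):
    """Zero out random space-time cubes until roughly `ratio` of the
    (time, height, width) cells are masked. A simplified stand-in for
    the paper's cube-masking scheme, not the authors' exact procedure."""
    B, K, H, W = shape
    k, h, w = cube
    mask = torch.ones(shape)
    target = ratio * K * H * W
    while (mask[0] == 0).sum() < target:
        t0 = torch.randint(0, max(K - k, 0) + 1, (1,)).item()
        y0 = torch.randint(0, max(H - h, 0) + 1, (1,)).item()
        x0 = torch.randint(0, max(W - w, 0) + 1, (1,)).item()
        mask[:, t0:t0 + k, y0:y0 + h, x0:x0 + w] = 0.0
    return mask


def mlr_loss(obs_seq, actions, encoder, momentum_encoder,
             predictive_decoder, mask_ratio=0.5, cube=(8, 10, 10)):
    """Hypothetical MLR auxiliary loss for a batch of trajectories.

    obs_seq: (B, K, C, H, W) tensor of K consecutive observations.
    The callables are placeholders for the paper's networks.
    """
    B, K, C, H, W = obs_seq.shape
    mask = cube_mask((B, K, H, W), mask_ratio, cube)
    masked_obs = obs_seq * mask.unsqueeze(2)  # broadcast over channels

    # Predict latents of the original frames from the masked trajectory.
    z_pred = predictive_decoder(encoder(masked_obs), actions)
    with torch.no_grad():
        z_target = momentum_encoder(obs_seq)  # EMA target network

    # Latent reconstruction scored by cosine similarity.
    return -F.cosine_similarity(z_pred, z_target, dim=-1).mean()


# Joint training objective, with lambda = 1 per the paper:
#   total_loss = rl_loss + 1.0 * mlr_loss(...)
```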
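Both benchmarks are publicly available. Below is a minimal loading sketch, assuming the standard `dm_control` and `gym` packages; the task and game names are examples, not the paper's full evaluation list, and the Gym reset/step API differs across versions.

```python
# Continuous control: DeepMind Control Suite (pip install dm_control).
from dm_control import suite

env = suite.load(domain_name="cheetah", task_name="run")  # example task
timestep = env.reset()
print(env.action_spec())

# Discrete control: Atari via Gym (pip install "gym[atari]").
import gym

atari = gym.make("BreakoutNoFrameskip-v4")  # example Atari game
obs = atari.reset()  # newer Gym versions return (obs, info)
```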
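For the Experiment Setup row, the quoted defaults can be collected into a single configuration object. This is a sketch; the field names are illustrative, not the repository's actual configuration keys.

```python
from dataclasses import dataclass
from typing import Tuple


@dataclass
class MLRConfig:
    """Default MLR hyperparameters as quoted in the paper."""
    traj_len_K: int = 16                             # length of a sampled trajectory
    mask_ratio: float = 0.5                          # mask ratio eta = 50%
    cube_dmc: Tuple[int, int, int] = (8, 10, 10)     # (k, h, w) on most DMControl tasks
    cube_atari: Tuple[int, int, int] = (8, 12, 12)   # (k, h, w) on the Atari games
    loss_weight: float = 1.0                         # lambda balancing L_rl and L_mlr
```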