Mask-based Latent Reconstruction for Reinforcement Learning
Authors: Tao Yu, Zhizheng Zhang, Cuiling Lan, Yan Lu, Zhibo Chen
NeurIPS 2022
| Reproducibility Variable | Result | LLM Response |
|---|---|---|
| Research Type | Experimental | Extensive experiments show that our MLR significantly improves the sample efficiency in RL and outperforms the state-of-the-art sample-efficient RL methods on multiple continuous and discrete control benchmarks. |
| Researcher Affiliation | Collaboration | Tao Yu¹, Zhizheng Zhang², Cuiling Lan², Yan Lu², Zhibo Chen¹ (¹University of Science and Technology of China, ²Microsoft Research Asia); yutao666@mail.ustc.edu.cn, {zhizzhang,culan,yanlu}@microsoft.com, chenzhibo@ustc.edu.cn |
| Pseudocode | No | The paper includes figures illustrating the framework (Figure 1) and predictive latent decoder (Figure 3), but no formal pseudocode or algorithm blocks are provided. |
| Open Source Code | Yes | Our code is available at https://github.com/microsoft/Mask-based-Latent-Reconstruction. |
| Open Datasets | Yes | We evaluate the sample efficiency of our MLR on both the continuous control benchmark Deep Mind Control Suite (DMControl) [43] and the discrete control benchmark Atari [5]. |
| Dataset Splits | No | The paper mentions evaluating performance at '100k and 500k environment steps' and training for '100k interaction steps' on Atari, but does not provide specific train/validation/test *dataset* splits (e.g., percentages or sample counts) for reproducibility. |
| Hardware Specification | No | The paper's checklist states 'The total amount of compute and the type of resources used (e.g., type of GPUs, internal cluster, or cloud provider)? [Yes] See Appendix B.', but the content of Appendix B is not included in the provided text, so specific hardware details are unavailable. |
| Software Dependencies | No | The paper points to 'Appendix B' for training details, which typically include software dependencies, but the appendix is not included in the provided text, so specific software versions cannot be confirmed. |
| Experiment Setup | Yes | In MLR, by default, we set the length of a sampled trajectory K to 16 and mask ratio η to 50%. We set the size of the masked cube (k × h × w) to 8 × 10 × 10 on most DMControl tasks and 8 × 12 × 12 on the Atari games. ...we set a weight λ to balance Lrl and Lmlr so that the gradients of these two loss items lie in a similar range and empirically find λ = 1 works well in most environments. |
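
The Experiment Setup row pins down the key masking hyperparameters: K = 16 sampled frames, mask ratio η = 50%, cube size k × h × w (e.g., 8 × 10 × 10), and loss weight λ = 1 for combining Lrl and Lmlr. The sketch below illustrates how such spatio-temporal cube masking and the joint objective could be wired up. It is not taken from the authors' repository; the function names (`cube_mask`, `total_loss`) are illustrative, and it assumes observations are stored as a (K, C, H, W) tensor whose temporal and spatial extents divide evenly by the cube size.

```python
# Minimal sketch of spatio-temporal cube masking, assuming obs has shape (K, C, H, W)
# and that K, H, W are divisible by the cube size. Not the authors' implementation.
import torch

def cube_mask(obs, cube=(8, 10, 10), mask_ratio=0.5):
    """Zero out a random `mask_ratio` fraction of spatio-temporal cubes.

    obs: float tensor of shape (K, C, H, W), e.g. K = 16 stacked frames.
    cube: (k, h, w) cube size, e.g. (8, 10, 10) as reported for most DMControl tasks.
    """
    K, C, H, W = obs.shape
    k, h, w = cube
    nt, nh, nw = K // k, H // h, W // w            # number of cubes along each axis
    n_cubes = nt * nh * nw
    n_masked = int(round(mask_ratio * n_cubes))    # 50% of cubes by default

    # Pick which cubes to mask, then expand the cube grid back to pixel resolution.
    keep = torch.ones(n_cubes)
    keep[torch.randperm(n_cubes)[:n_masked]] = 0.0
    mask = keep.view(nt, nh, nw)
    mask = mask.repeat_interleave(k, 0).repeat_interleave(h, 1).repeat_interleave(w, 2)
    return obs * mask.unsqueeze(1)                 # broadcast the mask over channels

def total_loss(l_rl, l_mlr, lam=1.0):
    """Joint objective as described in the paper: L = Lrl + λ * Lmlr, with λ = 1."""
    return l_rl + lam * l_mlr
```

As a usage note, a 16-frame stack of 100 × 100 observations with an 8 × 10 × 10 cube yields 2 × 10 × 10 = 200 cubes, of which 100 would be zeroed at a 50% mask ratio before the masked latents are reconstructed.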