MIMEx: Intrinsic Rewards from Masked Input Modeling

Authors: Toru Lin, Allan Jabri

NeurIPS 2023

Reproducibility Variable | Result | LLM Response
Research Type | Experimental | We demonstrate that MIMEx can achieve superior results when compared against competitive baselines on a suite of challenging sparse-reward visuomotor tasks. We start by evaluating MIMEx against three baseline methods on eight tasks from PixMC-Sparse (Section 4) and six tasks from the DeepMind Control Suite [40]; we then present ablation studies of MIMEx on PixMC-Sparse to understand the specific factors that contribute to MIMEx's performance and obtain insights for its general usage.
Researcher Affiliation | Academia | Toru Lin (University of California, Berkeley; toru@berkeley.edu) and Allan Jabri (University of California, Berkeley; ajabri@berkeley.edu).
Pseudocode | No | The paper describes the implementation details of MIMEx in Sections 3 and 5.1, but it does not include any figure, block, or section explicitly labeled 'Pseudocode' or 'Algorithm'. (A hedged sketch of the described procedure appears after this table.)
Open Source Code | Yes | Code is available at https://github.com/ToruOwO/mimex.
Open Datasets | Yes | To evaluate on hard-exploration problems with high-dimensional observations and dynamics, we develop a benchmark suite of eight challenging robotic manipulation tasks that involve realistic visuomotor control with sparse rewards. PixMC-Sparse is built on PixMC [44] as an extension of the original suite. We implement DrQv2 [45] with DDPG [33] (an off-policy algorithm) as the core RL algorithm; note that MVP and DrQv2 are the respective state-of-the-art algorithms on each environment. The paper also evaluates on the ALE [6] PRIVATE EYE and VENTURE environments.
Dataset Splits | No | The paper does not provide explicit training, validation, and test dataset splits with percentages or sample counts. It describes experimental runs (e.g., 'over 7 random seeds') but not data partitioning for validation purposes.
Hardware Specification | Yes | We run each experiment on a single NVIDIA A100 GPU.
Software Dependencies | No | The paper mentions using the Adam [19] optimizer but does not specify version numbers for any key software components, libraries, or programming languages (e.g., PyTorch or Python versions).
Experiment Setup | Yes | For all environments, we use a batch size of 512, the Adam [19] optimizer, and a MIMEx learning rate of 0.0001. We use a mask ratio of 70% for all tasks, and a β (exploration weight) of 0.05 for the Reach tasks and 0.05 for the Cabinet, Pick, and Move tasks. For the encoder, we use an embedding dimension of 128 with 4 Transformer blocks and 4 heads; for the decoder, an embedding dimension of 64 with 1 Transformer block and 2 heads. (These values are collected in the config sketch below.)
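
Since the paper ships no pseudocode, the following is a minimal sketch, in PyTorch, of the masked-input-modeling bonus as Section 3 describes it: randomly mask part of a short sequence of observation embeddings, reconstruct the masked entries with a small Transformer decoder, and use the reconstruction loss as the intrinsic reward. Only the decoder sizes, mask ratio, and β are taken from the Experiment Setup row above; the class, its argument names, and all other details (masking scheme, loss placement) are assumptions, not the authors' implementation.

```python
# Hedged sketch of a MIMEx-style intrinsic reward; not the authors' code.
import torch
import torch.nn as nn


class MaskedSequencePredictor(nn.Module):
    """Masked-prediction head. Sizes follow the paper's quoted decoder
    (embed dim 64, 1 Transformer block, 2 heads); the rest is assumed."""

    def __init__(self, obs_dim, embed_dim=64, n_heads=2, n_layers=1):
        super().__init__()
        self.proj = nn.Linear(obs_dim, embed_dim)
        block = nn.TransformerEncoderLayer(d_model=embed_dim, nhead=n_heads,
                                           batch_first=True)
        self.decoder = nn.TransformerEncoder(block, num_layers=n_layers)
        self.head = nn.Linear(embed_dim, obs_dim)
        self.mask_token = nn.Parameter(torch.zeros(1, 1, embed_dim))

    def forward(self, seq, mask_ratio=0.7):
        # seq: (B, T, obs_dim) -- a short window of observation embeddings.
        x = self.proj(seq)
        masked = torch.rand(x.shape[:2], device=x.device) < mask_ratio  # (B, T)
        x = torch.where(masked.unsqueeze(-1), self.mask_token.expand_as(x), x)
        recon = self.head(self.decoder(x))
        # Mean squared reconstruction error over masked positions only.
        err = ((recon - seq) ** 2).mean(-1)                        # (B, T)
        return (err * masked).sum(1) / masked.sum(1).clamp(min=1)  # (B,)


# Usage: the same quantity serves as the model's training loss and,
# detached, as the exploration bonus added to the environment reward:
#   bonus = predictor(seq)                   # (B,) per-sequence loss
#   loss = bonus.mean()                      # minimized with Adam, lr 1e-4
#   r_total = r_ext + 0.05 * bonus.detach()  # beta = 0.05 as quoted above
```

A design point worth noting: because the bonus is the predictor's own loss, states whose observation sequences are already well modeled yield a small reward, while novel sequences yield a large one, which is what drives exploration.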
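The quoted hyperparameters can also be gathered in one place. The dataclass below is a hypothetical configuration object; its field names are illustrative and do not come from the paper or the released code.

```python
from dataclasses import dataclass


@dataclass
class MIMExConfig:
    # Optimization (quoted in the Experiment Setup row)
    batch_size: int = 512
    optimizer: str = "adam"    # Adam [19]
    mimex_lr: float = 1e-4
    # Exploration bonus
    mask_ratio: float = 0.70   # all tasks
    beta: float = 0.05         # exploration weight, as quoted for both task groups
    # Encoder: embed dim 128, 4 Transformer blocks, 4 heads
    enc_embed_dim: int = 128
    enc_blocks: int = 4
    enc_heads: int = 4
    # Decoder: embed dim 64, 1 Transformer block, 2 heads
    dec_embed_dim: int = 64
    dec_blocks: int = 1
    dec_heads: int = 2
```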