MAMBA: an Effective World Model Approach for Meta-Reinforcement Learning

Authors: Zohar Rimon, Tom Jurgenson, Orr Krupnik, Gilad Adler, Aviv Tamar

ICLR 2024 | Conference PDF | Archive PDF | Plain Text | LLM Run Details

Reproducibility Variable | Result | LLM Response
Research Type | Experimental | We demonstrate the effectiveness of our approach on common meta-RL benchmark domains, attaining greater return with better sample efficiency (up to 15×) while requiring very little hyperparameter tuning. In addition, we validate our approach on a slate of more challenging, higher-dimensional domains, taking a step towards real-world generalizing agents.
Researcher Affiliation | Collaboration | (1) Technion - Israel Institute of Technology; (2) Ford Research Center Israel
Pseudocode | No | The paper describes the algorithms and their components but does not provide any pseudocode blocks or formally labeled algorithm sections.
Open Source Code | Yes | Code available at: https://github.com/zoharri/mamba.
Open Datasets | Yes | Environments: We use two common 2D environments in meta-RL, Point Robot Navigation (PRN) and Escape Room (Zintgraf et al., 2019; Dorfman et al., 2021; Rakelly et al., 2019). ... In Reacher-N, the agent (adapted from the DeepMind Control Suite, Tunyasuvunakool et al. 2020)... Panda Reacher: proposed by Choshen & Tamar (2023)...
Dataset Splits | No | The paper states: 'In every experiment we test the best model seen during evaluation on a held-out test set of 1000 tasks.' It mentions a test set, but does not provide explicit information about training or validation splits (e.g., percentages, sample counts, or specific split methodologies) for the data used in training/evaluation.
Hardware Specification | Yes | All experiments were conducted using an Nvidia T4 GPU with 32 CPU cores and 120 GB RAM.
Software Dependencies | No | The paper mentions software like 'Dreamer V3' and 'torch' via GitHub repository names but does not provide specific version numbers for these or other key software dependencies (e.g., Python, PyTorch, CUDA).
Experiment Setup | Yes | Table 2 (hyperparameter differences between Dreamer-Vanilla and Dreamer-Tune): dyn discrete 32 vs. 16 (number of discrete latent vectors in the world model); dyn hidden 512 vs. 64 (MLP size in the world model); dyn stoch 32 vs. 16 (size of each latent vector in the world model); imag horizon 15 vs. 10 (length of the imagination horizon, N_IMG; cf. Sec. 2.2); units 512 vs. 128 (size of hidden MLP layers). Values are listed as Dreamer-Vanilla vs. Dreamer-Tune.
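
To make the Experiment Setup row easier to reuse, the Table 2 values are collected below into two Python dictionaries. This is a minimal sketch: the numbers are transcribed from the paper, but the underscored key names and the dictionary layout are assumptions loosely modeled on DreamerV3-style configuration keys; the released MAMBA code at https://github.com/zoharri/mamba may name or organize these options differently.

```python
# Hyperparameter differences between Dreamer-Vanilla and Dreamer-Tune,
# transcribed from Table 2 of the paper. The underscored key names are
# assumptions (DreamerV3-style); the released code may use other names.

DREAMER_VANILLA = {
    "dyn_discrete": 32,   # number of discrete latent vectors in the world model
    "dyn_hidden": 512,    # MLP size in the world model
    "dyn_stoch": 32,      # size of each latent vector in the world model
    "imag_horizon": 15,   # imagination horizon length (N_IMG, cf. Sec. 2.2)
    "units": 512,         # size of hidden MLP layers
}

DREAMER_TUNE = {
    "dyn_discrete": 16,
    "dyn_hidden": 64,
    "dyn_stoch": 16,
    "imag_horizon": 10,
    "units": 128,
}


def diff(base: dict, tuned: dict) -> dict:
    """Return {key: (base value, tuned value)} for every key whose value changed."""
    return {k: (base[k], tuned[k]) for k in base if base[k] != tuned[k]}


if __name__ == "__main__":
    for key, (vanilla, tune) in diff(DREAMER_VANILLA, DREAMER_TUNE).items():
        print(f"{key}: Dreamer-Vanilla={vanilla}, Dreamer-Tune={tune}")
```

Printing the diff reproduces the five rows of Table 2 and can serve as a checklist when overriding a Dreamer-style configuration file.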