MAMBA: an Effective World Model Approach for Meta-Reinforcement Learning

Authors: Zohar Rimon, Tom Jurgenson, Orr Krupnik, Gilad Adler, Aviv Tamar

ICLR 2024 | Conference PDF | Archive PDF | Plain Text | LLM Run Details

Reproducibility Variable | Result | LLM Response
Research Type | Experimental | We demonstrate the effectiveness of our approach on common meta-RL benchmark domains, attaining greater return with better sample efficiency (up to 15×) while requiring very little hyperparameter tuning. In addition, we validate our approach on a slate of more challenging, higher-dimensional domains, taking a step towards real-world generalizing agents.
Researcher Affiliation | Collaboration | (1) Technion - Israel Institute of Technology; (2) Ford Research Center Israel
Pseudocode | No | The paper describes the algorithms and their components but does not provide any pseudocode blocks or formally labeled algorithm sections.
Open Source Code | Yes | Code available at: https://github.com/zoharri/mamba.
Open Datasets | Yes | Environments: We use two common 2D environments in meta-RL, Point Robot Navigation (PRN) and Escape Room (Zintgraf et al., 2019; Dorfman et al., 2021; Rakelly et al., 2019). ... In Reacher-N, the agent (adapted from the DeepMind Control Suite, Tunyasuvunakool et al. 2020)... Panda Reacher: proposed by Choshen & Tamar (2023)...
Dataset Splits | No | The paper states: 'In every experiment we test the best model seen during evaluation on a held-out test set of 1000 tasks.' It mentions a test set, but does not provide explicit information about training or validation splits (e.g., percentages, sample counts, or specific split methodologies) for the data used in training/evaluation.
Hardware Specification | Yes | All experiments were conducted using an Nvidia T4 GPU with 32 CPU cores and 120 GB RAM.
Software Dependencies | No | The paper mentions software like 'Dreamer V3' and 'torch' via GitHub repository names but does not provide specific version numbers for these or other key software dependencies (e.g., Python, PyTorch, CUDA).
Experiment Setup | Yes | Table 2 (hyperparameter differences between Dreamer-Vanilla and Dreamer-Tune): dyn discrete 32 vs. 16 (number of discrete latent vectors in the world model); dyn hidden 512 vs. 64 (MLP size in the world model); dyn stoch 32 vs. 16 (size of each latent vector in the world model); imag horizon 15 vs. 10 (length of the imagination horizon, N_IMG; cf. Sec. 2.2); units 512 vs. 128 (size of hidden MLP layers). Values are listed as Dreamer-Vanilla vs. Dreamer-Tune.
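
To make the Experiment Setup row easier to reuse, the Table 2 values are collected below into two Python dictionaries. This is a minimal sketch: the numbers are transcribed from the paper, but the underscored key names and the dictionary layout are assumptions loosely modeled on DreamerV3-style configuration keys; the released MAMBA code at https://github.com/zoharri/mamba may name or organize these options differently.

```python
# Hyperparameter differences between Dreamer-Vanilla and Dreamer-Tune,
# transcribed from Table 2 of the paper. The underscored key names are
# assumptions (DreamerV3-style); the released code may use other names.

DREAMER_VANILLA = {
    "dyn_discrete": 32,   # number of discrete latent vectors in the world model
    "dyn_hidden": 512,    # MLP size in the world model
    "dyn_stoch": 32,      # size of each latent vector in the world model
    "imag_horizon": 15,   # imagination horizon length (N_IMG, cf. Sec. 2.2)
    "units": 512,         # size of hidden MLP layers
}

DREAMER_TUNE = {
    "dyn_discrete": 16,
    "dyn_hidden": 64,
    "dyn_stoch": 16,
    "imag_horizon": 10,
    "units": 128,
}


def diff(base: dict, tuned: dict) -> dict:
    """Return {key: (base value, tuned value)} for every key whose value changed."""
    return {k: (base[k], tuned[k]) for k in base if base[k] != tuned[k]}


if __name__ == "__main__":
    for key, (vanilla, tune) in diff(DREAMER_VANILLA, DREAMER_TUNE).items():
        print(f"{key}: Dreamer-Vanilla={vanilla}, Dreamer-Tune={tune}")
```

Printing the diff reproduces the five rows of Table 2 and can serve as a checklist when overriding a Dreamer-style configuration file.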