MAMBA: an Effective World Model Approach for Meta-Reinforcement Learning
Authors: Zohar Rimon, Tom Jurgenson, Orr Krupnik, Gilad Adler, Aviv Tamar
ICLR 2024
| Reproducibility Variable | Result | LLM Response |
|---|---|---|
| Research Type | Experimental | We demonstrate the effectiveness of our approach on common meta-RL benchmark domains, attaining greater return with better sample efficiency (up to 15×) while requiring very little hyperparameter tuning. In addition, we validate our approach on a slate of more challenging, higher-dimensional domains, taking a step towards real-world generalizing agents. |
| Researcher Affiliation | Collaboration | (1) Technion – Israel Institute of Technology; (2) Ford Research Center Israel |
| Pseudocode | No | The paper describes the algorithms and their components but does not provide any pseudocode blocks or formally labeled algorithm sections. |
| Open Source Code | Yes | Code available at: https://github.com/zoharri/mamba. |
| Open Datasets | Yes | Environments: We use two common 2D environments in meta-RL, Point Robot Navigation (PRN), and Escape Room (Zintgraf et al., 2019; Dorfman et al., 2021; Rakelly et al., 2019). ... In Reacher-N, the agent (adapted from the Deep Mind Control Suite, Tunyasuvunakool et al. 2020)... Panda Reacher Proposed by Choshen & Tamar (2023)... |
| Dataset Splits | No | The paper states 'In every experiment we test the best model seen during evaluation on a held-out test set of 1000 tasks.' It mentions a test set, but does not provide explicit information about training or validation splits (e.g., percentages, sample counts, or specific split methodologies) for the data used in training/evaluation. |
| Hardware Specification | Yes | All experiments were conducted using an Nvidia T4 GPU, with 32 CPU cores and 120GB RAM. |
| Software Dependencies | No | The paper mentions software like 'Dreamer V3' and 'torch' via GitHub repository names but does not provide specific version numbers for these or other key software dependencies (e.g., Python, PyTorch, CUDA versions). |
| Experiment Setup | Yes | Table 2 lists the hyper-parameter differences between Dreamer-Vanilla and Dreamer-Tune: dyn discrete: 32 vs. 16 (number of discrete latent vectors in the world model); dyn hidden: 512 vs. 64 (MLP size in the world model); dyn stoch: 32 vs. 16 (size of each latent vector in the world model); imag horizon: 15 vs. 10 (length of the imagination horizon, N_IMG, cf. Sec. 2.2); units: 512 vs. 128 (size of hidden MLP layers). |
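
The Table 2 values quoted above can be sketched as two configuration dictionaries. This is a minimal illustration for reproduction purposes: the key names follow the Dreamer-style hyper-parameter names as printed in the paper, and the exact configuration keys used in the MAMBA repository may differ.

```python
# Hypothetical sketch of the two Dreamer configurations compared in Table 2.
# Values are taken verbatim from the paper; key names are assumptions.

DREAMER_VANILLA = {
    "dyn_discrete": 32,   # number of discrete latent vectors in the world model
    "dyn_hidden": 512,    # MLP size in the world model
    "dyn_stoch": 32,      # size of each latent vector in the world model
    "imag_horizon": 15,   # length of the imagination horizon (N_IMG, cf. Sec. 2.2)
    "units": 512,         # size of hidden MLP layers
}

DREAMER_TUNE = {
    "dyn_discrete": 16,
    "dyn_hidden": 64,
    "dyn_stoch": 16,
    "imag_horizon": 10,
    "units": 128,
}

def overrides(base: dict, tuned: dict) -> dict:
    """Return, for each differing key, the (base, tuned) value pair."""
    return {k: (base[k], tuned[k]) for k in base if base[k] != tuned[k]}
```

A quick check with `overrides(DREAMER_VANILLA, DREAMER_TUNE)` shows that Dreamer-Tune shrinks every one of the five listed hyper-parameters relative to Dreamer-Vanilla, consistent with the paper's claim of requiring very little tuning.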