Train Hard, Fight Easy: Robust Meta Reinforcement Learning
Authors: Ido Greenberg, Shie Mannor, Gal Chechik, Eli Meirom
NeurIPS 2023 | Conference PDF | Archive PDF | Plain Text | LLM Run Details
| Reproducibility Variable | Result | LLM Response |
|---|---|---|
| Research Type | Experimental | We demonstrate that RoML achieves robust returns on multiple navigation and continuous control benchmarks. We test our algorithms on several domains. Section 6.1 considers a navigation problem... Section 6.2 considers several continuous control environments... |
| Researcher Affiliation | Collaboration | Ido Greenberg (Technion, NVIDIA Research) gido@campus.technion.ac.il; Shie Mannor (Technion, NVIDIA Research) shie@ee.technion.ac.il; Gal Chechik (Bar-Ilan University, NVIDIA Research) gchechik@nvidia.com; Eli Meirom (NVIDIA Research) emeirom@nvidia.com |
| Pseudocode | Yes | Algorithm 1: CVaR Meta-Learning (CVaR-ML), Algorithm 2: Robust Meta RL (RoML), Algorithm 3: The Cross Entropy Method (CEM); a minimal CEM sketch appears after the table |
| Open Source Code | Yes | The code is available in our repositories: VariBAD, PEARL, CeSoR, PAIRED, and MAML. |
| Open Datasets | Yes | We rely on standard continuous control problems from the MuJoCo framework [Todorov et al., 2012]: training a cheetah to run (HalfCheetah), and training a Humanoid and an Ant to walk. |
| Dataset Splits | No | The paper discusses training and testing phases but does not explicitly report percentages or counts for splitting an overall dataset into training, validation, and test sets. For instance, for sine regression it mentions 10 samples for fine-tuning and 10 for testing per task, which is per-task data generation rather than an overall dataset split (see the sine-task sketch below the table). |
| Hardware Specification | Yes | All experiments were performed on machines with an Intel Xeon 2.2 GHz CPU and NVIDIA V100 GPUs. |
| Software Dependencies | No | The paper mentions several frameworks and algorithms it builds upon or compares against (e.g., MuJoCo, Vari BAD, PEARL, PPO, MAML) with citations. However, it does not provide specific version numbers for general software dependencies such as Python, PyTorch, TensorFlow, or other libraries. |
| Experiment Setup | Yes | Hyper-parameters: To test the practical applicability of RoML as a meta-algorithm, in every experiment we use the same hyper-parameters for RoML, CVaR-ML and their baseline. In particular, we use the baseline's default hyper-parameters whenever applicable... As for the additional hyper-parameters of the meta-algorithm itself: in Algorithm 1, we use M = 1 meta-rollout per task; and in Algorithm 2, we use β = 0.2, ν = 0 unless specified otherwise. (See the CVaR-objective sketch below the table.) |
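
For context on the pseudocode entry above, here is a minimal, illustrative sketch of a Cross Entropy Method (CEM) loop of the kind Algorithm 3 describes, used to over-sample low-return (high-risk) tasks. The Gaussian task parameterization, the `evaluate_return` stub, and all constants are assumptions of this sketch, not the authors' implementation.

```python
import numpy as np

def evaluate_return(task_param):
    """Placeholder: run a meta-rollout on the task defined by `task_param`
    and report its episodic return (not implemented in this sketch)."""
    raise NotImplementedError

def cem_task_sampler(n_iters=20, n_samples=64, elite_frac=0.2, dim=2):
    """Fit a Gaussian over task parameters toward the low-return tail."""
    mu, sigma = np.zeros(dim), np.ones(dim)
    n_elite = max(1, int(elite_frac * n_samples))
    for _ in range(n_iters):
        tasks = np.random.normal(mu, sigma, size=(n_samples, dim))
        returns = np.array([evaluate_return(t) for t in tasks])
        # Elites are the worst-performing tasks (lowest returns), so the
        # sampler concentrates on the risky tail of the task distribution.
        elite = tasks[np.argsort(returns)[:n_elite]]
        mu, sigma = elite.mean(axis=0), elite.std(axis=0) + 1e-8
    return mu, sigma
```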
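To illustrate the per-task data generation noted in the dataset-splits entry (10 samples for fine-tuning and 10 for testing per task), here is a hedged sketch in the spirit of the standard MAML sine-regression setup; the amplitude and phase ranges are assumptions, not values confirmed in the report.

```python
import numpy as np

def sample_sine_task(rng, n_finetune=10, n_test=10):
    """Draw one sine-regression task and split its points into a
    fine-tuning (support) set and a test (query) set."""
    amplitude = rng.uniform(0.1, 5.0)   # assumed range (MAML convention)
    phase = rng.uniform(0.0, np.pi)     # assumed range (MAML convention)
    x = rng.uniform(-5.0, 5.0, size=n_finetune + n_test)
    y = amplitude * np.sin(x + phase)
    return (x[:n_finetune], y[:n_finetune]), (x[n_finetune:], y[n_finetune:])

rng = np.random.default_rng(0)
(support_x, support_y), (query_x, query_y) = sample_sine_task(rng)
```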
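Finally, as a reading aid for the experiment-setup entry, a minimal sketch of the CVaR-style tail objective that distinguishes CVaR-ML and RoML from a risk-neutral (mean-return) baseline. The tail level `alpha` below is illustrative, and mapping it onto the quoted β = 0.2 is an assumption of this sketch.

```python
import numpy as np

def cvar_of_returns(returns, alpha=0.2):
    """Mean of the worst alpha-fraction of per-task returns (empirical CVaR).
    A risk-neutral baseline would instead optimize np.mean(returns)."""
    returns = np.sort(np.asarray(returns, dtype=float))
    k = max(1, int(np.ceil(alpha * len(returns))))
    return returns[:k].mean()

# Example: returns from 10 meta-test tasks; CVaR_0.2 averages the worst 2.
print(cvar_of_returns([5.0, 7.2, 1.3, 8.1, 6.4, 0.9, 7.7, 6.0, 4.5, 7.9]))
```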