Train Hard, Fight Easy: Robust Meta Reinforcement Learning

Authors: Ido Greenberg, Shie Mannor, Gal Chechik, Eli Meirom

NeurIPS 2023

Reproducibility Variable | Result | LLM Response
Research Type | Experimental | We demonstrate that RoML achieves robust returns on multiple navigation and continuous control benchmarks. We test our algorithms on several domains. Section 6.1 considers a navigation problem... Section 6.2 considers several continuous control environments...
Researcher Affiliation | Collaboration | Ido Greenberg (Technion, Nvidia Research) gido@campus.technion.ac.il; Shie Mannor (Technion, Nvidia Research) shie@ee.technion.ac.il; Gal Chechik (Bar-Ilan University, Nvidia Research) gchechik@nvidia.com; Eli Meirom (Nvidia Research) emeirom@nvidia.com
Pseudocode | Yes | Algorithm 1: CVaR Meta-Learning (CVaR-ML); Algorithm 2: Robust Meta RL (RoML); Algorithm 3: the Cross-Entropy Method (CEM). A generic CEM sketch follows the table.
Open Source Code | Yes | The code is available in our repositories: VariBAD, PEARL, CeSoR, PAIRED and MAML.
Open Datasets | Yes | We rely on standard continuous control problems from the MuJoCo framework [Todorov et al., 2012]: training a cheetah to run (HalfCheetah), and training a Humanoid and an Ant to walk. An environment-setup sketch follows the table.
Dataset Splits | No | The paper discusses training and testing phases but does not report percentages or counts for splitting data into training, validation, and test sets in the traditional supervised-learning sense. For sine regression, it reports 10 samples for fine-tuning and 10 for testing per task, which is per-task data generation rather than an overall dataset split; a per-task sampling sketch follows the table.
Hardware Specification | Yes | All experiments were performed on machines with Intel Xeon 2.2 GHz CPU and NVIDIA's V100 GPU.
Software Dependencies | No | The paper cites the frameworks and algorithms it builds upon or compares against (e.g., MuJoCo, VariBAD, PEARL, PPO, MAML), but it does not provide version numbers for software dependencies such as Python, PyTorch, TensorFlow, or other libraries.
Experiment Setup | Yes | Hyper-parameters: To test the practical applicability of RoML as a meta-algorithm, in every experiment we use the same hyper-parameters for RoML, CVaR-ML and their baseline. In particular, we use the baseline's default hyper-parameters whenever applicable... As for the additional hyper-parameters of the meta-algorithm itself: in Algorithm 1, we use M = 1 meta-rollout per task; and in Algorithm 2, we use β = 0.2, ν = 0 unless specified otherwise. These values are collected in the config sketch after the table.
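
The paper's Algorithm 3 is the Cross-Entropy Method (CEM), which RoML uses to oversample low-return (high-risk) tasks during meta-training. Below is a minimal generic sketch of CEM with a Gaussian proposal; the function names, numeric defaults, and toy objective are illustrative assumptions, not taken from the paper's code.

```python
import numpy as np

def cross_entropy_method(return_fn, dim, n_iters=50, n_samples=100,
                         elite_frac=0.2, seed=0):
    """Generic CEM: iteratively refit a Gaussian proposal to the elite
    samples. Here the elites are the LOWEST-return samples, matching the
    risk-seeking task sampling described in the paper; all numeric
    defaults are illustrative, not the paper's values."""
    rng = np.random.default_rng(seed)
    mu, sigma = np.zeros(dim), np.ones(dim)
    n_elite = max(1, int(elite_frac * n_samples))
    for _ in range(n_iters):
        samples = rng.normal(mu, sigma, size=(n_samples, dim))
        returns = np.array([return_fn(s) for s in samples])
        elites = samples[np.argsort(returns)[:n_elite]]  # lowest-return tail
        mu = elites.mean(axis=0)
        sigma = elites.std(axis=0) + 1e-6  # floor to avoid collapse
    return mu, sigma

# Toy usage: a hypothetical return proxy minimized at s = 2, so CEM
# concentrates the proposal on the hardest (lowest-return) task region.
mu, _ = cross_entropy_method(lambda s: float(np.sum((s - 2.0) ** 2)), dim=3)
print(mu)  # approximately [2, 2, 2]
```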
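For the Open Datasets row, a minimal sketch of instantiating the named MuJoCo tasks through Gymnasium; the "-v4" version suffixes and the gymnasium[mujoco] dependency are assumptions about the current toolkit, not details stated in the paper.

```python
import gymnasium as gym  # requires: pip install "gymnasium[mujoco]"

# Environment IDs assumed from current Gymnasium releases.
for env_id in ["HalfCheetah-v4", "Humanoid-v4", "Ant-v4"]:
    env = gym.make(env_id)
    obs, info = env.reset(seed=0)
    obs, reward, terminated, truncated, info = env.step(env.action_space.sample())
    print(env_id, env.observation_space.shape, env.action_space.shape)
    env.close()
```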
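The per-task split reported for sine regression (10 samples for fine-tuning, 10 for testing) can be sketched as below. The amplitude, phase, and input ranges follow the common MAML-style sine-regression convention and are assumptions, not values quoted from the paper.

```python
import numpy as np

def sample_sine_task(rng, k_support=10, k_query=10):
    """Per-task data generation: k_support points for fine-tuning and
    k_query points for testing, matching the counts quoted in the paper.
    The ranges below are the usual MAML benchmark convention (assumed)."""
    amplitude = rng.uniform(0.1, 5.0)  # assumed range
    phase = rng.uniform(0.0, np.pi)    # assumed range
    x = rng.uniform(-5.0, 5.0, size=k_support + k_query)
    y = amplitude * np.sin(x + phase)
    return (x[:k_support], y[:k_support]), (x[k_support:], y[k_support:])

rng = np.random.default_rng(0)
(support_x, support_y), (query_x, query_y) = sample_sine_task(rng)
```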
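Finally, a hypothetical configuration collecting the meta-algorithm hyper-parameters quoted in the Experiment Setup row; the key names and comments are illustrative, and all remaining hyper-parameters default to the baseline's values per the paper's protocol.

```python
# Values quoted from the paper; key names are illustrative assumptions.
META_HPARAMS = {
    "M": 1,       # meta-rollouts per task (Algorithm 1, CVaR-ML)
    "beta": 0.2,  # Algorithm 2 (RoML), unless specified otherwise
    "nu": 0.0,    # Algorithm 2 (RoML), unless specified otherwise
}
# All other hyper-parameters follow the baseline's defaults
# (e.g., VariBAD / PEARL), per the paper's protocol.
```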