Notice: The reproducibility variables underlying each score are classified using an automated LLM-based pipeline, validated against a manually labeled dataset. LLM-based classification introduces uncertainty and potential bias; scores should be interpreted as estimates. Full accuracy metrics and methodology are described in [1].
Train Hard, Fight Easy: Robust Meta Reinforcement Learning
Authors: Ido Greenberg, Shie Mannor, Gal Chechik, Eli Meirom
NeurIPS 2023 | Venue PDF | LLM Run Details
| Reproducibility Variable | Result | LLM Response |
|---|---|---|
| Research Type | Experimental | We demonstrate that RoML achieves robust returns on multiple navigation and continuous control benchmarks. We test our algorithms on several domains. Section 6.1 considers a navigation problem... Section 6.2 considers several continuous control environments... |
| Researcher Affiliation | Collaboration | Ido Greenberg (Technion, Nvidia Research); Shie Mannor (Technion, Nvidia Research); Gal Chechik (Bar-Ilan University, Nvidia Research); Eli Meirom (Nvidia Research) |
| Pseudocode | Yes | Algorithm 1: CVaR Meta-Learning (CVaR-ML), Algorithm 2: Robust Meta RL (RoML), Algorithm 3: The Cross-Entropy Method (CEM) |
| Open Source Code | Yes | The code is available in our repositories: VariBAD, PEARL, CeSoR, PAIRED and MAML. |
| Open Datasets | Yes | We rely on standard continuous control problems from the MuJoCo framework [Todorov et al., 2012]: training a cheetah to run (HalfCheetah), and training a Humanoid and an Ant to walk. |
| Dataset Splits | No | The paper discusses training and testing phases but does not explicitly provide percentages or counts for dataset splits into training, validation, and testing sets in a traditional supervised learning manner. For instance, for sine regression, it mentions 10 samples for fine-tuning and 10 for testing per task, which is a per-task data generation rather than an overall dataset split. |
| Hardware Specification | Yes | All experiments were performed on machines with Intel Xeon 2.2 GHz CPUs and NVIDIA V100 GPUs. |
| Software Dependencies | No | The paper mentions several frameworks and algorithms it builds upon or compares against (e.g., MuJoCo, Vari BAD, PEARL, PPO, MAML) with citations. However, it does not provide specific version numbers for general software dependencies such as Python, PyTorch, TensorFlow, or other libraries. |
| Experiment Setup | Yes | Hyper-parameters: To test the practical applicability of RoML as a meta-algorithm, in every experiment, we use the same hyper-parameters for RoML, CVaR-ML and their baseline. In particular, we use the baseline's default hyper-parameters whenever applicable... As for the additional hyper-parameters of the meta-algorithm itself: in Algorithm 1, we use M = 1 meta-rollout per task; and in Algorithm 2, we use β = 0.2, ν = 0 unless specified otherwise. |
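The pseudocode and setup rows above refer to CVaR-based meta-training with a tail fraction β = 0.2. As context, the core CVaR-over-tasks idea can be sketched as follows: weight each sampled task by whether its return falls in the worst β-fraction of the batch. This is a minimal illustrative sketch of the standard empirical-CVaR formulation, not the authors' implementation; `cvar_task_weights` is a hypothetical helper name.

```python
import numpy as np

def cvar_task_weights(returns, beta=0.2):
    """Assign uniform weight to the worst beta-fraction of tasks (by return)
    and zero weight to the rest, so gradient updates focus on hard tasks.
    Illustrative sketch only, not the paper's implementation."""
    returns = np.asarray(returns, dtype=float)
    # Number of tasks in the beta-tail (at least one).
    k = max(1, int(np.ceil(beta * len(returns))))
    # Indices of the k lowest-return (hardest) tasks.
    worst = np.argsort(returns)[:k]
    weights = np.zeros_like(returns)
    weights[worst] = 1.0 / k
    return weights

# Example: with beta = 0.4 over 5 tasks, the 2 lowest returns get weight 0.5.
returns = [3.0, -1.0, 2.5, 0.2, 4.1]
w = cvar_task_weights(returns, beta=0.4)
```

In a meta-training loop, these weights would multiply each task's policy-gradient contribution, so only the β-tail of the task distribution drives the update.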