Notice: The reproducibility variables underlying each score are classified using an automated LLM-based pipeline, validated against a manually labeled dataset. LLM-based classification introduces uncertainty and potential bias; scores should be interpreted as estimates. Full accuracy metrics and methodology are described in [1].
Train Hard, Fight Easy: Robust Meta Reinforcement Learning
Authors: Ido Greenberg, Shie Mannor, Gal Chechik, Eli Meirom
NeurIPS 2023 | Venue PDF | LLM Run Details
| Reproducibility Variable | Result | LLM Response |
|---|---|---|
| Research Type | Experimental | We demonstrate that RoML achieves robust returns on multiple navigation and continuous control benchmarks. We test our algorithms on several domains. Section 6.1 considers a navigation problem... Section 6.2 considers several continuous control environments... |
| Researcher Affiliation | Collaboration | Ido Greenberg (Technion, Nvidia Research); Shie Mannor (Technion, Nvidia Research); Gal Chechik (Bar-Ilan University, Nvidia Research); Eli Meirom (Nvidia Research) |
| Pseudocode | Yes | Algorithm 1: CVaR Meta-Learning (CVaR-ML), Algorithm 2: Robust Meta RL (RoML), Algorithm 3: The Cross-Entropy Method (CEM) |
| Open Source Code | Yes | The code is available in our repositories: VariBAD, PEARL, CeSoR, PAIRED and MAML. |
| Open Datasets | Yes | We rely on standard continuous control problems from the MuJoCo framework [Todorov et al., 2012]: training a cheetah to run (HalfCheetah), and training a Humanoid and an Ant to walk. |
| Dataset Splits | No | The paper discusses training and testing phases but does not explicitly provide percentages or counts for dataset splits into training, validation, and testing sets in a traditional supervised learning manner. For instance, for sine regression, it mentions 10 samples for fine-tuning and 10 for testing per task, which is a per-task data generation rather than an overall dataset split. |
| Hardware Specification | Yes | All experiments were performed on machines with Intel Xeon 2.2 GHz CPUs and NVIDIA V100 GPUs. |
| Software Dependencies | No | The paper mentions several frameworks and algorithms it builds upon or compares against (e.g., MuJoCo, Vari BAD, PEARL, PPO, MAML) with citations. However, it does not provide specific version numbers for general software dependencies such as Python, PyTorch, TensorFlow, or other libraries. |
| Experiment Setup | Yes | Hyper-parameters: To test the practical applicability of RoML as a meta-algorithm, in every experiment, we use the same hyper-parameters for RoML, CVaR-ML and their baseline. In particular, we use the baseline's default hyper-parameters whenever applicable... As for the additional hyper-parameters of the meta-algorithm itself: in Algorithm 1, we use M = 1 meta-rollout per task; and in Algorithm 2, we use β = 0.2, ν = 0 unless specified otherwise. |
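The pseudocode and setup rows above refer to CVaR-based meta-training with a tail fraction β = 0.2. As context, the core CVaR-over-tasks idea can be sketched as follows: weight each sampled task by whether its return falls in the worst β-fraction of the batch. This is a minimal illustrative sketch of the standard empirical-CVaR formulation, not the authors' implementation; `cvar_task_weights` is a hypothetical helper name.

```python
import numpy as np

def cvar_task_weights(returns, beta=0.2):
    """Assign uniform weight to the worst beta-fraction of tasks (by return)
    and zero weight to the rest, so gradient updates focus on hard tasks.
    Illustrative sketch only, not the paper's implementation."""
    returns = np.asarray(returns, dtype=float)
    # Number of tasks in the beta-tail (at least one).
    k = max(1, int(np.ceil(beta * len(returns))))
    # Indices of the k lowest-return (hardest) tasks.
    worst = np.argsort(returns)[:k]
    weights = np.zeros_like(returns)
    weights[worst] = 1.0 / k
    return weights

# Example: with beta = 0.4 over 5 tasks, the 2 lowest returns get weight 0.5.
returns = [3.0, -1.0, 2.5, 0.2, 4.1]
w = cvar_task_weights(returns, beta=0.4)
```

In a meta-training loop, these weights would multiply each task's policy-gradient contribution, so only the β-tail of the task distribution drives the update.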