reproducibilityindex.ai

Solving Minimum-Cost Reach Avoid using Reinforcement Learning

Authors: Oswin So, Cheng Ge, Chuchu Fan

NeurIPS 2024

Reproducibility Variable	Result	LLM Response
Research Type	Experimental	Empirical results demonstrate that RC-PPO learns policies with comparable goal-reaching rates while achieving up to 57% lower cumulative costs compared to existing methods on a suite of minimum-cost reach-avoid benchmarks on the MuJoCo simulator.
Researcher Affiliation	Academia	Oswin So* Department of Aeronautics and Astronautics MIT oswinso@mit.edu Cheng Ge* Department of Aeronautics and Astronautics MIT gec_mike@mit.edu Chuchu Fan Department of Aeronautics and Astronautics MIT chuchu@mit.edu
Pseudocode	Yes	Algorithm 1 RC-PPO (Actor Critic)
Open Source Code	Yes	The project page can be found at https://oswinso.xyz/rcppo/. (...) Yes, the code used for generating the results in the paper has been provided.
Open Datasets	Yes	We compare RC-PPO with baseline methods on several minimum-cost reach-avoid environments. We consider an inverted pendulum (Pendulum), an environment from Safety Gym [69] (Point Goal) and two custom environments from MuJoCo [70] (Safety Hopper, Safety Half Cheetah) with added hazard regions and goal regions. We also consider a 3D quadrotor navigation task in a simulated wind field for an urban environment [71, 72] (Wind Field) and a Fixed-Wing avoid task from [59] with an additional goal region (Fixed Wing).
Dataset Splits	No	The paper focuses on reinforcement learning environments and does not describe traditional dataset splits for training, validation, or testing.
Hardware Specification	Yes	We run all our experiments on a computer with CPU AMD Ryzen Threadripper 3970X 32-Core Processor and with 4 GPUs of RTX3090.
Software Dependencies	No	Also, we implement all the environments in JAX [76] for better scalability and parallelization. The specific version number for JAX is not provided.
Experiment Setup	Yes	Table 1 (Hyperparameter Settings for On-policy Algorithms) lists specific values: MLP units per hidden layer = 256, number of hidden layers = 2, discount factor γ = 0.99, clip ratio = 0.2, and learning rates. Section F.2 provides details on the X_threshold, β, and C_fail values.
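
The Table 1 values quoted in the Experiment Setup row could be collected into a single configuration object when attempting a reproduction. The following is a minimal sketch under stated assumptions, not the authors' code: the class name `OnPolicyConfig`, its field names, and the helper `missing_fields` are hypothetical, and the learning rates, X_threshold, β, and C_fail are left unset because this page does not state their values.

```python
from dataclasses import dataclass, asdict
from typing import Dict, List, Optional


@dataclass(frozen=True)
class OnPolicyConfig:
    """Hypothetical container for the hyperparameters quoted from Table 1."""

    # Values stated in the Experiment Setup row.
    hidden_units: int = 256        # MLP units per hidden layer
    num_hidden_layers: int = 2     # number of hidden layers
    discount_gamma: float = 0.99   # discount factor γ
    clip_ratio: float = 0.2        # PPO clip ratio

    # Mentioned in the row but not given a value there; left unset on purpose.
    learning_rates: Optional[Dict[str, float]] = None
    x_threshold: Optional[float] = None  # X_threshold (Section F.2)
    beta: Optional[float] = None         # β (Section F.2)
    c_fail: Optional[float] = None       # C_fail (Section F.2)


def missing_fields(cfg: OnPolicyConfig) -> List[str]:
    """Return the names of fields that still need a value from the paper."""
    return [name for name, value in asdict(cfg).items() if value is None]


if __name__ == "__main__":
    cfg = OnPolicyConfig()
    print("Known values:", {k: v for k, v in asdict(cfg).items() if v is not None})
    print("Still to be filled in from the paper:", missing_fields(cfg))
```

Keeping the unspecified entries as `None` rather than guessing defaults makes it explicit which settings must still be looked up in the paper's Table 1 and Section F.2 before a run.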