Solving Minimum-Cost Reach Avoid using Reinforcement Learning

Authors: Oswin So, Cheng Ge, Chuchu Fan

NeurIPS 2024 | Conference PDF | Archive PDF | Plain Text | LLM Run Details

Reproducibility Variable | Result | LLM Response
Research Type | Experimental | Empirical results demonstrate that RC-PPO learns policies with comparable goal-reaching rates while achieving up to 57% lower cumulative costs compared to existing methods on a suite of minimum-cost reach-avoid benchmarks on the MuJoCo simulator.
Researcher Affiliation | Academia | Oswin So*, Department of Aeronautics and Astronautics, MIT, oswinso@mit.edu; Cheng Ge*, Department of Aeronautics and Astronautics, MIT, gec_mike@mit.edu; Chuchu Fan, Department of Aeronautics and Astronautics, MIT, chuchu@mit.edu
Pseudocode | Yes | Algorithm 1: RC-PPO (Actor Critic)
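The paper's Algorithm 1 is not reproduced in this report. As context only, the sketch below shows the clipped actor-critic objective that PPO-style methods such as RC-PPO build on; the function names, array shapes, and the NumPy implementation are assumptions for illustration, not the paper's code.

```python
import numpy as np

def ppo_clip_loss(ratio, advantage, clip_ratio=0.2):
    """Standard PPO clipped surrogate objective (negated for minimization).

    `ratio` is pi_new(a|s) / pi_old(a|s) evaluated on sampled actions,
    `advantage` is an advantage estimate. clip_ratio=0.2 matches the value
    quoted from Table 1 of the paper.
    """
    unclipped = ratio * advantage
    clipped = np.clip(ratio, 1.0 - clip_ratio, 1.0 + clip_ratio) * advantage
    return -np.minimum(unclipped, clipped).mean()

def critic_loss(value_pred, value_target):
    """Mean-squared error between predicted values and regression targets."""
    return ((value_pred - value_target) ** 2).mean()
```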
Open Source Code | Yes | The project page can be found at https://oswinso.xyz/rcppo/. (...) Yes, the code used for generating the results in the paper has been provided.
Open Datasets | Yes | We compare RC-PPO with baseline methods on several minimum-cost reach-avoid environments. We consider an inverted pendulum (Pendulum), an environment from Safety Gym [69] (Point Goal), and two custom environments from MuJoCo [70] (Safety Hopper, Safety HalfCheetah) with added hazard regions and goal regions. We also consider a 3D quadrotor navigation task in a simulated wind field for an urban environment [71, 72] (Wind Field) and a Fixed-Wing avoid task from [59] with an additional goal region (Fixed Wing).
Dataset Splits | No | The paper focuses on reinforcement learning environments and does not describe traditional dataset splits for training, validation, or testing.
Hardware Specification | Yes | We run all our experiments on a computer with an AMD Ryzen Threadripper 3970X 32-core CPU and 4 RTX 3090 GPUs.
Software Dependencies | No | Also, we implement all the environments in Jax [76] for better scalability and parallelization. The specific version number for Jax is not provided.
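As an illustration of why a Jax implementation helps with scalability and parallelization, the sketch below shows batched environment stepping with jax.vmap and jax.jit. The toy dynamics, cost function, and batch size are assumptions and do not correspond to any environment in the paper.

```python
import jax
import jax.numpy as jnp

# Hypothetical single-environment step: given a state and an action, return
# the next state and an instantaneous cost. The paper's environments are far
# more involved; this toy dynamics is only for illustration.
def env_step(state, action):
    next_state = state + 0.05 * action   # toy linear dynamics
    cost = jnp.sum(action ** 2)          # toy control cost
    return next_state, cost

# vmap turns the single-environment step into a batched step over many
# parallel environments; jit compiles it once for fast repeated execution.
batched_step = jax.jit(jax.vmap(env_step))

states = jnp.zeros((4096, 3))            # 4096 parallel environments
actions = jnp.ones((4096, 1))
next_states, costs = batched_step(states, actions)
print(next_states.shape, costs.shape)    # (4096, 3) (4096,)
```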
Experiment Setup | Yes | Table 1 (Hyperparameter Settings for On-policy Algorithms) lists specific values: MLP units per hidden layer 256, number of hidden layers 2, discount factor γ 0.99, clip ratio 0.2, and learning rates. Section F.2 provides details on the Xthreshold, β, and Cfail values.
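For convenience, the hyperparameters quoted above are collected in a Python dictionary below. Only the quoted values come from the paper; the key names and the learning-rate placeholder are assumptions, and the complete settings are in Table 1 and Section F.2 of the paper.

```python
# Hedged transcription of the on-policy hyperparameters quoted from Table 1.
ON_POLICY_HPARAMS = {
    "mlp_units_per_hidden_layer": 256,
    "num_hidden_layers": 2,
    "discount_gamma": 0.99,
    "ppo_clip_ratio": 0.2,
    # Learning rates are listed in Table 1 of the paper but not quoted here.
    "learning_rate": None,
}
```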