Solving Minimum-Cost Reach Avoid using Reinforcement Learning
Authors: Oswin So, Cheng Ge, Chuchu Fan
NeurIPS 2024
| Reproducibility Variable | Result | LLM Response |
|---|---|---|
| Research Type | Experimental | Empirical results demonstrate that RC-PPO learns policies with comparable goal-reaching rates while achieving up to 57% lower cumulative costs compared to existing methods on a suite of minimum-cost reach-avoid benchmarks in the MuJoCo simulator. |
| Researcher Affiliation | Academia | Oswin So*, Department of Aeronautics and Astronautics, MIT, oswinso@mit.edu; Cheng Ge*, Department of Aeronautics and Astronautics, MIT, gec_mike@mit.edu; Chuchu Fan, Department of Aeronautics and Astronautics, MIT, chuchu@mit.edu |
| Pseudocode | Yes | Algorithm 1 RC-PPO (Actor Critic) |
| Open Source Code | Yes | The project page can be found at https://oswinso.xyz/rcppo/. (...) Yes, the code used for generating the results in the paper has been provided. |
| Open Datasets | Yes | We compare RC-PPO with baseline methods on several minimum-cost reach-avoid environments. We consider an inverted pendulum (Pendulum), an environment from Safety Gym [69] (Point Goal), and two custom environments from MuJoCo [70] (Safety Hopper, Safety HalfCheetah) with added hazard regions and goal regions. We also consider a 3D quadrotor navigation task in a simulated wind field for an urban environment [71, 72] (Wind Field) and a Fixed-Wing avoid task from [59] with an additional goal region (Fixed Wing). |
| Dataset Splits | No | The paper focuses on reinforcement learning environments and does not describe traditional dataset splits for training, validation, or testing. |
| Hardware Specification | Yes | We run all our experiments on a computer with CPU AMD Ryzen Threadripper 3970X 32-Core Processor and with 4 GPUs of RTX3090. |
| Software Dependencies | No | Also, we implement all the environments in Jax [76] for better scalability and parallelization. The specific version number for Jax is not provided. |
| Experiment Setup | Yes | Table 1 (Hyperparameter Settings for On-policy Algorithms) lists specific values: MLP units per hidden layer (256), number of hidden layers (2), discount factor γ (0.99), clip ratio (0.2), and learning rates. Section F.2 provides details on the X_threshold, β, and C_fail values. |
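
For readers reconstructing the training setup, the snippet below is a minimal, hypothetical sketch (not the authors' released code, whose project page is linked above) of how the quoted hyperparameters would plug into a standard PPO-style setup in JAX: a 2-hidden-layer, 256-unit MLP and the usual clipped surrogate objective with clip ratio 0.2 and discount factor γ = 0.99. The function names (`init_mlp`, `mlp_apply`, `ppo_clip_loss`) are illustrative assumptions, not identifiers from the paper's codebase.

```python
# Hypothetical sketch: a 2-layer, 256-unit MLP and the standard PPO
# clipped-surrogate loss, using the hyperparameters quoted from Table 1
# (gamma = 0.99, clip ratio = 0.2). Written against plain JAX.
import jax
import jax.numpy as jnp

HIDDEN_UNITS = 256   # "MLP Units per Hidden Layer" from Table 1
NUM_LAYERS = 2       # "Numbers of Hidden Layers" from Table 1
GAMMA = 0.99         # discount factor
CLIP_RATIO = 0.2     # PPO clipping parameter


def init_mlp(key, in_dim, out_dim):
    """Initialize an MLP with NUM_LAYERS hidden layers of HIDDEN_UNITS each."""
    dims = [in_dim] + [HIDDEN_UNITS] * NUM_LAYERS + [out_dim]
    params = []
    for d_in, d_out in zip(dims[:-1], dims[1:]):
        key, sub = jax.random.split(key)
        w = jax.random.normal(sub, (d_in, d_out)) * jnp.sqrt(2.0 / d_in)
        params.append((w, jnp.zeros(d_out)))
    return params


def mlp_apply(params, x):
    """Forward pass with tanh activations on the hidden layers."""
    for w, b in params[:-1]:
        x = jnp.tanh(x @ w + b)
    w, b = params[-1]
    return x @ w + b


def ppo_clip_loss(log_prob_new, log_prob_old, advantage):
    """Standard PPO clipped surrogate objective (returned as a loss to minimize)."""
    ratio = jnp.exp(log_prob_new - log_prob_old)
    unclipped = ratio * advantage
    clipped = jnp.clip(ratio, 1.0 - CLIP_RATIO, 1.0 + CLIP_RATIO) * advantage
    return -jnp.mean(jnp.minimum(unclipped, clipped))
```

This only mirrors the generic on-policy pieces named in the table; the RC-PPO-specific components (the reach-avoid cost critic and the X_threshold, β, and C_fail terms from Section F.2) are described in the paper and released code rather than reproduced here.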