Differentiable Trust Region Layers for Deep Reinforcement Learning
Authors: Fabian Otto, Philipp Becker, Vien Anh Ngo, Hanna Carolin Maria Ziesche, Gerhard Neumann
ICLR 2021
| Reproducibility Variable | Result | LLM Response |
|---|---|---|
| Research Type | Experimental | We empirically demonstrate that those projection layers achieve similar or better results than existing methods while being almost agnostic to specific implementation choices. We compare and discuss the effect of the different similarity measures as well as the entropy control on the optimization process. Additionally, we benchmark our algorithm against existing methods and demonstrate that we achieve similar or better performance. Section 5 (Experiments), MuJoCo benchmarks: We evaluate the performance of our trust region layers regarding sample complexity and final reward in comparison to PAPI and PPO on the OpenAI Gym benchmark suite (Brockman et al., 2016). |
| Researcher Affiliation | Collaboration | Fabian Otto (Bosch Center for Artificial Intelligence, University of Tübingen); Philipp Becker (Karlsruhe Institute of Technology); Ngo Anh Vien & Hanna Carolin Ziesche (Bosch Center for Artificial Intelligence); Gerhard Neumann (Karlsruhe Institute of Technology) |
| Pseudocode | Yes | Algorithm 1: Differentiable Trust Region Layer. Algorithm 2: Algorithmic view of the proposed Trust Region Projections. |
| Open Source Code | Yes | The code is available at https://git.io/Jthb0. |
| Open Datasets | Yes | MuJoCo benchmarks: We evaluate the performance of our trust region layers regarding sample complexity and final reward in comparison to PAPI and PPO on the OpenAI Gym benchmark suite (Brockman et al., 2016). |
| Dataset Splits | No | No explicit statement about training/validation/test dataset splits was found. |
| Hardware Specification | Yes | On an 8-core Intel Core i7-9700K CPU @ 3.60GHz |
| Software Dependencies | No | We implemented the whole layer using C++, Armadillo, and OpenMP for parallelization. No version numbers are provided for these software components. |
| Experiment Setup | Yes | Tables 2 and 3 show the hyperparameters used for the experiments in Table 1. Rollouts: 2048; lr: 5e-5; minibatch size: 32. |
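
The three hyperparameter values quoted in the Experiment Setup row can be collected into a small configuration object, as in the minimal sketch below. The class name `ExperimentSetup`, its field names, and the helper `minibatches_per_pass` are hypothetical and not taken from the released code. The sketch also assumes that "rollouts 2048" means 2048 samples collected per policy update, which this summary does not confirm. The complete settings are in Tables 2 and 3 of the paper.

```python
from dataclasses import dataclass


@dataclass(frozen=True)
class ExperimentSetup:
    """Hypothetical container for the three values quoted in the summary."""

    rollouts: int = 2048       # assumed: samples collected per policy update
    lr: float = 5e-5           # learning rate
    minibatch_size: int = 32   # samples per gradient step

    def minibatches_per_pass(self) -> int:
        # Number of full minibatches in one pass over the collected samples.
        return self.rollouts // self.minibatch_size


setup = ExperimentSetup()
print(setup.minibatches_per_pass())
```

Freezing the dataclass keeps the recorded setup immutable, which may help when the same values are logged alongside results for later comparison.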