Differentiable Trust Region Layers for Deep Reinforcement Learning

Authors: Fabian Otto, Philipp Becker, Vien Anh Ngo, Hanna Carolin Maria Ziesche, Gerhard Neumann

ICLR 2021

Reproducibility Variable | Result | LLM Response
Research Type | Experimental | "We empirically demonstrate that those projection layers achieve similar or better results than existing methods while being almost agnostic to specific implementation choices. We compare and discuss the effect of the different similarity measures as well as the entropy control on the optimization process. Additionally, we benchmark our algorithm against existing methods and demonstrate that we achieve similar or better performance." (Section 5, Experiments, MuJoCo Benchmarks: "We evaluate the performance of our trust region layers regarding sample complexity and final reward in comparison to PAPI and PPO on the OpenAI Gym benchmark suite (Brockman et al., 2016).")
Researcher Affiliation | Collaboration | Fabian Otto (Bosch Center for Artificial Intelligence, University of Tübingen); Philipp Becker (Karlsruhe Institute of Technology); Ngo Anh Vien & Hanna Carolin Ziesche (Bosch Center for Artificial Intelligence); Gerhard Neumann (Karlsruhe Institute of Technology)
Pseudocode | Yes | "Algorithm 1: Differentiable Trust Region Layer." "Algorithm 2: Algorithmic view of the proposed Trust Region Projections."
Open Source Code | Yes | "The code is available at https://git.io/Jthb0."
Open Datasets | Yes | "We evaluate the performance of our trust region layers regarding sample complexity and final reward in comparison to PAPI and PPO on the OpenAI Gym benchmark suite (Brockman et al., 2016)."
Dataset Splits | No | No explicit statement about training/validation/test dataset splits was found.
Hardware Specification | Yes | "On an 8-core Intel Core i7-9700K CPU @ 3.60GHz"
Software Dependencies | No | "We implemented the whole layer using C++, Armadillo, and OpenMP for parallelization." No version numbers are provided for these software components.
Experiment Setup | Yes | "Tables 2 and 3 show the hyperparameters used for the experiments in Table 1." For example: rollouts 2048, learning rate 5e-5, minibatch size 32.
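The trust region projection referenced by the pseudocode row (Algorithm 1, "Differentiable Trust Region Layer") can be illustrated with a minimal sketch. Assuming a Gaussian policy with fixed covariance and a Mahalanobis-distance trust region on the mean, the projection below interpolates the new mean back toward the old one whenever the bound is violated; the function and variable names are illustrative, not the authors' implementation.

```python
import numpy as np

def project_mean(mu, mu_old, sigma_old_inv, eps):
    """Project a new Gaussian mean into a Mahalanobis trust region
    around the old mean (illustrative sketch, not the paper's exact layer)."""
    diff = mu - mu_old
    # Squared Mahalanobis distance under the old policy's covariance.
    d = float(diff @ sigma_old_inv @ diff)
    if d <= eps:
        return mu  # already inside the trust region, leave unchanged
    # Scale the offset so the trust-region bound holds with equality.
    return mu_old + np.sqrt(eps / d) * diff
```

Because the projection is a smooth function of its inputs (away from the boundary), gradients can flow through it, which is the property that makes such a layer usable end-to-end inside a policy network.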
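The hyperparameters quoted in the Experiment Setup row can be collected into a small configuration sketch. The key names below are hypothetical; only the values (2048 rollout steps, learning rate 5e-5, minibatch size 32) come from the quoted tables.

```python
# Hypothetical config mirroring the quoted hyperparameters;
# key names are illustrative, values are from the paper's Tables 2 and 3.
config = {
    "rollout_steps": 2048,   # environment steps collected per update
    "learning_rate": 5e-5,   # optimizer step size
    "minibatch_size": 32,    # samples per gradient minibatch
}

# 2048 collected samples split into minibatches of 32
# yields 64 gradient minibatches per pass over the data.
minibatches_per_epoch = config["rollout_steps"] // config["minibatch_size"]
```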