Differentiable Trust Region Layers for Deep Reinforcement Learning

Authors: Fabian Otto, Philipp Becker, Vien Anh Ngo, Hanna Carolin Maria Ziesche, Gerhard Neumann

ICLR 2021

Reproducibility Variable | Result | LLM Response
Research Type | Experimental | "We empirically demonstrate that those projection layers achieve similar or better results than existing methods while being almost agnostic to specific implementation choices. We compare and discuss the effect of the different similarity measures as well as the entropy control on the optimization process. Additionally, we benchmark our algorithm against existing methods and demonstrate that we achieve similar or better performance." (Section 5, Experiments, MuJoCo Benchmarks: "We evaluate the performance of our trust region layers regarding sample complexity and final reward in comparison to PAPI and PPO on the OpenAI Gym benchmark suite (Brockman et al., 2016).")
Researcher Affiliation | Collaboration | Fabian Otto (Bosch Center for Artificial Intelligence, University of Tübingen); Philipp Becker (Karlsruhe Institute of Technology); Ngo Anh Vien & Hanna Carolin Ziesche (Bosch Center for Artificial Intelligence); Gerhard Neumann (Karlsruhe Institute of Technology)
Pseudocode | Yes | "Algorithm 1: Differentiable Trust Region Layer." "Algorithm 2: Algorithmic view of the proposed Trust Region Projections."
Open Source Code | Yes | "The code is available at https://git.io/Jthb0."
Open Datasets | Yes | "We evaluate the performance of our trust region layers regarding sample complexity and final reward in comparison to PAPI and PPO on the OpenAI Gym benchmark suite (Brockman et al., 2016)."
Dataset Splits | No | No explicit statement about training/validation/test dataset splits was found.
Hardware Specification | Yes | "On an 8-core Intel Core i7-9700K CPU @ 3.60GHz"
Software Dependencies | No | "We implemented the whole layer using C++, Armadillo, and OpenMP for parallelization." No version numbers are provided for these software components.
Experiment Setup | Yes | "Tables 2 and 3 show the hyperparameters used for the experiments in Table 1." For example: rollouts 2048, learning rate 5e-5, minibatch size 32.
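The trust region projection referenced by the pseudocode row (Algorithm 1, "Differentiable Trust Region Layer") can be illustrated with a minimal sketch. Assuming a Gaussian policy with fixed covariance and a Mahalanobis-distance trust region on the mean, the projection below interpolates the new mean back toward the old one whenever the bound is violated; the function and variable names are illustrative, not the authors' implementation.

```python
import numpy as np

def project_mean(mu, mu_old, sigma_old_inv, eps):
    """Project a new Gaussian mean into a Mahalanobis trust region
    around the old mean (illustrative sketch, not the paper's exact layer)."""
    diff = mu - mu_old
    # Squared Mahalanobis distance under the old policy's covariance.
    d = float(diff @ sigma_old_inv @ diff)
    if d <= eps:
        return mu  # already inside the trust region, leave unchanged
    # Scale the offset so the trust-region bound holds with equality.
    return mu_old + np.sqrt(eps / d) * diff
```

Because the projection is a smooth function of its inputs (away from the boundary), gradients can flow through it, which is the property that makes such a layer usable end-to-end inside a policy network.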
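The hyperparameters quoted in the Experiment Setup row can be collected into a small configuration sketch. The key names below are hypothetical; only the values (2048 rollout steps, learning rate 5e-5, minibatch size 32) come from the quoted tables.

```python
# Hypothetical config mirroring the quoted hyperparameters;
# key names are illustrative, values are from the paper's Tables 2 and 3.
config = {
    "rollout_steps": 2048,   # environment steps collected per update
    "learning_rate": 5e-5,   # optimizer step size
    "minibatch_size": 32,    # samples per gradient minibatch
}

# 2048 collected samples split into minibatches of 32
# yields 64 gradient minibatches per pass over the data.
minibatches_per_epoch = config["rollout_steps"] // config["minibatch_size"]
```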