Projections for Approximate Policy Iteration Algorithms

Authors: Riad Akrour, Joni Pajarinen, Jan Peters, Gerhard Neumann

ICML 2019

Reproducibility Variable | Result | LLM Response
Research Type | Experimental | Our first set of experiments is on simple optimization problems to assess the validity of our proposed optimization scheme for constrained problems. Most of the introduced projections g are not on the constraint boundary, with the exception of the entropy constraint of a Gaussian distribution. Thus, it remains to be seen if optimizing L ∘ g by gradient ascent can match the quality of solutions obtained via the method of Lagrange multipliers on simple problems. (A toy sketch of this composed-objective gradient ascent appears below the table.)
Researcher Affiliation | Collaboration | 1 IAS, TU Darmstadt, Darmstadt, Germany; 2 Tampere University, Finland; 3 L-CAS, University of Lincoln, Lincoln, United Kingdom; 4 Bosch Center for Artificial Intelligence (BCAI), Germany; 5 Max Planck Institute for Intelligent Systems, Tübingen, Germany.
Pseudocode | Yes | Algorithm 1 (DPS Gaussian policy projection); Algorithm 2 (API linear-Gaussian policy projection).
Open Source Code | Yes | Implementation of Alg. 2 is provided in https://github.com/akrouriad/papi.
Open Datasets | Yes | We run a first set of experiments on four benchmark tasks from Roboschool (Brockman et al., 2016).
Dataset Splits | No | The paper describes evaluation metrics and performance tracking during training (e.g., 'initial 100 iterations', 'best window of 500 trajectories'), but it does not specify explicit training/validation/test *dataset* splits, which are typically found with static datasets rather than dynamic RL environments.
Hardware Specification | Yes | Computations were conducted on the Lichtenberg high performance computer of TU Darmstadt and the NVIDIA DGX station.
Software Dependencies | No | The paper mentions implementing projections 'within OpenAI's code base (Dhariwal et al., 2017)' but does not provide specific version numbers for any software libraries or dependencies.
Experiment Setup | Yes | All our experiments use a neural network policy with two hidden layers of 64 neurons. ... For all of the experiments, including Fig. 4, PAPI-PPO refers to performing 20 epochs with mini-batches of size 64. For the entropy constraint, we adopt a two-phase approach where we initially do not constrain the entropy until it reaches half of the initial entropy and then decrease β linearly by a fixed amount ϵ. Using the same parameters for PAPI-TRPO would result in improvements over TRPO for some tasks, but the entropy of the final policy was always relatively high. We obtained the best performance for PAPI-TRPO by enforcing an entropy equality constraint using Prop. 1 and only optimizing A for 10 epochs with mini-batches of size 64. (A configuration sketch of these settings also appears below the table.)
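
As a companion to the Research Type row above: the quoted passage asks whether gradient ascent on the composed objective L ∘ g can match Lagrange-multiplier solutions on simple constrained problems. Below is a minimal toy sketch of that idea, not the authors' code: a diagonal Gaussian whose parameters are passed through an entropy projection (the one case the row notes lands exactly on the constraint boundary), with the composed objective maximized by plain gradient ascent using finite-difference gradients. The objective, dimensionality, entropy bound, and step size are illustrative assumptions.

```python
# Toy sketch: gradient ascent on f(g(theta)), where g projects a diagonal
# Gaussian onto an entropy lower bound. Not the authors' implementation.
import numpy as np

D = 2                            # dimensionality of the Gaussian (assumption)
BETA = 1.0                       # entropy lower bound (assumption)
TARGET = np.array([1.0, -0.5])   # optimum of the toy objective (assumption)

def entropy(log_sigma):
    """Differential entropy of N(mu, diag(sigma^2))."""
    return np.sum(log_sigma) + 0.5 * D * np.log(2.0 * np.pi * np.e)

def project(theta):
    """Entropy projection g: uniformly rescale sigma so H(sigma) == BETA
    whenever the bound is violated; this lands exactly on the boundary."""
    mu, log_sigma = theta[:D], theta[D:]
    h = entropy(log_sigma)
    if h < BETA:
        log_sigma = log_sigma + (BETA - h) / D
    return np.concatenate([mu, log_sigma])

def objective(theta):
    """E_{x~N(mu, diag(sigma^2))}[-||x - TARGET||^2], in closed form.
    It pushes sigma toward 0, so the entropy constraint becomes active."""
    mu, log_sigma = theta[:D], theta[D:]
    return -np.sum((mu - TARGET) ** 2) - np.sum(np.exp(2.0 * log_sigma))

def num_grad(f, theta, eps=1e-5):
    """Central finite-difference gradient of f at theta."""
    g = np.zeros_like(theta)
    for i in range(theta.size):
        e = np.zeros_like(theta)
        e[i] = eps
        g[i] = (f(theta + e) - f(theta - e)) / (2.0 * eps)
    return g

theta = np.zeros(2 * D)                        # mu = 0, sigma = 1
composed = lambda th: objective(project(th))   # f(g(theta))
for _ in range(2000):
    theta = theta + 1e-2 * num_grad(composed, theta)

theta = project(theta)
print("mean:", theta[:D], "entropy:", entropy(theta[D:]), "(bound:", BETA, ")")
```

In this toy problem the mean converges to TARGET while the entropy settles on the bound, which is the kind of behaviour a Lagrange-multiplier solution of the same problem would produce.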
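For the Experiment Setup row, the quoted hyperparameters (two hidden layers of 64 units, 20 epochs with mini-batches of 64 for PAPI-PPO, and a two-phase entropy schedule that leaves entropy unconstrained until it falls to half of its initial value and then lowers the bound β by a fixed ϵ per iteration) can be collected into a small configuration sketch. This is a hedged illustration, not the authors' implementation (see the linked repository); the names PAPIConfig, make_policy and entropy_bound, the tanh activations, and the value of ϵ are assumptions.

```python
# Hedged configuration sketch of the quoted experiment setup.
from dataclasses import dataclass

import torch.nn as nn

@dataclass
class PAPIConfig:
    hidden_sizes: tuple = (64, 64)  # two hidden layers of 64 units (quoted)
    epochs: int = 20                # PAPI-PPO: 20 epochs per iteration (quoted)
    minibatch_size: int = 64        # mini-batches of size 64 (quoted)
    epsilon: float = 0.01           # per-iteration entropy decrement; value assumed

def make_policy(obs_dim: int, act_dim: int, cfg: PAPIConfig) -> nn.Module:
    """Mean network of a Gaussian policy with two 64-unit hidden layers."""
    h1, h2 = cfg.hidden_sizes
    return nn.Sequential(
        nn.Linear(obs_dim, h1), nn.Tanh(),   # tanh activations assumed
        nn.Linear(h1, h2), nn.Tanh(),
        nn.Linear(h2, act_dim),
    )

def entropy_bound(current_entropy, initial_entropy, prev_bound, cfg: PAPIConfig):
    """Two-phase schedule from the row: leave entropy unconstrained until it
    reaches half of its initial value, then lower the bound beta linearly by
    a fixed epsilon each iteration."""
    if prev_bound is None and current_entropy > 0.5 * initial_entropy:
        return None                          # phase 1: no entropy constraint
    start = prev_bound if prev_bound is not None else 0.5 * initial_entropy
    return start - cfg.epsilon               # phase 2: linear decrease
```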