Efficient Wasserstein Natural Gradients for Reinforcement Learning

Authors: Ted Moskovitz, Michael Arbel, Ferenc Huszar, Arthur Gretton

ICLR 2021

Reproducibility Variable | Result | LLM Response
Research Type | Experimental | "Experiments on challenging tasks demonstrate improvements in both computational cost and performance over advanced baselines." (Abstract) ... "We now test the performance of our estimators for both policy gradients (PG) and evolution strategies (ES) against their associated baseline methods." (Section 5, Experiments)
Researcher Affiliation | Academia | "Ted Moskovitz (1), Michael Arbel (1), Ferenc Huszar (1,2) & Arthur Gretton (1); (1) Gatsby Unit, UCL; (2) University of Cambridge"
Pseudocode | Yes | "Algorithm 1: Wasserstein Natural Policy Gradient" ... "Algorithm 2: Wasserstein Natural Evolution Strategies" ... "Algorithm 3: Efficient Wasserstein Natural Gradient"
Open Source Code | Yes | "Further experimental details can be found in the appendix, and our code is available online." (Footnote 1: https://github.com/tedmoskovitz/WNPG)
Open Datasets | Yes | "We first apply WNPG and BG-WNPG to challenging tasks from Open AI Gym (Brockman et al., 2016) and Roboschool (RS)."
Dataset Splits | No | The paper mentions running experiments with "5 random seeds" and tracking performance over "Timesteps", but it does not specify explicit training/validation/test dataset splits (e.g., 80/10/10).
Hardware Specification | No | The paper does not provide specific hardware details such as GPU models, CPU types, or memory specifications used for running the experiments.
Software Dependencies | No | The paper mentions environments such as OpenAI Gym and Roboschool and implicitly relies on standard deep learning and automatic differentiation software, but it does not list version numbers for any dependencies (e.g., PyTorch 1.9 or Python 3.8).
Experiment Setup | Yes | "More precisely, for each task we ran a hyperparameter sweep over learning rates in the set {1e-5, 5e-5, 1e-4, 3e-4}, and used the concatenation-of-actions behavioral embedding Φ(τ) = [a0, a1, ..., aT] with the base network implementation the same as Dhariwal et al. (2017)." (Appendix D.1) ... "The WNG hyperparameters were also left the same as in Arbel et al. (2020). Specifically, the number of basis points was set as M = 5, the reduction factor was bounded in the range [0.25, 0.75], and ϵ ∈ [1e-10, 1e5]." (Appendix D.1) ... "Training is up to 4000 gradient iterations, with λ = 0.9 and β = 0.1 unless they are varied." (Appendix D.3)
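
To make the quoted setup easier to parse, below is a minimal Python sketch (assuming only NumPy; the constant names and the `behavioral_embedding` helper are illustrative and not taken from the authors' released code) of the reported hyperparameter ranges and the concatenation-of-actions embedding Φ(τ) = [a0, a1, ..., aT].

```python
import numpy as np

# Hyperparameter values as quoted from Appendix D.1 and D.3 of the paper.
# The variable names below are illustrative, not the authors' identifiers.
LEARNING_RATES = [1e-5, 5e-5, 1e-4, 3e-4]   # sweep set used for each task
WNG_NUM_BASIS_POINTS = 5                    # M = 5
WNG_REDUCTION_FACTOR_RANGE = (0.25, 0.75)
WNG_EPSILON_RANGE = (1e-10, 1e5)
MAX_GRADIENT_ITERS = 4000
LAMBDA, BETA = 0.9, 0.1


def behavioral_embedding(actions):
    """Concatenation-of-actions embedding: Phi(tau) = [a_0, a_1, ..., a_T].

    `actions` is the sequence of actions taken along one trajectory; each
    action may be a scalar or a vector, and the embedding is their flat
    concatenation.
    """
    return np.concatenate(
        [np.atleast_1d(np.asarray(a, dtype=float)) for a in actions]
    )


if __name__ == "__main__":
    # Toy usage: a 3-step trajectory with 2-dimensional actions yields a
    # 6-dimensional behavioral embedding.
    trajectory = [np.array([0.1, -0.3]), np.array([0.0, 0.5]), np.array([0.2, 0.2])]
    print(behavioral_embedding(trajectory))  # -> array of length 6
```

For a trajectory of T+1 actions with action dimension d, this embedding is simply a vector of length (T+1)·d, which is what the quoted "concatenation-of-actions" description implies for the Gym and Roboschool control tasks.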