Parameter-Based Value Functions

Authors: Francesco Faccio, Louis Kirsch, Jürgen Schmidhuber

ICLR 2021

| Reproducibility Variable | Result | LLM Response |
|---|---|---|
| Research Type | Experimental | "Finally our algorithms are evaluated on a selection of discrete and continuous control tasks using shallow policies and deep neural networks. Their performance is comparable to state-of-the-art methods." and "We make theoretical, algorithmic, and experimental contributions: Section 2 introduces the standard MDP setting; Section 3 formally presents PBVFs and derives algorithms for V(θ), V(s, θ) and Q(s, a, θ); Section 4 describes the experimental evaluation using shallow and deep policies." |
| Researcher Affiliation | Academia | Francesco Faccio, Louis Kirsch & Jürgen Schmidhuber, The Swiss AI Lab IDSIA, USI, SUPSI. {francesco,louis,juergen}@idsia.ch |
| Pseudocode | Yes | Algorithm 1: Actor-critic with Monte Carlo prediction for V(θ); Algorithm 2: Actor-critic with TD prediction for V(s, θ); Algorithm 3: Stochastic actor-critic with TD prediction for Q(s, a, θ); Algorithm 4: Deterministic actor-critic with TD prediction for Q(s, a, θ). |
| Open Source Code | Yes | "Code is available at: https://github.com/FF93/Parameter-based-Value-Functions" |
| Open Datasets | Yes | "For this purpose, we use an instance of the 1D Linear Quadratic Regulator (LQR) problem" as well as Swimmer-v3 and Hopper-v3; "100k time steps for all other environments." |
| Dataset Splits | Yes | "For each hyperparameter configuration, for each environment and policy architecture, we run 5 instances of the learning algorithm using different seeds. We measure the learning progress by running 100 evaluations while learning the deterministic policy (without action or parameter noise) using 10 test trajectories. We use two metrics to determine the best hyperparameters: the average return over policy evaluations during the whole training process and the average return over policy evaluations during the last 20% of time steps." |
| Hardware Specification | Yes | "We also thank NVIDIA Corporation for donating a DGX-1 as part of the Pioneers of AI Research Award and to IBM for donating a Minsky machine." |
| Software Dependencies | No | The paper mentions "Pytorch initialization" but does not name software dependencies with version numbers. |
| Experiment Setup | Yes | "A.3 IMPLEMENTATION DETAILS. Shared hyperparameters: ... Batch size: 128 for DDPG, PSVF, PAVF; 16 for PSSVF. ... Tuned hyperparameters: ... Policy's learning rate: tuned with values in [1e-2, 1e-3, 1e-4]." |
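The core idea behind Algorithm 1 (actor-critic with Monte Carlo prediction for V(θ)) can be sketched in a few lines: a critic learns to map policy parameters directly to expected return, and the policy parameters are then improved by gradient ascent on the critic. The sketch below is a minimal illustration, not the authors' implementation: the toy return function, the quadratic critic features, the buffer size, and the learning rates are all assumptions made up for this example.

```python
import numpy as np

rng = np.random.default_rng(0)

def episode_return(theta):
    # Toy stand-in for a full rollout: the true return peaks at theta = 2
    # (an assumption for illustration; real PBVF experiments use LQR/MuJoCo).
    return -(theta - 2.0) ** 2 + 0.1 * rng.normal()

theta = -1.0                       # current policy parameter (1-D for simplicity)
buf_thetas, buf_returns = [], []

for step in range(300):
    # Explore with parameter noise and store (parameters, return) pairs.
    probe = theta + 0.5 * rng.normal()
    buf_thetas.append(probe)
    buf_returns.append(episode_return(probe))

    # Critic: least-squares fit of V(theta) ~ w0 + w1*theta + w2*theta^2
    # on a sliding window of recent Monte Carlo returns.
    t = np.array(buf_thetas[-100:])
    R = np.array(buf_returns[-100:])
    Phi = np.stack([np.ones_like(t), t, t * t], axis=1)
    w, *_ = np.linalg.lstsq(Phi, R, rcond=None)

    # Actor: gradient ascent on the critic, dV/dtheta = w1 + 2*w2*theta.
    theta = theta + 0.05 * (w[1] + 2.0 * w[2] * theta)
```

With a richer critic (e.g. a neural network taking flattened policy weights as input) the actor step would backpropagate through the critic instead of using a closed-form gradient, but the train-critic-then-ascend loop is the same.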
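The two model-selection metrics quoted in the Dataset Splits row (average evaluation return over the whole run, and over the final 20% of time steps) are straightforward to compute from a per-evaluation return curve. The array of returns below is fabricated purely for illustration:

```python
import numpy as np

# Hypothetical per-evaluation average returns: 100 evaluations during training,
# each already averaged over 10 test trajectories (values are made up).
eval_returns = np.linspace(-50.0, 120.0, num=100)

# Metric 1: average return over all policy evaluations during training.
avg_whole = eval_returns.mean()

# Metric 2: average return over evaluations in the last 20% of training.
tail = eval_returns[int(0.8 * len(eval_returns)):]
avg_last_20 = tail.mean()
```

The second metric rewards final performance, while the first also rewards fast early learning; a hyperparameter configuration can rank differently under the two.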