Parameter-Based Value Functions

Authors: Francesco Faccio, Louis Kirsch, Jürgen Schmidhuber

ICLR 2021

| Reproducibility Variable | Result | LLM Response |
|---|---|---|
| Research Type | Experimental | "Finally our algorithms are evaluated on a selection of discrete and continuous control tasks using shallow policies and deep neural networks. Their performance is comparable to state-of-the-art methods." and "We make theoretical, algorithmic, and experimental contributions: Section 2 introduces the standard MDP setting; Section 3 formally presents PBVFs and derives algorithms for V(θ), V(s, θ) and Q(s, a, θ); Section 4 describes the experimental evaluation using shallow and deep policies." |
| Researcher Affiliation | Academia | Francesco Faccio, Louis Kirsch & Jürgen Schmidhuber, The Swiss AI Lab IDSIA, USI, SUPSI. {francesco,louis,juergen}@idsia.ch |
| Pseudocode | Yes | Algorithm 1: Actor-critic with Monte Carlo prediction for V(θ); Algorithm 2: Actor-critic with TD prediction for V(s, θ); Algorithm 3: Stochastic actor-critic with TD prediction for Q(s, a, θ); Algorithm 4: Deterministic actor-critic with TD prediction for Q(s, a, θ). |
| Open Source Code | Yes | "Code is available at: https://github.com/FF93/Parameter-based-Value-Functions" |
| Open Datasets | Yes | "For this purpose, we use an instance of the 1D Linear Quadratic Regulator (LQR) problem" as well as Swimmer-v3 and Hopper-v3; "100k time steps for all other environments." |
| Dataset Splits | Yes | "For each hyperparameter configuration, for each environment and policy architecture, we run 5 instances of the learning algorithm using different seeds. We measure the learning progress by running 100 evaluations while learning the deterministic policy (without action or parameter noise) using 10 test trajectories. We use two metrics to determine the best hyperparameters: the average return over policy evaluations during the whole training process and the average return over policy evaluations during the last 20% of time steps." |
| Hardware Specification | Yes | "We also thank NVIDIA Corporation for donating a DGX-1 as part of the Pioneers of AI Research Award and to IBM for donating a Minsky machine." |
| Software Dependencies | No | The paper mentions "Pytorch initialization" but does not name software dependencies with version numbers. |
| Experiment Setup | Yes | "A.3 IMPLEMENTATION DETAILS. Shared hyperparameters: ... Batch size: 128 for DDPG, PSVF, PAVF; 16 for PSSVF. ... Tuned hyperparameters: ... Policy's learning rate: tuned with values in [1e-2, 1e-3, 1e-4]." |
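The core idea behind Algorithm 1 (actor-critic with Monte Carlo prediction for V(θ)) can be sketched in a few lines: a critic learns to map policy parameters directly to expected return, and the policy parameters are then improved by gradient ascent on the critic. The sketch below is a minimal illustration, not the authors' implementation: the toy return function, the quadratic critic features, the buffer size, and the learning rates are all assumptions made up for this example.

```python
import numpy as np

rng = np.random.default_rng(0)

def episode_return(theta):
    # Toy stand-in for a full rollout: the true return peaks at theta = 2
    # (an assumption for illustration; real PBVF experiments use LQR/MuJoCo).
    return -(theta - 2.0) ** 2 + 0.1 * rng.normal()

theta = -1.0                       # current policy parameter (1-D for simplicity)
buf_thetas, buf_returns = [], []

for step in range(300):
    # Explore with parameter noise and store (parameters, return) pairs.
    probe = theta + 0.5 * rng.normal()
    buf_thetas.append(probe)
    buf_returns.append(episode_return(probe))

    # Critic: least-squares fit of V(theta) ~ w0 + w1*theta + w2*theta^2
    # on a sliding window of recent Monte Carlo returns.
    t = np.array(buf_thetas[-100:])
    R = np.array(buf_returns[-100:])
    Phi = np.stack([np.ones_like(t), t, t * t], axis=1)
    w, *_ = np.linalg.lstsq(Phi, R, rcond=None)

    # Actor: gradient ascent on the critic, dV/dtheta = w1 + 2*w2*theta.
    theta = theta + 0.05 * (w[1] + 2.0 * w[2] * theta)
```

With a richer critic (e.g. a neural network taking flattened policy weights as input) the actor step would backpropagate through the critic instead of using a closed-form gradient, but the train-critic-then-ascend loop is the same.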
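The two model-selection metrics quoted in the Dataset Splits row (average evaluation return over the whole run, and over the final 20% of time steps) are straightforward to compute from a per-evaluation return curve. The array of returns below is fabricated purely for illustration:

```python
import numpy as np

# Hypothetical per-evaluation average returns: 100 evaluations during training,
# each already averaged over 10 test trajectories (values are made up).
eval_returns = np.linspace(-50.0, 120.0, num=100)

# Metric 1: average return over all policy evaluations during training.
avg_whole = eval_returns.mean()

# Metric 2: average return over evaluations in the last 20% of training.
tail = eval_returns[int(0.8 * len(eval_returns)):]
avg_last_20 = tail.mean()
```

The second metric rewards final performance, while the first also rewards fast early learning; a hyperparameter configuration can rank differently under the two.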