Parameter-Based Value Functions
Authors: Francesco Faccio, Louis Kirsch, Jürgen Schmidhuber
ICLR 2021
| Reproducibility Variable | Result | LLM Response |
|---|---|---|
| Research Type | Experimental | Finally our algorithms are evaluated on a selection of discrete and continuous control tasks using shallow policies and deep neural networks. Their performance is comparable to state-of-the-art methods. ... We make theoretical, algorithmic, and experimental contributions: Section 2 introduces the standard MDP setting; Section 3 formally presents PBVFs and derives algorithms for V(θ), V(s, θ) and Q(s, a, θ); Section 4 describes the experimental evaluation using shallow and deep policies. |
| Researcher Affiliation | Academia | Francesco Faccio, Louis Kirsch & Jürgen Schmidhuber, The Swiss AI Lab IDSIA, USI, SUPSI {francesco,louis,juergen}@idsia.ch |
| Pseudocode | Yes | Algorithm 1 Actor-critic with Monte Carlo prediction for V(θ); Algorithm 2 Actor-critic with TD prediction for V(s, θ); Algorithm 3 Stochastic actor-critic with TD prediction for Q(s, a, θ); Algorithm 4 Deterministic actor-critic with TD prediction for Q(s, a, θ) (a minimal sketch of Algorithm 1 follows the table) |
| Open Source Code | Yes | Code is available at: https://github.com/FF93/Parameter-based-Value-Functions |
| Open Datasets | Yes | For this purpose, we use an instance of the 1D Linear Quadratic Regulator (LQR) problem ... Swimmer-v3 and Hopper-v3 ... 100k time steps for all other environments. |
| Dataset Splits | Yes | For each hyperparameter configuration, for each environment and policy architecture, we run 5 instances of the learning algorithm using different seeds. We measure the learning progress by running 100 evaluations while learning the deterministic policy (without action or parameter noise) using 10 test trajectories. We use two metrics to determine the best hyperparameters: the average return over policy evaluations during the whole training process and the average return over policy evaluations during the last 20% time steps. (both metrics are sketched in code after the table) |
| Hardware Specification | Yes | We also thank NVIDIA Corporation for donating a DGX-1 as part of the Pioneers of AI Research Award and to IBM for donating a Minsky machine. |
| Software Dependencies | No | The paper mentions 'Pytorch initialization' but does not give version numbers for its software dependencies (e.g., which PyTorch release was used). |
| Experiment Setup | Yes | A.3 IMPLEMENTATION DETAILS. Shared hyperparameters: ... Batch size: 128 for DDPG, PSVF, PAVF; 16 for PSSVF. ... Tuned hyperparameters: ... Policy's learning rate: tuned with values in [1e-2, 1e-3, 1e-4]. (the sweep is enumerated in a sketch after the table) |
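
To make the quoted pseudocode concrete, below is a minimal PyTorch sketch of the core update of Algorithm 1, the actor-critic with Monte Carlo prediction for V(θ) (the PSSVF). It is reconstructed from the paper's description alone: the `PSSVF` class, the `update` function, and the network sizes are illustrative assumptions, not the authors' released code.

```python
import torch
import torch.nn as nn

class PSSVF(nn.Module):
    """Critic V(theta): maps a flattened policy-parameter vector to a
    scalar estimate of the episodic return achieved by that policy."""
    def __init__(self, n_params: int, hidden: int = 64):
        super().__init__()
        self.net = nn.Sequential(
            nn.Linear(n_params, hidden), nn.Tanh(),
            nn.Linear(hidden, 1),
        )

    def forward(self, theta: torch.Tensor) -> torch.Tensor:
        return self.net(theta).squeeze(-1)

def update(critic, critic_opt, theta, theta_opt, batch_thetas, batch_returns):
    # Critic regression: fit V(theta) to Monte Carlo returns sampled from a
    # buffer of (perturbed policy parameters, observed episodic return) pairs.
    critic_loss = ((critic(batch_thetas) - batch_returns) ** 2).mean()
    critic_opt.zero_grad()
    critic_loss.backward()
    critic_opt.step()

    # Actor step: gradient ascent on the critic's value at the current theta.
    theta_opt.zero_grad()
    (-critic(theta)).backward()
    theta_opt.step()
```

In use, `theta = torch.zeros(n_params, requires_grad=True)` would hold the flattened policy weights, with `torch.optim.Adam([theta], lr=...)` as `theta_opt`; the batch size of 16 quoted for PSSVF in A.3 would set the first dimension of `batch_thetas`.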
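
The two hyperparameter-selection metrics quoted in the Dataset Splits row reduce to a few lines. This sketch assumes `eval_returns` holds the 100 evaluation scores (each the mean over 10 test trajectories) in chronological order; the function name is hypothetical.

```python
import numpy as np

def selection_metrics(eval_returns):
    """Return (average over all evaluations, average over the last 20%)."""
    r = np.asarray(eval_returns, dtype=float)
    avg_all = r.mean()                       # average return over whole training
    avg_tail = r[int(0.8 * len(r)):].mean()  # evaluations in last 20% of time steps
    return avg_all, avg_tail
```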
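
Finally, a sketch of how the tuned sweep quoted from Appendix A.3 could be enumerated. Only the three policy learning rates and the 5 seeds per configuration come from the paper; the Cartesian-grid helper and variable names are assumptions, and the paper tunes further hyperparameters elided by the "..." in the quote above.

```python
from itertools import product

grid = {
    "policy_lr": [1e-2, 1e-3, 1e-4],  # "Policy's learning rate" values from A.3
    "seed": list(range(5)),           # 5 seeds per configuration
}
configs = [dict(zip(grid, values)) for values in product(*grid.values())]
# 3 learning rates x 5 seeds = 15 runs per environment and policy architecture
```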