Reducing Sampling Error in Batch Temporal Difference Learning
Authors: Brahma Pavse, Ishan Durugkar, Josiah Hanna, Peter Stone
ICML 2020
| Reproducibility Variable | Result | LLM Response |
|---|---|---|
| Research Type | Experimental | Finally, we conduct an empirical evaluation of PSEC-TD(0) on three batch value function learning tasks, with a hyperparameter sensitivity analysis, and show that PSEC-TD(0) produces value function estimates with lower mean squared error than TD(0). |
| Researcher Affiliation | Collaboration | 1The University of Texas at Austin 2School of Informatics, University of Edinburgh 3To be joining the Computer Sciences Department, University of Wisconsin Madison 4Sony AI. |
| Pseudocode | Yes | Algorithm 1: Batch Linear TD(0) to estimate $v_{\pi_e}$ (see the sketch after this table). |
| Open Source Code | No | The paper does not contain any explicit statements or links about providing open-source code for the described methodology. |
| Open Datasets | No | The paper mentions standard reinforcement learning domains like Gridworld, Cart Pole, and Inverted Pendulum. However, it does not provide concrete access information (e.g., specific links, DOIs, or citations with author/year) for publicly available datasets used in the experiments. For Cart Pole and Inverted Pendulum, it describes generating data via 'Monte Carlo rollouts'. |
| Dataset Splits | Yes | In all PSEC training settings, PSEC performs gradient steps using the full batch of data, uses a separate batch of data as the validation data, and terminates training according to early stopping. |
| Hardware Specification | No | The paper does not provide specific hardware details (e.g., GPU/CPU models, memory) used for running its experiments. It only mentions general terms like 'trained our models'. |
| Software Dependencies | No | The paper mentions 'OpenAI Gym', 'MuJoCo', 'Adam', and 'PPO' but does not provide specific version numbers for these software components. |
| Experiment Setup | Yes | In all experiments, the value function learning algorithm iterates over the whole batch of data until convergence, after which the MSVE of the final value function is computed. Some experiments include a parameter sweep over the hyperparameters, which can be found in Appendix G. ... The results shown here are with sweeps over only the value function model class and PSEC learning rate. |
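
Since the paper releases no code, the following is a minimal Python sketch of batch linear TD(0) with an optional PSEC-style correction, based only on the paper's high-level description: PSEC-TD(0) reweights each transition by the ratio of the evaluation policy's action probability to a maximum-likelihood estimate of the action distribution in the batch. All names here (`batch_linear_td0`, `phi`, `pi_hat`, `msve`) are hypothetical, and the fixed number of sweeps stands in for the paper's iterate-until-convergence criterion.

```python
import numpy as np

def batch_linear_td0(batch, phi, n_features, pi_e, pi_hat=None,
                     gamma=0.99, alpha=0.05, n_sweeps=200):
    """Batch linear TD(0) with an optional PSEC-style weight.

    batch:  list of (s, a, r, s_next, done) transitions.
    phi:    feature map, state -> np.ndarray of length n_features.
    pi_e:   pi_e(a, s), evaluation-policy probability of a in s.
    pi_hat: if given, pi_hat(a, s) is an MLE estimate of the
            empirical action distribution in the batch; each update
            is then weighted by pi_e(a, s) / pi_hat(a, s).
    """
    w = np.zeros(n_features)
    for _ in range(n_sweeps):                    # fixed sweeps stand in for
        for s, a, r, s_next, done in batch:      # "iterate until convergence"
            x, x_next = phi(s), phi(s_next)
            v_next = 0.0 if done else w @ x_next
            delta = r + gamma * v_next - w @ x   # TD(0) error
            rho = 1.0 if pi_hat is None else pi_e(a, s) / pi_hat(a, s)
            w += alpha * rho * delta * x         # (PSEC-)weighted update
    return w

def msve(w, phi, states, v_true):
    """Mean squared value error of the learned weights on `states`."""
    v_hat = np.array([w @ phi(s) for s in states])
    return float(np.mean((v_hat - v_true) ** 2))
```

For a tabular domain such as the paper's Gridworld, `phi` can return a one-hot vector per state, in which case `w` is the state-value table itself and `msve` matches the paper's final-value-function evaluation step.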