Separating value functions across time-scales
Authors: Joshua Romoff, Peter Henderson, Ahmed Touati, Emma Brunskill, Joelle Pineau, Yann Ollivier
ICML 2019 | Conference PDF | Archive PDF | Plain Text | LLM Run Details
| Reproducibility Variable | Result | LLM Response |
|---|---|---|
| Research Type | Experimental | We show theoretical and empirical improvements over standard TD learning in certain settings. [...] 6. Experiments: All hyperparameter settings, extended details, and the reproducibility checklist for machine learning research (Pineau, 2018) can be found in the Supplemental. |
| Researcher Affiliation | Collaboration | 1 MILA, McGill University; 2 Facebook AI Research; 3 Stanford University; 4 MILA, Université de Montréal. Correspondence to: Joshua Romoff <joshua.romoff@mail.mcgill.ca>, Peter Henderson <phend@stanford.edu>. |
| Pseudocode | Yes | Algorithm 1 Multi-step TD(Δ) [...] Algorithm 2 PPO-TD(λ, Δ) (a sketch of this decomposition appears after the table) |
| Open Source Code | Yes | Link to Code: github.com/facebookresearch/td-delta |
| Open Datasets | Yes | We use the same 5-state ring MDP as in Kearns & Singh (2000), a diagram of which is available in the Supplemental for clarity, to demonstrate performance gains under decreasing k-step regimes as described in Section 5.1. [...] We run experiments on the 9 games defined in (Bellemare et al., 2016) as Hard with dense rewards. |
| Dataset Splits | No | The paper describes experiments on Atari games and a 5-state ring MDP, which are environments where agents learn through interaction rather than from fixed datasets with explicit train/validation/test splits. It mentions 'average return across training' and 'hold-out no-op starts' for evaluation, but does not provide specific percentages or counts for dataset splits for reproduction. |
| Hardware Specification | No | The paper does not provide specific details about the hardware used for running its experiments, such as exact GPU/CPU models, processor types, or memory amounts. |
| Software Dependencies | No | The paper mentions using PPO (Proximal Policy Optimization) and references 'Pytorch implementations of reinforcement learning algorithms' by Kostrikov (2018), but it does not specify version numbers for any software dependencies required for reproduction. |
| Experiment Setup | No | The paper states, 'All hyperparameter settings, extended details, and the reproducibility checklist for machine learning research (Pineau, 2018) can be found in the Supplemental'. In the main text, it vaguely mentions using 'standard TD-style updates' and comparing against 'standard PPO baseline with hyperparameters as found in (Schulman et al., 2017; Kostrikov, 2018)', but does not provide specific hyperparameter values or training configurations. |
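
The Pseudocode row marks Algorithm 1 (Multi-step TD(Δ)) and Algorithm 2 (PPO-TD(λ, Δ)) as present but does not reproduce them. As orientation only, below is a minimal tabular sketch of a single-step TD(Δ)-style update built from the paper's decomposition of the value function into delta components W_z = V_{γ_z} − V_{γ_{z−1}} (with W_0 = V_{γ_0}). The function name, learning rate, discount schedule, and toy ring reward are illustrative assumptions, not the authors' released implementation; see github.com/facebookresearch/td-delta for the actual code.

```python
# Minimal tabular sketch of a single-step TD(Delta)-style update.
# Assumption: V_{gamma_Z} is decomposed into delta components
# W_z = V_{gamma_z} - V_{gamma_{z-1}} (W_0 = V_{gamma_0}), each trained
# with its own TD target. Names, hyperparameters, and the toy ring reward
# are illustrative, not taken from the authors' released code.
import numpy as np

def td_delta_update(W, s, r, s_next, gammas, lr=0.1):
    """Apply one TD(Delta) update on the transition (s, r, s_next).

    W      : array of shape (Z+1, num_states), one row per delta component.
    gammas : increasing discount factors gamma_0 < ... < gamma_Z.
    """
    Z = len(gammas) - 1
    w_next = W[:, s_next].copy()      # bootstrap values, frozen pre-update
    targets = np.empty(Z + 1)
    partial_v_next = 0.0              # running estimate of V_{gamma_{z-1}}(s')
    for z in range(Z + 1):
        if z == 0:
            # W_0 is an ordinary value function at the shortest horizon.
            targets[0] = r + gammas[0] * w_next[0]
        else:
            # Higher time-scales bootstrap off the shorter-horizon value.
            targets[z] = (gammas[z] - gammas[z - 1]) * partial_v_next \
                         + gammas[z] * w_next[z]
        partial_v_next += w_next[z]
    W[:, s] += lr * (targets - W[:, s])
    return W

# Toy usage on a deterministic 5-state ring (reward chosen arbitrarily).
num_states = 5
gammas = [0.0, 0.5, 0.9, 0.99]
W = np.zeros((len(gammas), num_states))
s = 0
for _ in range(20000):
    s_next = (s + 1) % num_states          # move one step around the ring
    r = 1.0 if s_next == 0 else 0.0        # reward for completing a loop
    W = td_delta_update(W, s, r, s_next, gammas)
    s = s_next
print("V_{gamma_Z}(s) =", W.sum(axis=0))   # sum of components = full value
```

Summing the components recovers the longest-horizon estimate V_{γ_Z}, which is how the sketch reports its result; the paper's Algorithm 1 extends this single-step rule to multi-step (k-step) targets per time-scale, and Algorithm 2 combines it with PPO and λ-returns.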