Separating value functions across time-scales

Authors: Joshua Romoff, Peter Henderson, Ahmed Touati, Emma Brunskill, Joelle Pineau, Yann Ollivier

ICML 2019 | Conference PDF | Archive PDF | Plain Text | LLM Run Details

Reproducibility Variable | Result | LLM Response
Research Type | Experimental | We show theoretical and empirical improvements over standard TD learning in certain settings. [...] 6. Experiments: All hyperparameter settings, extended details, and the reproducibility checklist for machine learning research (Pineau, 2018) can be found in the Supplemental.
Researcher Affiliation | Collaboration | 1 MILA, McGill University 2 Facebook AI Research 3 Stanford University 4 MILA, Université de Montréal. Correspondence to: Joshua Romoff <joshua.romoff@mail.mcgill.ca>, Peter Henderson <phend@stanford.edu>.
Pseudocode | Yes | Algorithm 1: Multi-step TD(Δ) [...] Algorithm 2: PPO-TD(λ, Δ) (a minimal sketch of the TD(Δ) update appears after this table).
Open Source Code | Yes | Link to Code: github.com/facebookresearch/td-delta
Open Datasets | Yes | We use the same 5-state ring MDP as in Kearns & Singh (2000), a diagram of which is available in the Supplemental for clarity, to demonstrate performance gains under decreasing k-step regimes as described in Section 5.1. [...] We run experiments on the 9 games defined in (Bellemare et al., 2016) as Hard with dense rewards. (A hypothetical ring-MDP sketch appears after this table.)
Dataset Splits | No | The paper describes experiments on Atari games and a 5-state ring MDP, environments where agents learn through interaction rather than from fixed datasets with explicit train/validation/test splits. It mentions 'average return across training' and 'hold-out no-op starts' for evaluation, but does not provide percentages or counts that would define dataset splits for reproduction.
Hardware Specification | No | The paper does not provide specific details about the hardware used for its experiments, such as exact GPU/CPU models, processor types, or memory amounts.
Software Dependencies | No | The paper mentions using PPO (Proximal Policy Optimization) and references the 'Pytorch implementations of reinforcement learning algorithms' by Kostrikov (2018), but it does not specify version numbers for any software dependencies required for reproduction.
Experiment Setup | No | The paper states, 'All hyperparameter settings, extended details, and the reproducibility checklist for machine learning research (Pineau, 2018) can be found in the Supplemental'. The main text mentions only 'standard TD-style updates' and a comparison against a 'standard PPO baseline with hyperparameters as found in (Schulman et al., 2017; Kostrikov, 2018)', without listing specific hyperparameter values or training configurations (an illustrative configuration sketch appears after this table).
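
For context on the Pseudocode row, the following is a minimal sketch of the single-step TD(Δ) idea that the paper's Algorithms 1 and 2 build on: the value function V_{γ_Z} is decomposed into delta estimators W_0 = V_{γ_0} and W_z = V_{γ_z} − V_{γ_{z−1}}, each updated by bootstrapping on the shorter-time-scale components. The tabular setting, discount schedule, step size, and random environment below are illustrative placeholders rather than the paper's settings; the paper's actual algorithms use multi-step targets and PPO.

import numpy as np

def td_delta_update(W, s, r, s_next, gammas, alpha=0.1):
    """One tabular single-step TD(Delta) update (a sketch).

    W      : array of shape (Z+1, n_states), the delta estimators, where
             W_0 = V_{gamma_0} and W_z = V_{gamma_z} - V_{gamma_{z-1}}.
    gammas : increasing discount factors gamma_0 < ... < gamma_Z.
    """
    # W_0 is an ordinary value function at the shortest time-scale.
    target0 = r + gammas[0] * W[0, s_next]
    W[0, s] += alpha * (target0 - W[0, s])

    # Each W_z bootstraps on V_{gamma_{z-1}} (the sum of the shorter
    # time-scale components) and on itself; no reward term appears,
    # since the rewards cancel in the difference of value functions.
    for z in range(1, len(gammas)):
        v_prev = W[:z, s_next].sum()  # V_{gamma_{z-1}}(s')
        target = (gammas[z] - gammas[z - 1]) * v_prev + gammas[z] * W[z, s_next]
        W[z, s] += alpha * (target - W[z, s])
    return W

# Illustrative usage on a hypothetical 5-state chain with random transitions.
rng = np.random.default_rng(0)
gammas = [0.5, 0.9, 0.99]          # placeholder discount schedule
W = np.zeros((len(gammas), 5))
s = 0
for _ in range(1000):
    s_next = rng.integers(5)
    r = rng.normal()               # placeholder reward
    W = td_delta_update(W, s, r, s_next, gammas)
    s = s_next
v_full = W.sum(axis=0)             # estimate of V_{gamma_Z}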
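The 5-state ring MDP named in the Open Datasets row is specified only in the paper's Supplemental (following Kearns & Singh, 2000). Since those details are not in the main text, the environment below is a hypothetical stand-in with a placeholder reward function; it shows only the general shape of such an environment, not the paper's actual construction.

import numpy as np

class RingMDP:
    """Hypothetical 5-state ring MDP in the spirit of Kearns & Singh (2000).

    The state moves clockwise around the ring; the per-state rewards here
    are placeholders, since the exact rewards used in the paper appear
    only in its Supplemental material.
    """

    def __init__(self, n_states=5, seed=0):
        self.n_states = n_states
        rng = np.random.default_rng(seed)
        self.rewards = rng.normal(size=n_states)  # placeholder rewards
        self.state = 0

    def reset(self):
        self.state = 0
        return self.state

    def step(self):
        # Deterministic clockwise transition around the ring.
        self.state = (self.state + 1) % self.n_states
        return self.state, self.rewards[self.state]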
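Because the Experiment Setup row notes that hyperparameters are only referenced indirectly, the configuration below illustrates the kind of values the cited PPO baseline uses, taken from the commonly reported Atari defaults in Schulman et al. (2017). These are not the paper's confirmed settings (those are in the Supplemental); every value should be treated as an assumption.

# Illustrative PPO-on-Atari configuration using the defaults reported in
# Schulman et al. (2017); the paper's actual settings are in its
# Supplemental and may differ.
ppo_atari_config = {
    "num_actors": 8,           # parallel environments
    "rollout_horizon": 128,    # steps per actor per update
    "num_epochs": 3,           # optimization epochs per batch
    "minibatch_size": 32 * 8,
    "learning_rate": 2.5e-4,   # linearly annealed in the original paper
    "clip_epsilon": 0.1,       # also annealed
    "discount_gamma": 0.99,
    "gae_lambda": 0.95,
    "value_loss_coef": 1.0,
    "entropy_coef": 0.01,
}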