Separating value functions across time-scales
Authors: Joshua Romoff, Peter Henderson, Ahmed Touati, Emma Brunskill, Joelle Pineau, Yann Ollivier
ICML 2019 | Conference PDF | Archive PDF | Plain Text | LLM Run Details
| Reproducibility Variable | Result | LLM Response |
|---|---|---|
| Research Type | Experimental | We show theoretical and empirical improvements over standard TD learning in certain settings. [...] 6. Experiments: All hyperparameter settings, extended details, and the reproducibility checklist for machine learning research (Pineau, 2018) can be found in the Supplemental. |
| Researcher Affiliation | Collaboration | 1 MILA, McGill University; 2 Facebook AI Research; 3 Stanford University; 4 MILA, Université de Montréal. Correspondence to: Joshua Romoff <joshua.romoff@mail.mcgill.ca>, Peter Henderson <phend@stanford.edu>. |
| Pseudocode | Yes | Algorithm 1 Multi-step TD(Δ) [...] Algorithm 2 PPO-TD(λ, Δ) (a sketch of this decomposition appears after the table) |
| Open Source Code | Yes | Link to Code: github.com/facebookresearch/td-delta |
| Open Datasets | Yes | We use the same 5-state ring MDP as in Kearns & Singh (2000), a diagram of which is available in the Supplemental for clarity, to demonstrate performance gains under decreasing k-step regimes as described in Section 5.1. [...] We run experiments on the 9 games defined in (Bellemare et al., 2016) as Hard with dense rewards. |
| Dataset Splits | No | The paper describes experiments on Atari games and a 5-state ring MDP, which are environments where agents learn through interaction rather than from fixed datasets with explicit train/validation/test splits. It mentions 'average return across training' and 'hold-out no-op starts' for evaluation, but does not provide specific percentages or counts for dataset splits for reproduction. |
| Hardware Specification | No | The paper does not provide specific details about the hardware used for running its experiments, such as exact GPU/CPU models, processor types, or memory amounts. |
| Software Dependencies | No | The paper mentions using PPO (Proximal Policy Optimization) and references 'Pytorch implementations of reinforcement learning algorithms' by Kostrikov (2018), but it does not specify version numbers for any software dependencies required for reproduction. |
| Experiment Setup | No | The paper states, 'All hyperparameter settings, extended details, and the reproducibility checklist for machine learning research (Pineau, 2018) can be found in the Supplemental'. In the main text, it vaguely mentions using 'standard TD-style updates' and comparing against 'standard PPO baseline with hyperparameters as found in (Schulman et al., 2017; Kostrikov, 2018)', but does not provide specific hyperparameter values or training configurations. |
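
The Pseudocode row marks Algorithm 1 (Multi-step TD(Δ)) and Algorithm 2 (PPO-TD(λ, Δ)) as present but does not reproduce them. As orientation only, below is a minimal tabular sketch of a single-step TD(Δ)-style update built from the paper's decomposition of the value function into delta components W_z = V_{γ_z} − V_{γ_{z−1}} (with W_0 = V_{γ_0}). The function name, learning rate, discount schedule, and toy ring reward are illustrative assumptions, not the authors' released implementation; see github.com/facebookresearch/td-delta for the actual code.

```python
# Minimal tabular sketch of a single-step TD(Delta)-style update.
# Assumption: V_{gamma_Z} is decomposed into delta components
# W_z = V_{gamma_z} - V_{gamma_{z-1}} (W_0 = V_{gamma_0}), each trained
# with its own TD target. Names, hyperparameters, and the toy ring reward
# are illustrative, not taken from the authors' released code.
import numpy as np

def td_delta_update(W, s, r, s_next, gammas, lr=0.1):
    """Apply one TD(Delta) update on the transition (s, r, s_next).

    W      : array of shape (Z+1, num_states), one row per delta component.
    gammas : increasing discount factors gamma_0 < ... < gamma_Z.
    """
    Z = len(gammas) - 1
    w_next = W[:, s_next].copy()      # bootstrap values, frozen pre-update
    targets = np.empty(Z + 1)
    partial_v_next = 0.0              # running estimate of V_{gamma_{z-1}}(s')
    for z in range(Z + 1):
        if z == 0:
            # W_0 is an ordinary value function at the shortest horizon.
            targets[0] = r + gammas[0] * w_next[0]
        else:
            # Higher time-scales bootstrap off the shorter-horizon value.
            targets[z] = (gammas[z] - gammas[z - 1]) * partial_v_next \
                         + gammas[z] * w_next[z]
        partial_v_next += w_next[z]
    W[:, s] += lr * (targets - W[:, s])
    return W

# Toy usage on a deterministic 5-state ring (reward chosen arbitrarily).
num_states = 5
gammas = [0.0, 0.5, 0.9, 0.99]
W = np.zeros((len(gammas), num_states))
s = 0
for _ in range(20000):
    s_next = (s + 1) % num_states          # move one step around the ring
    r = 1.0 if s_next == 0 else 0.0        # reward for completing a loop
    W = td_delta_update(W, s, r, s_next, gammas)
    s = s_next
print("V_{gamma_Z}(s) =", W.sum(axis=0))   # sum of components = full value
```

Summing the components recovers the longest-horizon estimate V_{γ_Z}, which is how the sketch reports its result; the paper's Algorithm 1 extends this single-step rule to multi-step (k-step) targets per time-scale, and Algorithm 2 combines it with PPO and λ-returns.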