Gamma-Nets: Generalizing Value Estimation over Timescale
Authors: Craig Sherstan, Shibhansh Dohare, James MacGlashan, Johannes Günther, Patrick M. Pilarski
AAAI 2020, pp. 5717–5725
| Reproducibility Variable | Result | LLM Response |
|---|---|---|
| Research Type | Experimental | We first provide two demonstrations by 1) predicting a square wave and 2) predicting sensorimotor signals on a robot arm using a linear function approximator. Next, we empirically evaluate Γ-nets in the deep reinforcement learning setting using policy evaluation on a set of Atari video games. |
| Researcher Affiliation | Collaboration | ¹Department of Computing Science, University of Alberta, Edmonton, Alberta, Canada; ²Cogitai, USA |
| Pseudocode | No | The paper does not include any figure, block, or section labeled 'Pseudocode' or 'Algorithm', nor are there structured steps formatted like code. |
| Open Source Code | No | The paper states 'Additional results and experimental details are available from Sherstan et al. (2019),' which cites an arXiv preprint (arXiv:1911.07794). This is not an explicit statement of code release, nor a direct link to a code repository for the methodology. |
| Open Datasets | Yes | We examined the performance of Γ-nets under policy evaluation in the Arcade Learning Environment (ALE) (Bellemare et al. 2015). |
| Dataset Splits | No | The paper describes how 'evaluation points' were created and used to compute returns, and how the models were trained, but it does not specify explicit training, validation, and test dataset splits with percentages or counts, or refer to predefined splits for reproducibility. |
| Hardware Specification | No | The paper describes the software components and training duration (e.g., 'Rainbow agent', 'trained for 25 million frames'), but it does not specify any hardware details such as GPU/CPU models or types of computational resources used for running the experiments. |
| Software Dependencies | No | The paper mentions the use of specific agents and frameworks (e.g., 'Dopamine project’s implementation of the Rainbow agent', 'DQN agent', 'prioritized replay', 'n-step returns', 'distributional representation of the value estimates'), but it does not provide specific version numbers for any key software components or libraries (e.g., Python, TensorFlow, PyTorch versions). |
| Experiment Setup | Yes | The Γ-net network consisted of five fully-connected layers of sizes [512, 256, 128, 16, 1], with all but the final layer using ReLU activation. ... Each network was trained for 20M frames... A Γt of size 8 was used, which always included lower and upper bounds of τ = [1, 100]. An additional 6 γk were drawn on each timestep. Unless otherwise stated the sampling was done by drawing 3 timescales uniformly each from the γ scale on [0, 0.99) and the τ scale on [1, 100) (for τ we drew from the integer scales). |
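
The timescale sampling in the Experiment Setup row is compact and easy to misread, so below is a minimal sketch of one plausible reading. The function names (`tau_to_gamma`, `sample_gammas`) and the NumPy implementation are illustrative assumptions, not the authors' code; only the constants (set size 8, fixed bounds τ = [1, 100], 3 draws per scale, γ ∈ [0, 0.99), integer τ ∈ [1, 100)) come from the paper's description, together with the standard relation τ = 1/(1 − γ), e.g. γ = 0.99 ↔ τ = 100.

```python
import numpy as np

# Hypothetical sketch of the per-step discount sampling described in the
# Experiment Setup row. Names are illustrative, not from the paper's code.

def tau_to_gamma(tau):
    """Convert a timescale tau to a discount via gamma = 1 - 1/tau."""
    return 1.0 - 1.0 / tau

def sample_gammas(rng):
    """Draw the set Γt of 8 discounts used on one training step:
    fixed bounds tau = [1, 100] plus 6 freshly sampled timescales."""
    # Fixed lower/upper bounds: tau=1 -> gamma=0.0, tau=100 -> gamma=0.99.
    bounds = [tau_to_gamma(1.0), tau_to_gamma(100.0)]
    # 3 discounts drawn uniformly on the gamma scale [0, 0.99).
    from_gamma = rng.uniform(0.0, 0.99, size=3)
    # 3 discounts drawn uniformly on the integer tau scale [1, 100).
    from_tau = tau_to_gamma(rng.integers(1, 100, size=3).astype(float))
    return np.concatenate([bounds, from_gamma, from_tau])

rng = np.random.default_rng(0)
print(sample_gammas(rng))  # 8 gammas, always including 0.0 and 0.99
```

The two scales are not interchangeable: drawing uniformly on γ concentrates samples at short timescales (half the γ interval maps to τ < 2), while drawing uniformly on τ covers long horizons evenly, which is presumably why the paper mixes both.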