Interval timing in deep reinforcement learning agents
Authors: Ben Deverett, Ryan Faulkner, Meire Fortunato, Gregory Wayne, Joel Z. Leibo
NeurIPS 2019
| Reproducibility Variable | Result | LLM Response |
|---|---|---|
| Research Type | Experimental | Here we studied interval timing abilities in deep reinforcement learning agents trained end-to-end on an interval reproduction paradigm inspired by experimental literature on mechanisms of timing. We characterize the strategies developed by recurrent and feedforward agents, which both succeed at temporal reproduction using distinct mechanisms, some of which bear specific and intriguing similarities to biological systems. These findings advance our understanding of how agents come to represent time, and they highlight the value of experimentally inspired approaches to characterizing agent abilities. |
| Researcher Affiliation | Industry | Ben Deverett (DeepMind, bendeverett@google.com); Ryan Faulkner (DeepMind, rfaulk@google.com); Meire Fortunato (DeepMind, meirefortunato@google.com); Greg Wayne (DeepMind, gregwayne@google.com); Joel Z. Leibo (DeepMind, jzl@google.com) |
| Pseudocode | No | The paper describes the agent architecture and experimental procedures but does not include any structured pseudocode or algorithm blocks. |
| Open Source Code | Yes | We have open-sourced the task (along with other related timing tasks) for use in future work. Available at https://github.com/deepmind/lab/tree/master/game_scripts/levels/contributed/psychlab (a hedged loading sketch appears below the table). |
| Open Datasets | No | The paper uses a simulated environment ('Psychlab') to generate data for training, but it does not provide access information for a pre-collected, publicly available dataset in the traditional sense. |
| Dataset Splits | No | The paper does not specify explicit train/validation/test splits of a pre-existing dataset. Training occurs in a simulated environment, and generalization is assessed on new sample intervals, not a fixed validation set split. |
| Hardware Specification | No | The paper mentions the use of deep neural networks and reinforcement learning agents, but it does not provide specific details about the hardware (e.g., GPU models, CPU types) used for running the experiments. |
| Software Dependencies | No | The paper lists various software components and architectures used (e.g., Psychlab, A3C, Scipy, Adam) but does not provide specific version numbers for these dependencies. |
| Experiment Setup | Yes | We used controllers with 128 hidden units for all experiments. The learner was given trajectories of 100 frames, with a batch size of 32, and used 200 actors. Other parameters were a discount factor of 0.99, a baseline cost of 0.5, and an entropy cost of 0.01. The model was optimized using Adam with β₁ = 0.9, β₂ = 0.999, ε = 10⁻⁴, and a learning rate of 10⁻⁵. (These settings are collected in the hedged config sketch below the table.) |
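
A minimal sketch collecting the training hyperparameters quoted in the Experiment Setup row. The agent and learner code are not released with the paper, so `TrainingConfig` below is a hypothetical container for the reported values, not an API from the paper or from DeepMind Lab.

```python
# Hypothetical container for the hyperparameters reported in the paper.
from dataclasses import dataclass


@dataclass(frozen=True)
class TrainingConfig:
    hidden_units: int = 128      # controller width (recurrent or feedforward)
    unroll_length: int = 100     # frames per trajectory given to the learner
    batch_size: int = 32
    num_actors: int = 200
    discount: float = 0.99       # reward discount factor
    baseline_cost: float = 0.5   # weight on the value-function (baseline) loss
    entropy_cost: float = 0.01   # weight on the entropy regularizer
    adam_beta1: float = 0.9
    adam_beta2: float = 0.999
    adam_epsilon: float = 1e-4
    learning_rate: float = 1e-5


if __name__ == "__main__":
    print(TrainingConfig())
```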
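
For the open-sourced task noted in the Open Source Code row, Psychlab levels are loaded through the standard DeepMind Lab Python API. The sketch below assumes a source build of `deepmind_lab`; the exact level name of the interval reproduction task is not stated in the row above, so the `LEVEL` string is a placeholder to be filled in from the linked `game_scripts/levels/contributed/psychlab` directory.

```python
import numpy as np
import deepmind_lab

# Placeholder level name: replace with the interval reproduction level found
# under game_scripts/levels/contributed/psychlab in the linked repository.
LEVEL = 'contributed/psychlab/...'

env = deepmind_lab.Lab(
    LEVEL,
    ['RGB_INTERLEAVED'],
    config={'width': '96', 'height': '72'},
)
env.reset()

# Take a single no-op step and read back the first-person observation.
noop = np.zeros((7,), dtype=np.intc)  # DeepMind Lab expects a 7-dimensional action
reward = env.step(noop, num_steps=1)
obs = env.observations()['RGB_INTERLEAVED']
print(reward, obs.shape)
```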