Interval timing in deep reinforcement learning agents
Authors: Ben Deverett, Ryan Faulkner, Meire Fortunato, Gregory Wayne, Joel Z. Leibo
NeurIPS 2019
| Reproducibility Variable | Result | LLM Response |
|---|---|---|
| Research Type | Experimental | Here we studied interval timing abilities in deep reinforcement learning agents trained end-to-end on an interval reproduction paradigm inspired by experimental literature on mechanisms of timing. We characterize the strategies developed by recurrent and feedforward agents, which both succeed at temporal reproduction using distinct mechanisms, some of which bear specific and intriguing similarities to biological systems. These findings advance our understanding of how agents come to represent time, and they highlight the value of experimentally inspired approaches to characterizing agent abilities. |
| Researcher Affiliation | Industry | Ben Deverett (DeepMind, bendeverett@google.com); Ryan Faulkner (DeepMind, rfaulk@google.com); Meire Fortunato (DeepMind, meirefortunato@google.com); Greg Wayne (DeepMind, gregwayne@google.com); Joel Z. Leibo (DeepMind, jzl@google.com) |
| Pseudocode | No | The paper describes the agent architecture and experimental procedures but does not include any structured pseudocode or algorithm blocks. |
| Open Source Code | Yes | We have open-sourced the task (along with other related timing tasks) for use in future work. Available at https://github.com/deepmind/lab/tree/master/game_scripts/levels/contributed/psychlab (a hedged loading sketch appears below the table). |
| Open Datasets | No | The paper uses a simulated environment ('Psychlab') to generate data for training, but it does not provide access information for a pre-collected, publicly available dataset in the traditional sense. |
| Dataset Splits | No | The paper does not specify explicit train/validation/test splits of a pre-existing dataset. Training occurs in a simulated environment, and generalization is assessed on new sample intervals, not a fixed validation set split. |
| Hardware Specification | No | The paper mentions the use of deep neural networks and reinforcement learning agents, but it does not provide specific details about the hardware (e.g., GPU models, CPU types) used for running the experiments. |
| Software Dependencies | No | The paper lists various software components and architectures used (e.g., Psychlab, A3C, Scipy, Adam) but does not provide specific version numbers for these dependencies. |
| Experiment Setup | Yes | We used controllers with 128 hidden units for all experiments. The learner was given trajectories of 100 frames, with a batch size of 32, and used 200 actors. Other parameters were a discount factor of 0.99, a baseline cost of 0.5, and an entropy cost of 0.01. The model was optimized using Adam with β₁ = 0.9, β₂ = 0.999, ε = 10⁻⁴, and a learning rate of 10⁻⁵. (These settings are collected in the hedged config sketch below the table.) |
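
A minimal sketch collecting the training hyperparameters quoted in the Experiment Setup row. The agent and learner code are not released with the paper, so `TrainingConfig` below is a hypothetical container for the reported values, not an API from the paper or from DeepMind Lab.

```python
# Hypothetical container for the hyperparameters reported in the paper.
from dataclasses import dataclass


@dataclass(frozen=True)
class TrainingConfig:
    hidden_units: int = 128      # controller width (recurrent or feedforward)
    unroll_length: int = 100     # frames per trajectory given to the learner
    batch_size: int = 32
    num_actors: int = 200
    discount: float = 0.99       # reward discount factor
    baseline_cost: float = 0.5   # weight on the value-function (baseline) loss
    entropy_cost: float = 0.01   # weight on the entropy regularizer
    adam_beta1: float = 0.9
    adam_beta2: float = 0.999
    adam_epsilon: float = 1e-4
    learning_rate: float = 1e-5


if __name__ == "__main__":
    print(TrainingConfig())
```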
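
For the open-sourced task noted in the Open Source Code row, Psychlab levels are loaded through the standard DeepMind Lab Python API. The sketch below assumes a source build of `deepmind_lab`; the exact level name of the interval reproduction task is not stated in the row above, so the `LEVEL` string is a placeholder to be filled in from the linked `game_scripts/levels/contributed/psychlab` directory.

```python
import numpy as np
import deepmind_lab

# Placeholder level name: replace with the interval reproduction level found
# under game_scripts/levels/contributed/psychlab in the linked repository.
LEVEL = 'contributed/psychlab/...'

env = deepmind_lab.Lab(
    LEVEL,
    ['RGB_INTERLEAVED'],
    config={'width': '96', 'height': '72'},
)
env.reset()

# Take a single no-op step and read back the first-person observation.
noop = np.zeros((7,), dtype=np.intc)  # DeepMind Lab expects a 7-dimensional action
reward = env.step(noop, num_steps=1)
obs = env.observations()['RGB_INTERLEAVED']
print(reward, obs.shape)
```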