Prediction and generalisation over directed actions by grid cells

Authors: Changmin Yu, Timothy Behrens, Neil Burgess

ICLR 2021

Reproducibility Variable | Result | LLM Response
Research Type | Experimental | Preliminary results in Fig. 4(B) show that gc-DQN and deep gc-Dyna-Q greatly accelerate learning compared to the baseline agents, with a relatively minor increase in model complexity and computational cost. Evaluation of gc-DQN and the baseline DQN agents is performed in the Cart Pole task (Barto et al. [3]), with evaluations computed over 5 random seeds.
Researcher Affiliation | Academia | Changmin Yu (1, 2), Timothy E.J. Behrens (3, 4), Neil Burgess (1, 4). 1: Institute of Cognitive Neuroscience, UCL, London, UK; 2: Centre for Artificial Intelligence, UCL, London, UK; 3: Wellcome Centre for Integrative Neuroimaging, University of Oxford, Oxford, UK; 4: Sainsbury Wellcome Centre, UCL, London, UK.
Pseudocode | No | The paper describes methods through text and mathematical equations but does not include any structured pseudocode or algorithm blocks.
Open Source Code | Yes | The Python-based implementations can be found at https://github.com/ucabcy/Prediction_and_Generalisation_over_Directed_Actions_by_Grid_Cells.
Open Datasets | Yes | The environment is the Cart Pole task (Barto et al. [3]), simulated using the OpenAI Gym environment (Brockman et al. [5]); see the environment sketch below the table.
Dataset Splits | No | The paper describes episodes for training and evaluation in an RL environment, but does not provide explicit train/validation/test dataset splits.
Hardware Specification | No | The paper mentions 'limited time and computational resources' but does not specify any particular hardware (e.g., GPU or CPU models, memory) used for the experiments.
Software Dependencies | No | The paper states that 'All implementations are performed in the TensorFlow framework (Abadi et al. [1])', but does not list specific versions of TensorFlow or other dependencies.
Experiment Setup | Yes | All models are learnt using the mean squared error loss function and the Adam optimiser (Kingma and Ba [24]) with learning rate 0.001 and no learning rate decay. The exploration strength, ϵ, is set to 0.8 at the start of each independent run, decreases by 0.05 at each episode, and is bounded below by 0.01. A total of 5 independent runs of 100 episodes are performed for each agent. (A hedged training-setup sketch is given below the table.)
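
For reference, a minimal sketch of instantiating the Cart Pole environment through OpenAI Gym, as cited in the Open Datasets row. The environment ID "CartPole-v0" and the classic (pre-gymnasium) Gym API are assumptions; the paper only names the Cart Pole task and the OpenAI Gym simulator.

```python
import gym

# "CartPole-v0" is an assumed environment ID; the paper only names the Cart Pole task.
env = gym.make("CartPole-v0")

obs = env.reset()                            # 4-dimensional observation (classic Gym API)
action = env.action_space.sample()           # two discrete actions: push cart left or right
obs, reward, done, info = env.step(action)   # classic 4-tuple step signature
env.close()
```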
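
Below is a hedged sketch of the training setup quoted in the Experiment Setup row: Adam with learning rate 0.001 and no decay, mean squared error loss, the ϵ schedule (start at 0.8, decrease by 0.05 per episode, floor at 0.01), and 5 independent runs of 100 episodes. The Q-network architecture, the omitted replay/update logic, and the TensorFlow/Keras calls are illustrative assumptions, not the authors' implementation.

```python
import gym
import numpy as np
import tensorflow as tf

LEARNING_RATE = 0.001                          # Adam, no learning-rate decay
EPS_START, EPS_DECAY, EPS_MIN = 0.8, 0.05, 0.01
N_RUNS, N_EPISODES = 5, 100                    # 5 independent runs of 100 episodes

def build_q_network(state_dim, n_actions):
    # Layer sizes are assumptions; the paper specifies only the loss and optimiser.
    model = tf.keras.Sequential([
        tf.keras.layers.Dense(64, activation="relu", input_shape=(state_dim,)),
        tf.keras.layers.Dense(n_actions),
    ])
    model.compile(
        optimizer=tf.keras.optimizers.Adam(learning_rate=LEARNING_RATE),
        loss=tf.keras.losses.MeanSquaredError(),
    )
    return model

for run in range(N_RUNS):
    env = gym.make("CartPole-v0")              # assumed environment ID, classic Gym API
    q_net = build_q_network(env.observation_space.shape[0], env.action_space.n)
    epsilon = EPS_START                        # reset exploration at the start of each run
    for episode in range(N_EPISODES):
        state, done = env.reset(), False
        while not done:
            if np.random.rand() < epsilon:
                action = env.action_space.sample()                   # explore
            else:
                q_values = q_net(state[None, :].astype(np.float32))  # exploit
                action = int(np.argmax(q_values.numpy()))
            state, reward, done, _ = env.step(action)
            # ... store the transition and fit q_net on sampled minibatches (omitted) ...
        epsilon = max(EPS_MIN, epsilon - EPS_DECAY)  # decrease by 0.05 per episode, floor 0.01
    env.close()
```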