Prediction and generalisation over directed actions by grid cells
Authors: Changmin Yu, Timothy Behrens, Neil Burgess
Venue: ICLR 2021
| Reproducibility Variable | Result | LLM Response |
|---|---|---|
| Research Type | Experimental | Preliminary results in Fig. 4(B) show that the gc-DQN and deep gc-Dyna-Q agents greatly accelerate learning compared to the baseline agents, with a relatively minor increase in model complexity and computational cost. gc-DQN and the baseline DQN agents are evaluated on the Cart Pole task (Barto et al. [3]), with evaluations computed over 5 random seeds. |
| Researcher Affiliation | Academia | Changmin Yu (1,2), Timothy E.J. Behrens (3,4), Neil Burgess (1,4): (1) Institute of Cognitive Neuroscience, UCL, London, UK; (2) Centre for Artificial Intelligence, UCL, London, UK; (3) Wellcome Centre for Integrative Neuroimaging, University of Oxford, Oxford, UK; (4) Sainsbury Wellcome Centre, UCL, London, UK |
| Pseudocode | No | The paper describes methods through text and mathematical equations but does not include any structured pseudocode or algorithm blocks. |
| Open Source Code | Yes | The Python-based implementations can be found at https://github.com/ucabcy/Prediction_and_Generalisation_over_Directed_Actions_by_Grid_Cells. |
| Open Datasets | Yes | The environment is the Cart Pole task (Barto et al. [3]), simulated using the OpenAI Gym environment (Brockman et al. [5]); see the environment sketch below the table. |
| Dataset Splits | No | The paper describes episodes for training and evaluation in an RL environment, but does not provide explicit dataset splits for train/validation/test. |
| Hardware Specification | No | The paper mentions 'limited time and computational resources' but does not specify any particular hardware (e.g., GPU, CPU models, memory) used for the experiments. |
| Software Dependencies | No | The paper states that all implementations are performed in the TensorFlow framework (Abadi et al. [1]), but it does not provide a list of software dependencies with version numbers. |
| Experiment Setup | Yes | All models are learnt using the mean squared error loss function and the Adam optimiser (Kingma and Ba [24]) with learning rate 0.001 and no learning-rate decay. The exploration strength, ϵ, is set to 0.8 at the start of each independent run, decreases by 0.05 at each episode, and is bounded below by 0.01. A total of 5 independent runs of 100 episodes are performed for each agent; see the configuration sketch below the table. |
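
The Cart Pole environment referenced in the Open Datasets row can be reproduced with OpenAI Gym. Below is a minimal sketch of the environment loop; the environment ID (`CartPole-v1`) and the pre-0.26 Gym `reset`/`step` API are assumptions, since this section does not state which version the authors used.

```python
import gym

# Minimal sketch of the evaluation environment described above: the Cart Pole
# task (Barto et al. [3]) simulated with OpenAI Gym (Brockman et al. [5]).
# The environment ID and Gym version are assumptions; 'CartPole-v1' and the
# pre-0.26 Gym reset/step API are used here for illustration.
env = gym.make("CartPole-v1")

state = env.reset()
done = False
while not done:
    action = env.action_space.sample()  # placeholder for the agent's policy
    state, reward, done, info = env.step(action)
env.close()
```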
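
The Experiment Setup row specifies the loss, optimiser, and exploration schedule. The following sketch illustrates that configuration under the assumption of a TensorFlow/Keras-style training loop (the paper reports using TensorFlow); the agent and network internals (e.g. gc-DQN) are placeholders, not the authors' implementation.

```python
import tensorflow as tf

# Hedged sketch of the reported training configuration: mean squared error
# loss, Adam optimiser with learning rate 0.001 and no decay, and a linear
# epsilon schedule (0.8 at the start of each run, -0.05 per episode,
# bounded below by 0.01), over 5 independent runs of 100 episodes.
# The agent and network internals are placeholders and are not taken
# from this section.

NUM_RUNS = 5        # independent runs (random seeds)
NUM_EPISODES = 100  # episodes per run

optimizer = tf.keras.optimizers.Adam(learning_rate=0.001)
loss_fn = tf.keras.losses.MeanSquaredError()


def epsilon_at(episode, start=0.8, step=0.05, floor=0.01):
    """Linear exploration decay: start at 0.8, subtract 0.05 per episode,
    never drop below 0.01."""
    return max(start - step * episode, floor)


for run in range(NUM_RUNS):
    for episode in range(NUM_EPISODES):
        epsilon = epsilon_at(episode)
        # ... run one episode with epsilon-greedy exploration, then update
        # the value network by minimising loss_fn on TD targets with
        # `optimizer` (details depend on the specific agent) ...
```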