Prediction and generalisation over directed actions by grid cells
Authors: Changmin Yu, Timothy Behrens, Neil Burgess
Venue: ICLR 2021
| Reproducibility Variable | Result | LLM Response |
|---|---|---|
| Research Type | Experimental | Preliminary results in Fig. 4(B) show that the gc-DQN and deep gc-Dyna-Q agents greatly accelerate learning compared to the baseline agents, with a relatively minor increase in model complexity and computational cost. gc-DQN and the baseline DQN agents are evaluated on the Cart Pole task (Barto et al. [3]), with evaluations computed over 5 random seeds. |
| Researcher Affiliation | Academia | Changmin Yu (1,2), Timothy E.J. Behrens (3,4), Neil Burgess (1,4): (1) Institute of Cognitive Neuroscience, UCL, London, UK; (2) Centre for Artificial Intelligence, UCL, London, UK; (3) Wellcome Centre for Integrative Neuroimaging, University of Oxford, Oxford, UK; (4) Sainsbury Wellcome Centre, UCL, London, UK |
| Pseudocode | No | The paper describes methods through text and mathematical equations but does not include any structured pseudocode or algorithm blocks. |
| Open Source Code | Yes | The Python-based implementations can be found at https://github.com/ucabcy/Prediction_and_Generalisation_over_Directed_Actions_by_Grid_Cells. |
| Open Datasets | Yes | The environment is the Cart Pole task (Barto et al. [3]), simulated using the OpenAI Gym environment (Brockman et al. [5]); see the environment sketch below the table. |
| Dataset Splits | No | The paper describes episodes for training and evaluation in an RL environment, but does not provide explicit dataset splits for train/validation/test. |
| Hardware Specification | No | The paper mentions 'limited time and computational resources' but does not specify any particular hardware (e.g., GPU, CPU models, memory) used for the experiments. |
| Software Dependencies | No | The paper states that all implementations are performed in the TensorFlow framework (Abadi et al. [1]), but it does not provide a list of software dependencies with version numbers. |
| Experiment Setup | Yes | All models are learnt using the mean squared error loss function and the Adam optimiser (Kingma and Ba [24]) with learning rate 0.001 and no learning-rate decay. The exploration strength, ϵ, is set to 0.8 at the start of each independent run, decreases by 0.05 at each episode, and is bounded below by 0.01. A total of 5 independent runs of 100 episodes are performed for each agent; see the configuration sketch below the table. |
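
The Cart Pole environment referenced in the Open Datasets row can be reproduced with OpenAI Gym. Below is a minimal sketch of the environment loop; the environment ID (`CartPole-v1`) and the pre-0.26 Gym `reset`/`step` API are assumptions, since this section does not state which version the authors used.

```python
import gym

# Minimal sketch of the evaluation environment described above: the Cart Pole
# task (Barto et al. [3]) simulated with OpenAI Gym (Brockman et al. [5]).
# The environment ID and Gym version are assumptions; 'CartPole-v1' and the
# pre-0.26 Gym reset/step API are used here for illustration.
env = gym.make("CartPole-v1")

state = env.reset()
done = False
while not done:
    action = env.action_space.sample()  # placeholder for the agent's policy
    state, reward, done, info = env.step(action)
env.close()
```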
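
The Experiment Setup row specifies the loss, optimiser, and exploration schedule. The following sketch illustrates that configuration under the assumption of a TensorFlow/Keras-style training loop (the paper reports using TensorFlow); the agent and network internals (e.g. gc-DQN) are placeholders, not the authors' implementation.

```python
import tensorflow as tf

# Hedged sketch of the reported training configuration: mean squared error
# loss, Adam optimiser with learning rate 0.001 and no decay, and a linear
# epsilon schedule (0.8 at the start of each run, -0.05 per episode,
# bounded below by 0.01), over 5 independent runs of 100 episodes.
# The agent and network internals are placeholders and are not taken
# from this section.

NUM_RUNS = 5        # independent runs (random seeds)
NUM_EPISODES = 100  # episodes per run

optimizer = tf.keras.optimizers.Adam(learning_rate=0.001)
loss_fn = tf.keras.losses.MeanSquaredError()


def epsilon_at(episode, start=0.8, step=0.05, floor=0.01):
    """Linear exploration decay: start at 0.8, subtract 0.05 per episode,
    never drop below 0.01."""
    return max(start - step * episode, floor)


for run in range(NUM_RUNS):
    for episode in range(NUM_EPISODES):
        epsilon = epsilon_at(episode)
        # ... run one episode with epsilon-greedy exploration, then update
        # the value network by minimising loss_fn on TD targets with
        # `optimizer` (details depend on the specific agent) ...
```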