Actionable Neural Representations: Grid Cells from Minimal Constraints

Authors: Will Dorrell, Peter E. Latham, Timothy E. J. Behrens, James C. R. Whittington

ICLR 2023 | Conference PDF | Archive PDF | Plain Text | LLM Run Details

Reproducibility Variable | Result | LLM Response
Research Type | Experimental | "We support this claim with intuition, analytic justification, and simulations." "Our work could be seen as extracting and simplifying the key ideas from these papers that make hexagonal grids optimal (see Appendix H), and extending them to multiple modules, something both papers had to hard code." "Biological: Lastly, both theoretically (Sorscher et al., 2019) and computationally (Dordek et al., 2016; Whittington et al., 2021), non-negativity has played a key role in normative derivations of hexagonal grid cells, as it will here." "(B) Optimising among actionable codes to achieve functional and biological constraints produces multiple modules of hexagonal grid cells. Simulation results confirm this heuristic in (C) 1D and (D) 2D." "We validate our arguments by numerically optimising the coefficients a_0, {a_d, b_d}_{d=1}^D and frequencies {n_d}_{d=1}^D to minimise L_0 subject to constraints, producing a module of lattices (Figure 3C; details in Appendix B)."
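As a rough illustration of the optimisation the last quote describes, the sketch below fits a 1D Fourier-parameterised representation with Adam under soft non-negativity and norm penalties. The functional term, fixed penalty weights, and sampling distribution are placeholders, not the paper's actual L_0 or its GECO-weighted constraints.

```python
# Illustrative sketch only: optimise Fourier coefficients (a0, a_d, b_d) and
# frequencies omega_d of a 1D tuning curve with Adam. The functional term and
# the fixed penalty weights are placeholders for the paper's L0 and its
# GECO-weighted constraints.
import torch

D, M = 8, 150                                  # frequencies, sampled points
a0 = torch.randn(1, requires_grad=True)
ab = torch.randn(2, D, requires_grad=True)     # rows: cosine (a_d), sine (b_d)
om = (10 * torch.rand(D)).requires_grad_()     # frequencies omega_d

opt = torch.optim.Adam([a0, ab, om], lr=0.1)
for step in range(1000):
    x = 2 * torch.rand(M) - 1                  # positions sampled from p(x)
    phases = om[None, :] * x[:, None]          # shape (M, D)
    g = a0 + torch.cos(phases) @ ab[0] + torch.sin(phases) @ ab[1]
    loss_fun = -(g ** 2).mean()                # placeholder functional term
    loss_pos = torch.relu(-g).mean()           # non-negativity violation
    loss_nrm = ((g ** 2).mean() - 1.0) ** 2    # soft norm constraint
    loss = loss_fun + 0.1 * loss_pos + 0.005 * loss_nrm
    opt.zero_grad()
    loss.backward()
    opt.step()
```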
Researcher Affiliation | Academia | "William Dorrell, Peter Latham (Gatsby Unit, UCL) dorrellwec@gmail.com; Timothy E.J. Behrens (UCL & Oxford); James C.R. Whittington (Oxford & Stanford) jcrwhittington@gmail.com"
Pseudocode | No | "We minimise the full loss over the parameters (a_0, {a_d, b_d, ω_d}_{d=1}^D) using a gradient-based algorithm, ADAM (Kingma & Ba, 2014). We initialise these parameters by sampling from independent zero-mean Gaussians, with variances as in Table B.3."
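A minimal sketch of the initialisation the quote describes; the variances are placeholders standing in for the paper's Table B.3, which is not reproduced in the quote.

```python
# Minimal sketch: parameters drawn from independent zero-mean Gaussians.
# The variances below are placeholders; the paper's actual values are in
# its Table B.3.
import numpy as np

rng = np.random.default_rng(0)
D = 8                                          # number of frequencies (assumed)
var_a0, var_ab, var_om = 1.0, 1.0, 4.0         # placeholder variances
a0 = rng.normal(0.0, np.sqrt(var_a0))
a = rng.normal(0.0, np.sqrt(var_ab), size=D)   # cosine coefficients a_d
b = rng.normal(0.0, np.sqrt(var_ab), size=D)   # sine coefficients b_d
om = rng.normal(0.0, np.sqrt(var_om), size=D)  # frequencies omega_d
```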
Open Source Code | Yes | "All code is available on github: https://github.com/WilburDoz/ICLR_Actionable_Reps."
Open Datasets | No | "To evaluate the functional component of our loss we sample a set of points {x_i}_{i=1}^M from p(x) and calculate their representations {g(x_i)}_{i=1}^M."
Dataset Splits | No | "As such, we sample a second set of M_S shift positions, {x_m}_{m=1}^{M_S}, from a scaled up version of p(x), using a scale factor S. We then create a much larger set of positions by shifting the original set by each of the shift positions, creating a dataset of size M × M_S, and use these to calculate our two constraint losses."
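Taken together, the two quotes above describe a two-stage sampling scheme. A hedged sketch follows, with p(x) assumed uniform for the base points, shifts drawn as a normal scaled by S, and g an arbitrary placeholder representation; M, M_S and S take the values from the Experiment Setup table below.

```python
# Sketch of the described sampling: M base points from p(x), M_S shift
# positions drawn with scale factor S, and the M * M_S shifted positions used
# for the constraint losses. p(x) and g are placeholder choices.
import numpy as np

rng = np.random.default_rng(0)
M, M_S, S = 150, 15, 3.0                        # values from the table below

x = rng.uniform(-1.0, 1.0, size=(M, 2))         # base points from p(x)
shifts = S * rng.normal(size=(M_S, 2))          # shifts from scaled-up p(x)

# All (point, shift) pairs via broadcasting: shape (M * M_S, 2).
x_shifted = (x[:, None, :] + shifts[None, :, :]).reshape(-1, 2)

def g(pts):                                     # placeholder representation
    return np.cos(pts @ np.array([1.0, 0.5]))

reps = g(x)                                     # {g(x_i)} for the functional loss
reps_shifted = g(x_shifted)                     # used for the constraint losses
```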
Hardware Specification | No | No specific hardware details are provided in the paper.
Software Dependencies | No | "We minimise the full loss over the parameters (a_0, {a_d, b_d, ω_d}_{d=1}^D) using a gradient-based algorithm, ADAM (Kingma & Ba, 2014). We initialise these parameters by sampling from independent zero-mean Gaussians, with variances as in Table B.3." "We thank Pierre Glaser for teaching the ways of numpy array broadcasting." "The real Wigner D-matrices were calculated recursively as detailed in Ivanic & Ruedenberg (1996; 1998), by translating MATLAB code from Politis et al. (2016) into Python."
Experiment Setup | Yes | "We list the parameter values used to generate the grid cells in Figure 2B, and show the full population of neurons in Figure 6."

Parameter | Meaning | Value
σ | neural lengthscale | 0.2
l | χ lengthscale | 0.5
T | number of gradient steps | 150000
N | number of neurons | 64
M | number of sampled points every n_resample steps | 150
M_S | number of room shifts sampled every n_resample steps | 15
S | standard deviation of normal for shift sampling | 3
n_resample | number of steps per resample of points | 5
λ_p0 | initial positivity weighting coefficient | 0.1
k_p | log positivity target | -9
α_p | positivity target smoothing | 0.9
γ_p | positivity coefficient dynamics coefficient | 0.0001
λ_n0 | same set of GECO parameters, for the norm constraint | 0.005
k_n | ditto | 4
α_n | ditto | 0.9
γ_n | ditto | 0.0001
ϵ_w | coefficient gradient step size | 0.1
ϵ_ω | frequency gradient step size | 0.1
β_1 | exponential moving average of first gradient moment | 0.9
β_2 | exponential moving average of second moment | 0.9
η | ADAM non-exploding term | 1e-8
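The (λ_0, k, α, γ) quadruples above parameterise GECO-style constraint weighting. The paper's exact update rule is not quoted, so the sketch below is an assumption following the standard GECO recipe (Rezende & Viola, 2018): a smoothed constraint violation drives a multiplicative update of the weight.

```python
# Assumed GECO-style update matching the parameter roles in the table:
# lam0 = initial weight, k = constraint target, alpha = smoothing of the
# moving-average violation, gamma = rate of the multiplicative weight update.
import numpy as np

class GecoWeight:
    def __init__(self, lam0, k, alpha, gamma):
        self.lam, self.k, self.alpha, self.gamma = lam0, k, alpha, gamma
        self.c_ma = None                       # moving-average violation

    def update(self, constraint_value):
        c = constraint_value - self.k          # signed violation of the target
        self.c_ma = c if self.c_ma is None else \
            self.alpha * self.c_ma + (1 - self.alpha) * c
        self.lam *= float(np.exp(self.gamma * self.c_ma))  # grow while violated
        return self.lam

positivity = GecoWeight(lam0=0.1, k=-9.0, alpha=0.9, gamma=1e-4)  # table values
norm = GecoWeight(lam0=0.005, k=4.0, alpha=0.9, gamma=1e-4)       # table values
```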