Dense Associative Memory Through the Lens of Random Features

Authors: Benjamin Hoover, Duen Horng Chau, Hendrik Strobelt, Parikshit Ram, Dmitry Krotov

NeurIPS 2024

Reproducibility Variable | Result | LLM Response
Research Type | Experimental | Section 4 (Empirical evaluation); Figure 3: Dr DAM produces better approximations to the energies and gradients of Mr DAM when the queries are closer to the stored patterns.; Figure 4: A) Retrieval errors predictably follow the approximation quality of fig. 3.
Researcher Affiliation | Collaboration | Benjamin Hoover (IBM Research & Georgia Tech, benjamin.hoover@ibm.com); Duen Horng Chau (Georgia Tech, polo@gatech.edu); Hendrik Strobelt (IBM Research & MIT-IBM, hendrik.strobelt@ibm.com); Parikshit Ram (IBM Research, parikshit.ram@ibm.com); Dmitry Krotov (IBM Research, krotov@ibm.com)
Pseudocode | Yes | Algorithm 1: Procedures for Dr DAM with random features.
Open Source Code | Yes | Experimental code with instructions to replicate the results in this paper are made available at this GitHub repository (https://github.com/bhoov/distributed_DAM), complete with instructions to setup the coding environment and run all experiments.
Open Datasets | Yes | Comparing energy descent dynamics between Dr DAM and Mr DAM on 3x64x64 images from Tiny Imagenet [11].; We stored K = 10 random images from CIFAR10 [43] into the memory matrix of Mr DAM
Dataset Splits | No | We generated 2K = 1000 unique, binary patterns (where each value is normalized to be {0, 1^D}) and stored K = 500 of them into the memory matrix Ξ of Mr DAM. ... The remaining patterns are treated as the random queries x_far... Finally, in addition to evaluating the energy at these random queries and at the stored patterns, we also want to evaluate the energy at queries x_near that are near the stored patterns; thus, we take each stored pattern ξµ and perform bit-flips on 0.1D of its entries. This describes data generation and query types, but not a typical train/validation split.
Hardware Specification | Yes | All experiments are performed on a single L40s GPU equipped with 46GB VRAM.
Software Dependencies | No | Experiments were written and performed using the JAX [47] library for tensor manipulations. (JAX is mentioned, but no version number is given.)
Experiment Setup | Yes | In performing the qualitative reconstructions shown in fig. 1, we used a standard Mr DAM energy (eq. (7)) configured with inverse temperature β = 60. We approximated this energy in a Dr DAM using the trigonometric SinCos basis function shown in eq. (8) configured with feature dimension Y = 1.8e5. The four images shown were selected from the Tiny Imagenet [11] dataset, rasterized into a vector, and stored in the memory matrix of a Mr DAM, resulting in a memory of shape (4, 12288). Energy descent for both Mr DAM and Dr DAM used standard gradient descent at a step size of 0.1 until the dynamics of all images converged (for fig. 1 after 300 steps, see energy traces).
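To make the Experiment Setup row more concrete, here is a minimal JAX sketch contrasting a Mr DAM energy with its random-feature (Dr DAM) approximation. It rests on two assumptions, since eq. (7) and eq. (8) are not reproduced on this page: that the Mr DAM energy has the log-sum-exp form E(x) = -(1/β) log Σ_μ exp(-β/2 ‖x - ξ_μ‖²), and that the SinCos basis is the standard random Fourier feature map with frequencies drawn from N(0, βI). The helper names (mrdam_energy, make_sincos_features, drdam_energy, descend), the clamp constant eps, and the toy sizes are hypothetical; only the step size of 0.1 and the 300-step default mirror the row above. This is not the code in the linked repository.

```python
import jax
import jax.numpy as jnp
from jax.scipy.special import logsumexp


def mrdam_energy(x, xi, beta):
    """ASSUMED Mr DAM energy: -(1/beta) * log sum_mu exp(-beta/2 * ||x - xi_mu||^2)."""
    sq_dists = jnp.sum((xi - x) ** 2, axis=-1)  # (K,) squared distances to each memory
    return -logsumexp(-0.5 * beta * sq_dists) / beta


def make_sincos_features(key, dim, n_features, beta):
    """Hypothetical helper: SinCos random features with
    phi(x) @ phi(y) ~= exp(-beta/2 * ||x - y||^2)."""
    half = n_features // 2
    # Frequencies from N(0, beta * I): the kernel's spectral density (Bochner's theorem).
    omega = jnp.sqrt(beta) * jax.random.normal(key, (half, dim))

    def phi(x):
        proj = x @ omega.T  # (..., half)
        feats = jnp.concatenate([jnp.cos(proj), jnp.sin(proj)], axis=-1)
        return feats / jnp.sqrt(half)  # (..., n_features)

    return phi


def drdam_energy(x, T, phi, beta, eps=1e-12):
    """Dr DAM energy from one aggregated vector T = sum_mu phi(xi_mu).
    The clamp at eps is our assumption: it keeps log() finite if the
    random-feature estimate of the kernel sum becomes non-positive."""
    return -jnp.log(jnp.maximum(phi(x) @ T, eps)) / beta


def descend(energy_fn, x, step_size=0.1, n_steps=300):
    """Plain gradient descent on an energy function."""
    grad_fn = jax.jit(jax.grad(energy_fn))
    for _ in range(n_steps):
        x = x - step_size * grad_fn(x)
    return x


# Toy usage (illustrative sizes; the paper uses beta = 60 and Y = 1.8e5 on images).
D, K, Y, beta = 8, 3, 20_000, 4.0
key_mem, key_feat, key_noise = jax.random.split(jax.random.PRNGKey(0), 3)
xi = jax.random.uniform(key_mem, (K, D))  # memory matrix, one pattern per row
phi = make_sincos_features(key_feat, D, Y, beta)
T = phi(xi).sum(axis=0)  # shape (Y,), independent of K

x0 = xi[0] + 0.1 * jax.random.normal(key_noise, (D,))  # query near a stored pattern
e_mr = mrdam_energy(x0, xi, beta)
e_dr = drdam_energy(x0, T, phi, beta)

x_mr = descend(lambda x: mrdam_energy(x, xi, beta), x0, n_steps=50)
x_dr = descend(lambda x: drdam_energy(x, T, phi, beta), x0, n_steps=50)
print(float(e_mr), float(e_dr))
```

The point the sketch illustrates is that T has Y entries no matter how many memories are stored, so adding a memory means adding its feature vector to T rather than growing the memory matrix; the cost is a stochastic approximation whose quality depends on Y and, per Figure 3, on how close the query is to the stored patterns.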
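The query construction quoted in the Dataset Splits row (2K unique binary patterns, K stored, the rest used as far queries, and near queries made by flipping 0.1D bits of each stored pattern) can be sketched in JAX as follows. This is an illustrative sketch, not the authors' script: the function names make_patterns and flip_bits are hypothetical, values are kept in {0, 1} (the normalization mentioned in the source is not reproduced), and D = 100 is an arbitrary choice because the row does not state the pattern dimension.

```python
import jax
import jax.numpy as jnp


def make_patterns(key, n_patterns, dim):
    """Sample n_patterns distinct binary vectors in {0, 1}^dim.
    Resamples in the (for large dim, very unlikely) case of duplicate rows."""
    while True:
        key, sub = jax.random.split(key)
        pats = jax.random.bernoulli(sub, 0.5, (n_patterns, dim)).astype(jnp.float32)
        if jnp.unique(pats, axis=0).shape[0] == n_patterns:
            return pats


def flip_bits(key, patterns, n_flip):
    """Flip exactly n_flip randomly chosen entries in each row of a binary matrix."""
    n, dim = patterns.shape
    keys = jax.random.split(key, n)
    # Per row: first n_flip indices of a random permutation, so no index repeats.
    idx = jax.vmap(lambda k: jax.random.permutation(k, dim)[:n_flip])(keys)
    mask = jnp.zeros((n, dim), dtype=bool).at[jnp.arange(n)[:, None], idx].set(True)
    return jnp.where(mask, 1.0 - patterns, patterns)


# Sizes follow the quoted setup (2K = 1000, K = 500); D is an assumption.
K, D = 500, 100
key_pat, key_flip = jax.random.split(jax.random.PRNGKey(0))
patterns = make_patterns(key_pat, 2 * K, D)
xi = patterns[:K]      # stored in the memory matrix
x_far = patterns[K:]   # remaining patterns serve as random (far) queries
x_near = flip_bits(key_flip, xi, n_flip=D // 10)  # 0.1 D bit flips per stored pattern
```

Choosing flip positions through a permutation guarantees that every near query sits at Hamming distance exactly 0.1D from its stored pattern, rather than at most 0.1D as independent index sampling could give.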