Learning Associative Memories with Gradient Descent
Authors: Vivien Cabannes, Berfin Simsek, Alberto Bietti
ICML 2024 | Conference PDF | Archive PDF | Plain Text | LLM Run Details
| Reproducibility Variable | Result | LLM Response |
|---|---|---|
| Research Type | Experimental | Through theory and experiments, we provide several insights. ... We complement our analysis with experiments, investigating small multi-layer Transformer models with our associative memory viewpoint and identifying similar behaviors to those pinpointed in the simpler models. |
| Researcher Affiliation | Collaboration | ¹Meta AI, ²Flatiron. Correspondence to: <vivc@meta.com>. |
| Pseudocode | No | The paper does not contain any pseudocode or clearly labeled algorithm blocks. |
| Open Source Code | No | The paper does not provide any explicit statements about releasing source code or links to a code repository. |
| Open Datasets | No | We consider full-batch gradient descent on a dataset of 16 384 sequences of length 256 generated from the model described above with N = 64 tokens. ... The tokens following all non-trigger tokens are randomly sampled from a sequence-independent Markov model (namely, a character-level bigram model estimated from Shakespeare text data). The dataset is generated by the authors, and no specific link, DOI, or citation to a publicly available instance of this generated dataset is provided. (A hedged data-generation sketch appears after the table.) |
| Dataset Splits | No | We consider full-batch gradient descent on a dataset of 16 384 sequences of length 256 generated from the model described above with N = 64 tokens. No explicit training, validation, or test dataset splits are provided. |
| Hardware Specification | No | The paper does not provide any specific details about the hardware (e.g., GPU models, CPU types, memory specifications) used to run the experiments. |
| Software Dependencies | No | The paper mentions 'pytorch convention' in Appendix A but does not list any specific software or library names with version numbers required for reproducibility. |
| Experiment Setup | Yes | We consider full-batch gradient descent on a dataset of 16 384 sequences of length 256 generated from the model described above with N = 64 tokens. ... Training losses are shown for different step-sizes η, and margins are shown for 5 different tokens. ... In Figure 6, we consider a setup with N = M = 5, f(x) = x, and p(x) ∝ 1/x, in different dimensions (with random embeddings). (A hedged training sketch for this setup appears after the table.) |
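
The Open Datasets row quotes the paper's synthetic data setup: 16 384 sequences of length 256 over N = 64 tokens, where tokens following non-trigger tokens are drawn from a sequence-independent bigram Markov model (estimated from Shakespeare text in the paper). Since no code or data is released, the following is only a minimal sketch of how such a generator could look; the trigger set, the sequence-specific output token copied after each trigger, and the random stand-in for the Shakespeare bigram matrix are assumptions, not the authors' implementation.

```python
# Hedged sketch of the synthetic data generation quoted above:
# 16,384 sequences of length 256 over N = 64 tokens. Tokens following
# non-trigger tokens come from a fixed (sequence-independent) bigram
# Markov model; the paper estimates it from Shakespeare text, while a
# random transition matrix is used here as a stand-in. The trigger
# mechanism below (a sequence-specific output token copied after each
# trigger) is an assumption about the "model described above".
import numpy as np

rng = np.random.default_rng(0)

N = 64            # vocabulary size
SEQ_LEN = 256     # sequence length
NUM_SEQ = 16_384  # dataset size
TRIGGERS = np.array([0, 1, 2])  # hypothetical set of trigger tokens

# Stand-in bigram transition matrix (rows sum to 1).
P = rng.random((N, N))
P /= P.sum(axis=1, keepdims=True)

def sample_sequence():
    # Assumed mechanism: each sequence fixes one output token that
    # deterministically follows every trigger token.
    out_tok = rng.integers(N)
    seq = np.empty(SEQ_LEN, dtype=np.int64)
    seq[0] = rng.integers(N)
    for t in range(1, SEQ_LEN):
        prev = seq[t - 1]
        if prev in TRIGGERS:
            seq[t] = out_tok                   # trigger -> copy output token
        else:
            seq[t] = rng.choice(N, p=P[prev])  # bigram Markov step
    return seq

data = np.stack([sample_sequence() for _ in range(NUM_SEQ)])
print(data.shape)  # (16384, 256)
```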
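
Similarly, the Experiment Setup row describes the Figure 6 configuration: N = M = 5 tokens, identity target f(x) = x, Zipf input distribution p(x) ∝ 1/x, random embeddings in several dimensions, and full-batch gradient descent with various step-sizes η. The sketch below shows one plausible reading of that setup as a linear associative memory W scored by u_yᵀ W e_x and trained on a weighted cross-entropy; the loss form, margin definition, zero initialization, and the chosen values of d, η, and the step count are illustrative assumptions rather than the paper's code.

```python
# Hedged sketch of the Figure 6 setup: N = M = 5, f(x) = x, p(x) ∝ 1/x,
# random embeddings in dimension d, full-batch gradient descent with
# step-size eta on a population cross-entropy loss. The margin of a token
# is taken as its correct score minus the best competing score, which is
# an assumption consistent with the paper's description.
import torch

torch.manual_seed(0)

N = M = 5          # number of input / output tokens
d = 16             # embedding dimension (hypothetical choice)
eta = 1.0          # step-size (hypothetical; the paper sweeps several)
steps = 1000

# Zipf law p(x) ∝ 1/x and identity target f(x) = x.
p = 1.0 / torch.arange(1, N + 1, dtype=torch.float32)
p /= p.sum()
targets = torch.arange(N)

# Random fixed embeddings / unembeddings; W initialized at zero (assumed).
E = torch.randn(N, d) / d**0.5   # input embeddings e_x
U = torch.randn(M, d) / d**0.5   # output embeddings u_y
W = torch.zeros(d, d, requires_grad=True)

for step in range(steps):
    logits = E @ W.T @ U.T       # logits[x, y] = u_y^T W e_x
    losses = torch.nn.functional.cross_entropy(logits, targets, reduction="none")
    loss = (p * losses).sum()    # population loss weighted by p(x)
    loss.backward()
    with torch.no_grad():
        W -= eta * W.grad        # full-batch gradient descent step
        W.grad.zero_()

# Margin of each token: correct score minus the best competing score.
with torch.no_grad():
    logits = E @ W.T @ U.T
    correct = logits[torch.arange(N), targets]
    logits[torch.arange(N), targets] = -float("inf")
    margins = correct - logits.max(dim=1).values
    print(loss.item(), margins)
```

Tracking `loss` and `margins` over training steps for several values of `eta` would reproduce the kind of curves the quoted setup refers to, under the stated assumptions.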