Attention Approximates Sparse Distributed Memory

Authors: Trenton Bricken, Cengiz Pehlevan

NeurIPS 2021

Reproducibility Variable | Result | LLM Response
Research Type | Experimental | We confirm that these conditions are satisfied in pre-trained GPT2 Transformer models. We test it in pre-trained GPT2 Transformer models [3] (Section 3) and simulations (Appendix B.7). We use the Query-Key Normalized Transformer variant [22] to directly show that the relationship to SDM holds well. We then use original GPT2 models to help confirm this result and make it more general. We analyze the β coefficients learnt by the Query-Key Normalization Transformer Attention variant [22].
Researcher Affiliation | Academia | Trenton Bricken, Systems, Synthetic and Quantitative Biology, Harvard University (trentonbricken@g.harvard.edu); Cengiz Pehlevan, Applied Mathematics, Harvard University (cpehlevan@seas.harvard.edu)
Pseudocode | No | The paper does not contain any pseudocode or clearly labeled algorithm blocks.
Open Source Code | Yes | The code for running these experiments, other analyses, and reproducing all figures is available at https://github.com/trentbrick/attention-approximates-sdm.
Open Datasets | Yes | We test it in pre-trained GPT2 Transformer models [3] (Section 3) and simulations (Appendix B.7). We use the Query-Key Normalized Transformer variant [22] to directly show that the relationship to SDM holds well. We then use original GPT2 models to help confirm this result and make it more general. We analyze the β coefficients learnt by the Query-Key Normalization Transformer Attention variant [22]. (References [3] and [22] point to publicly recognized models and tasks.)
Dataset Splits | No | The paper mentions using pre-trained models and translation tasks but does not specify train/validation/test dataset splits for its own experiments or analysis.
Hardware Specification | No | The paper does not provide specific hardware details (e.g., CPU/GPU models, memory) used for running its experiments or simulations.
Software Dependencies | No | We also would like to thank the open source software contributors that helped make this research possible, including but not limited to: Numpy, Pandas, Scipy, Matplotlib, PyTorch, Hugging Face, and Anaconda.
Experiment Setup | Yes | We test it in pre-trained GPT2 Transformer models [3] (Section 3) and simulations (Appendix B.7). We test random and correlated patterns in an autoassociative retrieval task across different numbers of neurons and SDM variants (Appendix B.7). These variants include SDM implemented using simulated neurons and the Attention approximation with a fitted β.
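
The β analysis quoted in the Research Type row refers to the Query-Key Normalized Attention variant of [22], in which queries and keys are L2-normalized so that the softmax logits are cosine similarities scaled by a single learned coefficient β. The sketch below is a minimal, hypothetical illustration of that computation under our own assumptions (NumPy framing, shapes, and names are illustrative and not taken from the paper's code):

```python
# Hypothetical sketch (not the authors' code) of Query-Key Normalized attention:
# queries and keys are L2-normalized, so a single learned scalar beta sets the
# softmax temperature. It is this beta that can be compared against the value
# implied by SDM's circle-intersection decay.
import numpy as np

def qk_norm_attention(Q, K, V, beta):
    """Q: (num_queries, d); K: (num_keys, d); V: (num_keys, d_v); beta: learned scalar."""
    Qn = Q / np.linalg.norm(Q, axis=-1, keepdims=True)
    Kn = K / np.linalg.norm(K, axis=-1, keepdims=True)
    logits = beta * (Qn @ Kn.T)                      # scaled cosine similarities in [-1, 1]
    logits -= logits.max(axis=-1, keepdims=True)     # numerical stability
    weights = np.exp(logits)
    weights /= weights.sum(axis=-1, keepdims=True)   # softmax over keys
    return weights @ V                               # weighted sum of values
```

Writing β out explicitly, rather than folding it into the usual 1/sqrt(d) scaling, is what makes a direct comparison between the learned coefficients and SDM-derived β values possible.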
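
The Experiment Setup row describes an autoassociative retrieval comparison between SDM implemented with simulated neurons and the Attention approximation with a fitted β (Appendix B.7). The following toy reproduction is a hedged sketch of that setup under our own assumptions (binary patterns, small dimensionality, a Monte Carlo log-linear fit of β from the circle-intersection decay); it is not the authors' code and all parameter values are illustrative.

```python
# Hypothetical toy reproduction (not the authors' code) of the autoassociative
# retrieval comparison: SDM built from simulated neurons vs. softmax Attention
# with a beta fitted to SDM's circle-intersection decay.
import numpy as np

rng = np.random.default_rng(0)
n = 64        # address / pattern dimensionality in bits
r = 30_000    # number of simulated SDM neurons
m = 10        # number of stored (autoassociative) patterns
d = 22        # Hamming radius of each neuron's activation circle

patterns = rng.integers(0, 2, size=(m, n))           # addresses == values
neuron_addresses = rng.integers(0, 2, size=(r, n))

def hamming(a, b):
    return np.count_nonzero(a != b, axis=-1)

# Write: every neuron within distance d of a pattern accumulates it (+/-1 form).
counters = np.zeros((r, n))
for p in patterns:
    active = hamming(neuron_addresses, p) <= d
    counters[active] += 2 * p - 1

def sdm_read(query):
    """Pool the counters of all neurons within distance d of the query, then threshold."""
    active = hamming(neuron_addresses, query) <= d
    return (counters[active].sum(axis=0) > 0).astype(int)

# Fit beta: the number of neurons shared by two activation circles decays roughly
# exponentially with the Hamming distance between their centres, so a log-linear
# fit of the (Monte Carlo) intersection sizes gives the decay rate.
probe = np.zeros(n, dtype=int)
in_probe_circle = hamming(neuron_addresses, probe) <= d
dists = np.arange(0, n // 2 + 1)
intersections = []
for dv in dists:
    other = probe.copy()
    other[:dv] = 1                                    # a centre at Hamming distance dv
    in_other_circle = hamming(neuron_addresses, other) <= d
    intersections.append(np.count_nonzero(in_probe_circle & in_other_circle))
intersections = np.array(intersections, dtype=float)
valid = intersections > 0
slope = np.polyfit(dists[valid], np.log(intersections[valid]), 1)[0]
# For +/-1 patterns, cosine similarity = 1 - 2 * hamming / n, so an exponential
# decay in Hamming distance rescales to a softmax coefficient beta = -slope * n / 2.
beta = -slope * n / 2

def attention_read(query):
    """Softmax Attention over the stored patterns using the fitted beta."""
    q, K = 2 * query - 1, 2 * patterns - 1            # map bits to +/-1
    sims = (K @ q) / n                                # cosine-like similarity in [-1, 1]
    w = np.exp(beta * sims)
    w /= w.sum()
    return ((w @ K) > 0).astype(int)

# Query with a noisy copy of a stored pattern and compare the two read-outs.
target = patterns[0]
noisy = target.copy()
noisy[rng.choice(n, size=5, replace=False)] ^= 1
print("fitted beta:", round(beta, 2))
print("SDM bit errors:      ", hamming(sdm_read(noisy), target))
print("Attention bit errors:", hamming(attention_read(noisy), target))
```

The key design choice is that β is not tuned for retrieval accuracy: it is read off from the circle-intersection decay of the simulated neurons, which is the correspondence that lets softmax Attention stand in for the SDM read operation.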