Hopfield Networks is All You Need

Authors: Hubert Ramsauer, Bernhard Schäfl, Johannes Lehner, Philipp Seidl, Michael Widrich, Lukas Gruber, Markus Holzleitner, Thomas Adler, David Kreil, Michael K Kopp, Günter Klambauer, Johannes Brandstetter, Sepp Hochreiter

ICLR 2021

Reproducibility Variable | Result | LLM Response
Research Type | Experimental | We demonstrate the broad applicability of the Hopfield layers across various domains. Hopfield layers improved state-of-the-art on three out of four considered multiple instance learning problems as well as on immune repertoire classification with several hundreds of thousands of instances. On the UCI benchmark collections of small classification tasks, where deep learning methods typically struggle, Hopfield layers yielded a new state-of-the-art when compared to different machine learning methods. Finally, Hopfield layers achieved state-of-the-art on two drug design datasets.
Researcher Affiliation | Academia | Hubert Ramsauer, Bernhard Schäfl, Johannes Lehner, Philipp Seidl, Michael Widrich, Thomas Adler, Lukas Gruber, Markus Holzleitner, David Kreil, Michael Kopp, Günter Klambauer, Johannes Brandstetter, Sepp Hochreiter; ELLIS Unit Linz, LIT AI Lab, Institute for Machine Learning, Johannes Kepler University Linz, Austria; Institute of Advanced Research in Artificial Intelligence (IARAI); Email: {ramsauer,schaefl,brandstetter,hochreit}@ml.jku.at
Pseudocode | No | The paper provides mathematical formulations and derivations but does not include structured pseudocode or algorithm blocks (e.g., labeled "Pseudocode" or "Algorithm").
Open Source Code | Yes | The implementation is available at: https://github.com/ml-jku/hopfield-layers
Open Datasets | Yes | On the UCI benchmark collections of small classification tasks, where deep learning methods typically struggle, Hopfield layers yielded a new state-of-the-art when compared to different machine learning methods.
Dataset Splits | Yes | All models were trained for 100 epochs with a mini-batch size of 4 samples using the cross-entropy loss and the PyTorch SGD module for stochastic gradient descent without momentum and without weight decay or dropout. After each epoch, the model accuracy was computed on a separate validation set. Using early stopping, the model with the best validation set accuracy averaged over 16 consecutive epochs was selected as the final model. [See the training-loop sketch below the table.]
Hardware Specification | Yes | The training of such a BERT-small model for 1.45 million update steps takes roughly four days on a single NVIDIA V100 GPU.
Software Dependencies | No | The paper mentions software such as "Hugging Face Inc." and "PyTorch", but does not specify exact version numbers for these or other libraries/packages.
Experiment Setup | Yes | For the MIL datasets... Among other hyperparameters, different hidden layer widths (for the fully connected pre- and post-Hopfield-pooling layers), learning rates, and batch sizes were tried. Additionally, the focus resided on the hyperparameters of the Hopfield pooling layer, among which were the number of heads, the head dimension, and the scaling factor β. All models were trained for 160 epochs using the AdamW optimizer (Loshchilov & Hutter, 2017) with exponential learning rate decay (see Table A.2). [See the pooling-layer sketch below the table.]
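
The dataset-splits row quotes a training protocol: SGD without momentum or weight decay, cross-entropy loss, per-epoch validation, and selection of the checkpoint whose validation accuracy averaged over 16 consecutive epochs is best. Below is a minimal PyTorch sketch of one plausible reading of that selection rule; the model, data loaders, learning rate, and the trailing-window interpretation of "averaged over 16 consecutive epochs" are illustrative assumptions, not the authors' code.

```python
import copy
import torch
from torch import nn

def train_with_window_selection(model, train_loader, val_loader,
                                epochs=100, window=16, lr=1e-3):
    """Sketch of the quoted protocol: SGD without momentum or weight decay,
    cross-entropy loss, per-epoch validation, and selection of the checkpoint
    with the best validation accuracy averaged over a window of epochs.
    The learning rate and windowing details are illustrative assumptions."""
    optimizer = torch.optim.SGD(model.parameters(), lr=lr)  # momentum=0, weight_decay=0 by default
    criterion = nn.CrossEntropyLoss()
    val_history, best_avg, best_state = [], float("-inf"), None

    for _ in range(epochs):
        model.train()
        for x, y in train_loader:          # mini-batches of 4 samples in the quoted setup
            optimizer.zero_grad()
            loss = criterion(model(x), y)
            loss.backward()
            optimizer.step()

        model.eval()                        # validation accuracy after each epoch
        correct, total = 0, 0
        with torch.no_grad():
            for x, y in val_loader:
                correct += (model(x).argmax(dim=-1) == y).sum().item()
                total += y.numel()
        val_history.append(correct / max(total, 1))

        # Keep the checkpoint where the trailing-window average of validation accuracy peaks.
        if len(val_history) >= window:
            avg = sum(val_history[-window:]) / window
            if avg > best_avg:
                best_avg, best_state = avg, copy.deepcopy(model.state_dict())

    if best_state is not None:
        model.load_state_dict(best_state)
    return model, best_avg
```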
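
The experiment-setup row names the main hyperparameters of the Hopfield pooling layer: the number of heads, the head dimension, and the scaling factor β, which acts as the inverse temperature in the paper's update rule ξ_new = X softmax(β Xᵀ ξ). The sketch below shows a plain-PyTorch attention-style pooling module that exposes these three knobs, together with AdamW and exponential learning-rate decay as quoted above. It is a simplified illustration of the mechanism, not the official layer from https://github.com/ml-jku/hopfield-layers, and the class name, learning rate, and decay factor are assumptions.

```python
import torch
from torch import nn

class AttentionPooling(nn.Module):
    """Simplified multi-head attention pooling in the spirit of Hopfield pooling:
    learnable query ("state") patterns attend over a bag of instances, with the
    softmax scaled by beta. Illustrative sketch, not the official HopfieldPooling."""

    def __init__(self, input_dim, num_heads=8, head_dim=16, beta=None):
        super().__init__()
        self.num_heads, self.head_dim = num_heads, head_dim
        # Default scaling 1/sqrt(head_dim), as in standard scaled dot-product attention.
        self.beta = beta if beta is not None else head_dim ** -0.5
        self.query = nn.Parameter(torch.randn(num_heads, head_dim))   # one query per head
        self.key = nn.Linear(input_dim, num_heads * head_dim, bias=False)
        self.value = nn.Linear(input_dim, num_heads * head_dim, bias=False)

    def forward(self, x):
        # x: (batch, bag_size, input_dim)
        b, n, _ = x.shape
        k = self.key(x).view(b, n, self.num_heads, self.head_dim)
        v = self.value(x).view(b, n, self.num_heads, self.head_dim)
        scores = torch.einsum("hd,bnhd->bnh", self.query, k) * self.beta
        attn = scores.softmax(dim=1)                  # softmax over the bag dimension
        pooled = torch.einsum("bnh,bnhd->bhd", attn, v)
        return pooled.reshape(b, self.num_heads * self.head_dim)

# AdamW with exponential learning-rate decay, as in the quoted setup
# (learning rate and decay factor are illustrative assumptions).
pool = AttentionPooling(input_dim=32, num_heads=8, head_dim=16, beta=0.1)
optimizer = torch.optim.AdamW(pool.parameters(), lr=1e-3)
scheduler = torch.optim.lr_scheduler.ExponentialLR(optimizer, gamma=0.98)
```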