Outlier-Efficient Hopfield Layers for Large Transformer-Based Models

Authors: Jerry Yao-Chieh Hu, Pei-Hsuan Chang, Haozheng Luo, Hong-Yu Chen, Weijian Li, Wei-Po Wang, Han Liu

ICML 2024

Reproducibility Variable | Result | LLM Response
Research Type | Experimental | Empirically, we validate the proposed method on 3 common large transformer-based models and 1 Hopfield-based model (BERT (Devlin et al., 2019), Open Pre-trained Transformer (OPT) (Zhang et al., 2022), Vision Transformer (ViT) (Dosovitskiy et al., 2020), and STanHop-Net (Wu et al., 2024b)). Specifically, OutEffHop reduces average kurtosis and maximum infinity norm by 22+% and 26+%, respectively, improves the same metrics by an average of 3% and 4% compared to 3 variants of STanHop-Net, and ranks among the top two in outlier efficiency in 25 out of 30 settings. (A sketch of the two outlier metrics appears after the table.)
Researcher Affiliation | Academia | ¹Department of Computer Science, Northwestern University, Evanston, USA; ²Department of Physics, National Taiwan University, Taipei, Taiwan; ³Department of Statistics and Data Science, Northwestern University, Evanston, USA.
Pseudocode | No | No pseudocode or algorithm blocks are explicitly presented or labeled in the paper.
Open Source Code | Yes | Code is available on GitHub; future updates are on arXiv.
Open Datasets | Yes | Datasets. We use 4 real-world datasets: Bookcorpus (Zhu et al., 2015), wiki40b/en (Guo et al., 2020), ImageNet-1k (Russakovsky et al., 2015), and ETTh1 (Zhou et al., 2021). The first two are for the language models, i.e., OPT and BERT; the third is for the vision model, i.e., ViT; and the last is for the time-series model, i.e., STanHop-Net.
Dataset Splits | Yes | We then train these models from scratch and evaluate them on the validation set. We divide these datasets into training, validation, and test sets with a ratio of 14/5/5. (A loading-and-split sketch appears after the table.)
Hardware Specification | Yes | Our experimental setup used a Slurm system with two 80GB A100 GPUs and a 24-core Intel(R) Xeon(R) Gold 6338 CPU at 2.00GHz.
Software Dependencies | No | No specific software dependencies with version numbers are listed in the paper's main text.
Experiment Setup | Yes | Table 3 lists the hyperparameters used in the fast convergence task:

    parameter               value
    learning rate           1e-4
    embedding dimension     512
    feed-forward dimension  1024
    dropout                 0.3
    activation function     GELU
    epochs                  100
    batch size              512
    model optimizer         Adam
    patch size              32

(A configuration sketch applying these values appears below.)
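The Research Type row reports reductions in average kurtosis and maximum infinity norm of activations. Below is a minimal sketch of how these two outlier metrics can be computed; it is not the authors' code, and the way activations are collected (e.g., via forward hooks, one tensor per layer) is an assumption.

```python
# Minimal sketch (not the authors' code): average kurtosis and maximum
# infinity norm over a set of activation tensors, e.g., one per layer.
import torch

def outlier_metrics(activations):
    """activations: list of float tensors collected from the model."""
    kurtoses, inf_norms = [], []
    for act in activations:
        x = act.float().flatten()
        mu = x.mean()
        sigma = x.std(unbiased=False)
        # Fourth standardized moment (Pearson kurtosis; equals 3 for a Gaussian).
        kurtoses.append((((x - mu) / sigma) ** 4).mean().item())
        # Infinity norm: the largest absolute activation value.
        inf_norms.append(x.abs().max().item())
    return sum(kurtoses) / len(kurtoses), max(inf_norms)
```

Averaging kurtosis across layers and taking the overall maximum infinity norm matches how the two numbers are summarized in the quote; whether Pearson or excess kurtosis is used is not stated here, so the choice above is an assumption.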
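The Open Datasets and Dataset Splits rows combine into a short loading-and-splitting sketch. The Hugging Face dataset identifier and the use of the `datasets` library are illustrative assumptions; the paper's own loading code is not reproduced here.

```python
# Hypothetical sketch: load one of the text corpora and re-split it
# 14/5/5 (train/validation/test), the ratio reported in the paper.
# The dataset ID "wiki40b"/"en" is an assumed Hugging Face name.
from datasets import load_dataset

ds = load_dataset("wiki40b", "en", split="train").shuffle(seed=0)

# A 14/5/5 ratio means fractions 14/24, 5/24, and 5/24 of the data.
n = len(ds)
n_train = n * 14 // 24
n_val = n * 5 // 24

train_set = ds.select(range(n_train))
val_set = ds.select(range(n_train, n_train + n_val))
test_set = ds.select(range(n_train + n_val, n))
```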
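Finally, the Table 3 values map directly onto a training configuration. The sketch below assumes a PyTorch setup and uses a single `nn.TransformerEncoderLayer` as a stand-in model, since the full architecture is not reproduced here; the head count is not listed in Table 3 and is assumed.

```python
# Table 3's hyperparameters wired into a PyTorch setup. Only the
# constants below come from the paper; the stand-in model and head
# count are illustrative assumptions.
import torch
import torch.nn as nn

LR = 1e-4
EMBED_DIM = 512
FFN_DIM = 1024
DROPOUT = 0.3
EPOCHS = 100
BATCH_SIZE = 512
PATCH_SIZE = 32  # ViT-style patch size

# Stand-in model: one Transformer encoder layer with the paper's
# dimensions, dropout, and GELU activation.
model = nn.TransformerEncoderLayer(
    d_model=EMBED_DIM,
    nhead=8,  # assumed; not reported in Table 3
    dim_feedforward=FFN_DIM,
    dropout=DROPOUT,
    activation="gelu",
    batch_first=True,
)
optimizer = torch.optim.Adam(model.parameters(), lr=LR)
```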