Limits to Depth Efficiencies of Self-Attention

Authors: Yoav Levine, Noam Wies, Or Sharir, Hofit Bata, Amnon Shashua

NeurIPS 2020

Reproducibility Variable | Result | LLM Response
Research Type | Experimental | "We conduct systematic empirical ablations on networks of depths 6 to 48 that clearly reveal the theoretically predicted behaviors."
Researcher Affiliation | Academia | "Yoav Levine, Noam Wies, Or Sharir, Hofit Bata, and Amnon Shashua, The Hebrew University of Jerusalem"
Pseudocode | No | The paper does not contain any structured pseudocode or algorithm blocks.
Open Source Code | No | The paper does not provide concrete access to source code for the methodology described.
Open Datasets | Yes | "The interleaved Baseline achieves a perplexity score of 18.63 ± 0.26 on the WikiText-103 test [Merity et al., 2016] when averaged over 5 random seeds." (See the sketch after this table for how such a seed-averaged figure is typically aggregated.)
Dataset Splits | No | The paper mentions evaluating on the WikiText-103 test set but does not specify the training, validation, or test splits (e.g., percentages or sample counts) needed for reproducibility; it only refers to the "WikiText-103 test".
Hardware Specification | Yes | "Experiments were performed with Cloud TPUs and supported by Google's TensorFlow Research Cloud (TFRC)."
Software Dependencies | No | The paper mentions the TensorFlow Research Cloud but does not provide version numbers for software dependencies (e.g., TensorFlow, Python, or CUDA versions).
Experiment Setup | No | The paper states "The training apparatus details are given in the appendix" but does not provide specific experimental setup details (such as hyperparameters or training configurations) in the main text.
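
The Open Datasets row above quotes a perplexity of 18.63 ± 0.26 averaged over 5 random seeds. The following is a minimal sketch, using hypothetical per-seed cross-entropy losses (placeholder values, not the authors' results), of how such a seed-averaged perplexity and its spread are commonly computed.

```python
import math
import statistics

# Hypothetical per-seed test cross-entropy losses (nats/token) on WikiText-103.
# These are illustrative placeholders, not values reported in the paper.
per_seed_loss = [2.921, 2.934, 2.910, 2.945, 2.928]

# Perplexity is the exponential of the mean per-token cross-entropy loss.
per_seed_ppl = [math.exp(loss) for loss in per_seed_loss]

mean_ppl = statistics.mean(per_seed_ppl)
std_ppl = statistics.stdev(per_seed_ppl)  # sample standard deviation across seeds

print(f"perplexity: {mean_ppl:.2f} +/- {std_ppl:.2f} over {len(per_seed_ppl)} seeds")
```

Reporting the mean and standard deviation over seeds, as the paper does, indicates how sensitive the final test perplexity is to random initialization and data ordering.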