Limits to Depth Efficiencies of Self-Attention

Authors: Yoav Levine, Noam Wies, Or Sharir, Hofit Bata, Amnon Shashua

NeurIPS 2020

Reproducibility Variable | Result | LLM Response
Research Type | Experimental | "We conduct systematic empirical ablations on networks of depths 6 to 48 that clearly reveal the theoretically predicted behaviors."
Researcher Affiliation | Academia | "Yoav Levine, Noam Wies, Or Sharir, Hofit Bata, and Amnon Shashua, The Hebrew University of Jerusalem"
Pseudocode | No | The paper does not contain any structured pseudocode or algorithm blocks.
Open Source Code | No | The paper does not provide concrete access to source code for the methodology described.
Open Datasets | Yes | "The interleaved Baseline achieves a perplexity score of 18.63 ± 0.26 on the WikiText-103 test [Merity et al., 2016] when averaged over 5 random seeds." (See the sketch after this table for how such a seed-averaged figure is typically aggregated.)
Dataset Splits | No | The paper mentions evaluating on the WikiText-103 test set but does not specify the training, validation, or test splits (e.g., percentages or sample counts) needed for reproducibility; it only refers to the "WikiText-103 test".
Hardware Specification | Yes | "Experiments were performed with Cloud TPUs and supported by Google's TensorFlow Research Cloud (TFRC)."
Software Dependencies | No | The paper mentions the TensorFlow Research Cloud but does not provide version numbers for software dependencies (e.g., TensorFlow, Python, or CUDA versions).
Experiment Setup | No | The paper states "The training apparatus details are given in the appendix" but does not provide specific experimental setup details (such as hyperparameters or training configurations) in the main text.
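
The Open Datasets row above quotes a perplexity of 18.63 ± 0.26 averaged over 5 random seeds. The following is a minimal sketch, using hypothetical per-seed cross-entropy losses (placeholder values, not the authors' results), of how such a seed-averaged perplexity and its spread are commonly computed.

```python
import math
import statistics

# Hypothetical per-seed test cross-entropy losses (nats/token) on WikiText-103.
# These are illustrative placeholders, not values reported in the paper.
per_seed_loss = [2.921, 2.934, 2.910, 2.945, 2.928]

# Perplexity is the exponential of the mean per-token cross-entropy loss.
per_seed_ppl = [math.exp(loss) for loss in per_seed_loss]

mean_ppl = statistics.mean(per_seed_ppl)
std_ppl = statistics.stdev(per_seed_ppl)  # sample standard deviation across seeds

print(f"perplexity: {mean_ppl:.2f} +/- {std_ppl:.2f} over {len(per_seed_ppl)} seeds")
```

Reporting the mean and standard deviation over seeds, as the paper does, indicates how sensitive the final test perplexity is to random initialization and data ordering.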