Notice: The reproducibility variables underlying each score are classified using an automated LLM-based pipeline, validated against a manually labeled dataset. LLM-based classification introduces uncertainty and potential bias; scores should be interpreted as estimates. Full accuracy metrics and methodology are described in Coakley et al. (K. L. Coakley, T. Snelleman, H. Hoos, and O. E. Gundersen, "The embrace of open science: An analysis of a decade of AI research and 56 800 conference papers," Under Review, 2026).
Limits to Depth Efficiencies of Self-Attention
Authors: Yoav Levine, Noam Wies, Or Sharir, Hofit Bata, Amnon Shashua
NeurIPS 2020
| Reproducibility Variable | Result | LLM Response |
|---|---|---|
| Research Type | Experimental | We conduct systematic empirical ablations on networks of depths 6 to 48 that clearly reveal the theoretically predicted behaviors |
| Researcher Affiliation | Academia | Yoav Levine, Noam Wies, Or Sharir, Hofit Bata, and Amnon Shashua, The Hebrew University of Jerusalem |
| Pseudocode | No | The paper does not contain any structured pseudocode or algorithm blocks. |
| Open Source Code | No | The paper does not provide concrete access to source code for the methodology described. |
| Open Datasets | Yes | The interleaved Baseline achieves a perplexity score of 18.63 ± 0.26 on the WikiText-103 test set [Merity et al., 2016] when averaged over 5 random seeds |
| Dataset Splits | No | The paper mentions using the WikiText-103 test set but does not provide specific details on the training, validation, or test splits (e.g., percentages or sample counts) used for reproducibility. It only mentions 'WikiText-103 test'. |
| Hardware Specification | Yes | Experiments were performed with Cloud TPUs and supported by Google's TensorFlow Research Cloud (TFRC). |
| Software Dependencies | No | The paper mentions 'TensorFlow Research Cloud' but does not provide specific version numbers for software dependencies (e.g., TensorFlow version, Python version, CUDA version). |
| Experiment Setup | No | The paper states 'The training apparatus details are given in the appendix' but does not provide specific experimental setup details (like hyperparameters or training configurations) in the main text. |
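The "18.63 ± 0.26 ... averaged over 5 random seeds" figure in the Open Datasets row combines two computations: a per-run perplexity and an aggregate across seeds. The following is a minimal sketch, not the authors' evaluation code. It assumes that perplexity is the exponential of the mean per-token negative log-likelihood (in nats) and that "±" denotes the sample standard deviation across seeds. The excerpt does not say which spread statistic the paper reports, and it could instead be, for example, a standard error. The helper names and per-seed values are hypothetical.

```python
import math
import statistics


def perplexity(token_nlls):
    """Perplexity as exp(mean per-token negative log-likelihood in nats).

    Assumption: this is the standard token-level definition; the paper's
    exact evaluation protocol is not given in this report.
    """
    if not token_nlls:
        raise ValueError("need at least one token")
    return math.exp(sum(token_nlls) / len(token_nlls))


def mean_and_spread(values):
    """Mean and sample standard deviation across runs (e.g. random seeds).

    Assumption: '±' is read as a sample standard deviation (n - 1 denominator).
    """
    if len(values) < 2:
        raise ValueError("need at least two runs to estimate spread")
    return statistics.mean(values), statistics.stdev(values)


if __name__ == "__main__":
    # Hypothetical per-seed test perplexities (NOT results from the paper).
    per_seed = [18.4, 18.6, 18.9, 18.5, 18.7]
    m, s = mean_and_spread(per_seed)
    print(f"{m:.2f} ± {s:.2f} over {len(per_seed)} seeds")
```

Reporting the number of seeds alongside the spread statistic, as the quoted sentence does, lets a reader judge how stable the estimate is. For a fully reproducible setup, the spread statistic would also need to be named explicitly.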