Limits to Depth Efficiencies of Self-Attention
Authors: Yoav Levine, Noam Wies, Or Sharir, Hofit Bata, Amnon Shashua
NeurIPS 2020
| Reproducibility Variable | Result | LLM Response |
|---|---|---|
| Research Type | Experimental | We conduct systematic empirical ablations on networks of depths 6 to 48 that clearly reveal the theoretically predicted behaviors |
| Researcher Affiliation | Academia | Yoav Levine, Noam Wies, Or Sharir, Hofit Bata, and Amnon Shashua, The Hebrew University of Jerusalem |
| Pseudocode | No | The paper does not contain any structured pseudocode or algorithm blocks. |
| Open Source Code | No | The paper does not provide concrete access to source code for the methodology described. |
| Open Datasets | Yes | The interleaved Baseline achieves a perplexity score of 18.63 ± 0.26 on the WikiText-103 test [Merity et al., 2016] when averaged over 5 random seeds |
| Dataset Splits | No | The paper mentions evaluating on the WikiText-103 test set but does not provide specific details on the training, validation, or test splits (e.g., percentages or sample counts) needed for reproducibility; it only refers to the 'WikiText-103 test'. |
| Hardware Specification | Yes | Experiments were performed with Cloud TPUs and supported by Google's TensorFlow Research Cloud (TFRC). |
| Software Dependencies | No | The paper mentions 'TensorFlow Research Cloud' but does not provide specific version numbers for software dependencies (e.g., TensorFlow version, Python version, CUDA version). |
| Experiment Setup | No | The paper states 'The training apparatus details are given in the appendix' but does not provide specific experimental setup details (like hyperparameters or training configurations) in the main text. |
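
The Research Type row quotes empirical ablations on networks of depths 6 to 48. As a hedged illustration of how such a depth ablation could keep the parameter count roughly fixed while trading depth for width, the sketch below solves for a hidden width at each depth. The `width_for_depth` helper, the 12·L·d² block-parameter estimate, and the 120M budget are assumptions made for illustration, not details taken from the paper.

```python
import math

def width_for_depth(depth: int, param_budget: int, vocab_size: int = 267_735) -> int:
    """Hypothetical helper: pick a hidden width d so that a depth-L
    self-attention stack stays close to a fixed parameter budget.

    Assumes roughly 12 * depth * d^2 parameters in the attention/feed-forward
    blocks plus an embedding table of vocab_size * d entries (illustrative
    constants; vocab_size defaults to the WikiText-103 word-level vocabulary).
    """
    # Solve 12 * depth * d^2 + vocab_size * d - param_budget = 0 for d > 0.
    a, b, c = 12 * depth, vocab_size, -param_budget
    d = (-b + math.sqrt(b * b - 4 * a * c)) / (2 * a)
    return int(d)

# Roughly parameter-matched configurations across the ablated depth range.
for L in (6, 12, 24, 48):
    print(f"depth {L:2d} -> width ~{width_for_depth(L, param_budget=120_000_000)}")
```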
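The Open Datasets row reports a perplexity of 18.63 ± 0.26 averaged over 5 random seeds. A minimal sketch of reporting such a mean ± standard deviation over per-seed test perplexities follows; the per-seed values are placeholders, not numbers from the paper.

```python
from statistics import mean, stdev

# Placeholder per-seed test perplexities (illustrative values only).
per_seed_ppl = [18.41, 18.55, 18.70, 18.88, 18.61]

avg = mean(per_seed_ppl)
spread = stdev(per_seed_ppl)  # sample standard deviation over the 5 seeds
print(f"perplexity: {avg:.2f} ± {spread:.2f}")
```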