Notice: The reproducibility variables underlying each score are classified using an automated LLM-based pipeline, validated against a manually labeled dataset. LLM-based classification introduces uncertainty and potential bias; scores should be interpreted as estimates. Full accuracy metrics and methodology are described in Coakley et al. (K. L. Coakley, T. Snelleman, H. Hoos, and O. E. Gundersen, "The embrace of open science: An analysis of a decade of AI research and 56 800 conference papers," under review, 2026).
StableMask: Refining Causal Masking in Decoder-only Transformer
Authors: Qingyu Yin, Xuzheng He, Xiang Zhuang, Yu Zhao, Jianhua Yao, Xiaoyu Shen, Qiang Zhang
ICML 2024 | Venue PDF | LLM Run Details
| Reproducibility Variable | Result | LLM Response |
|---|---|---|
| Research Type | Experimental | StableMask's effectiveness is validated both theoretically and empirically, showing significant enhancements in language models with parameter sizes ranging from 71M to 1.4B across diverse datasets and encoding methods. |
| Researcher Affiliation | Collaboration | 1Zhejiang University 2Peking University 3Tencent AI Lab 4Eastern Institute of Technology, Ningbo. |
| Pseudocode | Yes | We include a complete formula derivation and pseudocode implementation in Appendix D. Algorithm 1 Forward pass |
| Open Source Code | Yes | The code of this paper is available at https://github.com/MikaStars39/StableMask |
| Open Datasets | Yes | Performance on Wikitext-103 and MiniPile (Table 1): Empirical evidence underscores the efficacy of models employing StableMask when trained on both Wikitext-103 (Merity et al., 2016) and MiniPile (Kaddour, 2023). |
| Dataset Splits | Yes | Performance on Wikitext-103 and MiniPile (Table 1): Empirical evidence underscores the efficacy of models employing StableMask when trained on both Wikitext-103 (Merity et al., 2016) and MiniPile (Kaddour, 2023). |
| Hardware Specification | Yes | Our experiments were conducted using a model with 160 million parameters, trained on four V100 GPUs. |
| Software Dependencies | No | The paper does not provide specific version numbers for any software components or libraries used in the experiments. |
| Experiment Setup | Yes | Detail settings could be checked in the Appendix F. Table 6. Hyperparameters for WikiText-103 with ALiBi and RoPE positional encoding. Table 7. Hyperparameters for WikiText-103 with ALiBi and RoPE positional encoding. |