StableMask: Refining Causal Masking in Decoder-only Transformer
Authors: Qingyu Yin, Xuzheng He, Xiang Zhuang, Yu Zhao, Jianhua Yao, Xiaoyu Shen, Qiang Zhang
ICML 2024
| Reproducibility Variable | Result | LLM Response |
|---|---|---|
| Research Type | Experimental | StableMask's effectiveness is validated both theoretically and empirically, showing significant enhancements in language models with parameter sizes ranging from 71M to 1.4B across diverse datasets and encoding methods. |
| Researcher Affiliation | Collaboration | ¹Zhejiang University, ²Peking University, ³Tencent AI Lab, ⁴Eastern Institute of Technology, Ningbo. |
| Pseudocode | Yes | We include a complete formula derivation and pseudocode implementation in Appendix D. (Algorithm 1: Forward pass) |
| Open Source Code | Yes | The code of this paper is available at https://github.com/MikaStars39/StableMask |
| Open Datasets | Yes | Performance on Wikitext-103 and MiniPile (Table 1): Empirical evidence underscores the efficacy of models employing StableMask when trained on both Wikitext-103 (Merity et al., 2016) and MiniPile (Kaddour, 2023). |
| Dataset Splits | Yes | Performance on Wikitext-103 and MiniPile (Table 1): Empirical evidence underscores the efficacy of models employing StableMask when trained on both Wikitext-103 (Merity et al., 2016) and MiniPile (Kaddour, 2023). |
| Hardware Specification | Yes | Our experiments were conducted using a model with 160 million parameters, trained on four V100 GPUs. |
| Software Dependencies | No | The paper does not provide specific version numbers for any software components or libraries used in the experiments. |
| Experiment Setup | Yes | Detailed settings can be checked in Appendix F. Tables 6 and 7: Hyperparameters for WikiText-103 with ALiBi and RoPE positional encoding. |
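For context on the "Pseudocode" and "Open Source Code" rows: the paper's Algorithm 1 describes a forward pass in which the masked (future) region of the attention matrix receives finite pseudo-attention scores instead of negative infinity, and those pseudo weights are discarded before value aggregation, so the attention weights over real tokens need not sum to exactly 1. The PyTorch sketch below illustrates that general mechanism only; the `decay` parameter and the linear pseudo-score schedule are illustrative assumptions, not the paper's exact formulation, which is given in Appendix D and in the released repository.

```python
import torch
import torch.nn.functional as F

def stablemask_attention(q, k, v, decay: float = 0.1):
    """Causal attention with finite pseudo scores in the masked region.

    A sketch of the idea behind StableMask's Algorithm 1, not the
    paper's exact method: `decay` and the linear pseudo-score schedule
    below are assumptions made for illustration.

    q, k, v: (batch, heads, seq_len, head_dim)
    """
    n, d = q.size(-2), q.size(-1)
    scores = (q @ k.transpose(-2, -1)) / d**0.5        # (B, H, n, n)

    i = torch.arange(n, device=q.device).view(n, 1)
    j = torch.arange(n, device=q.device).view(1, n)
    future = j > i                                     # masked (upper) triangle

    # A vanilla causal mask would set `future` scores to -inf. Here the
    # masked entries instead get finite scores that decrease with
    # distance, so softmax leaks some probability mass onto them.
    pseudo = (-decay * (j - i).clamp(min=0)).to(scores.dtype)
    scores = torch.where(future, pseudo, scores)

    attn = F.softmax(scores, dim=-1)

    # Zero out the pseudo weights before mixing values: masked tokens
    # contribute nothing, and the surviving real weights sum to <= 1
    # rather than being forced to sum to exactly 1.
    attn = attn.masked_fill(future, 0.0)
    return attn @ v
```

As a quick shape check: `stablemask_attention(torch.randn(1, 4, 16, 32), torch.randn(1, 4, 16, 32), torch.randn(1, 4, 16, 32))` returns a `(1, 4, 16, 32)` tensor, matching standard multi-head attention output.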