Random Masking Finds Winning Tickets for Parameter Efficient Fine-tuning
Authors: Jing Xu, Jingzhao Zhang
ICML 2024 | Conference PDF | Archive PDF | Plain Text | LLM Run Details
| Reproducibility Variable | Result | LLM Response |
|---|---|---|
| Research Type | Experimental | "We provide both empirical and theoretical explorations into the success of Random Masking." and "This section presents the empirical findings of Random Masking." |
| Researcher Affiliation | Academia | ¹Institute for Interdisciplinary Information Sciences, Tsinghua University, China; ²Shanghai Qizhi Institute; ³Shanghai AI Laboratory. Correspondence to: Jing Xu <xujing21@mails.tsinghua.edu.cn>, Jingzhao Zhang <jingzhaoz@mail.tsinghua.edu.cn>. |
| Pseudocode | No | No structured pseudocode or algorithm blocks were found in the paper. |
| Open Source Code | Yes | Code is available at https://github.com/JingXuTHU/Random-Masking-Finds-Winning-Tickets-for-Parameter-Efficient-Fine-tuning. |
| Open Datasets | Yes | We conduct the experiments on a diverse range of datasets and tasks, including 8 datasets in the SuperGLUE benchmark (Wang et al., 2019) and three additional datasets. |
| Dataset Splits | Yes | In line with the approach in Malladi et al. (2023a), we randomly sample 1000 data points from each dataset’s original training split for training, 500 data points for validation, and randomly sample 1000 data points from its original validation split for testing. |
| Hardware Specification | No | No specific hardware details (like GPU models, CPU types, or cloud instance specifications) used for running the experiments were provided in the paper. |
| Software Dependencies | No | The paper mentions using 'the spops library' and 'the AdamW optimizer' but does not specify version numbers for these or any other software dependencies. |
| Experiment Setup | Yes | We choose the AdamW optimizer with β1 = 0.9, β2 = 0.999, ε = 1e-8. We perform a grid search of learning rate from {1e-1, 1e-2, 1e-3, 1e-4, 1e-5, 1e-6}. We follow the practice of Malladi et al. (2023a) and Dettmers et al. (2023), and use a constant learning rate schedule. The number of training epochs is set to 5. The batch size is set to 8 per GPU. |
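
As a concrete reading of the Dataset Splits row above, here is a minimal sketch of the described sampling procedure (1000 training and 500 validation examples drawn from each dataset's original training split, and 1000 test examples drawn from its original validation split). The use of the Hugging Face `datasets` API and the `super_glue`/`rte` task names are illustrative assumptions, not details taken from the paper.

```python
# Hypothetical sketch of the few-shot sampling described in the paper:
# 1000 train / 500 validation examples from the original training split,
# and 1000 test examples from the original validation split.
# The Hugging Face `datasets` API and task names are assumptions.
from datasets import load_dataset

def sample_splits(task_name: str = "super_glue", subset: str = "rte", seed: int = 0):
    raw = load_dataset(task_name, subset)

    # Shuffle the original training split, then carve out train/validation subsets.
    shuffled_train = raw["train"].shuffle(seed=seed)
    train_set = shuffled_train.select(range(1000))
    val_set = shuffled_train.select(range(1000, 1500))

    # Sample the test set from the original validation split.
    test_set = raw["validation"].shuffle(seed=seed).select(range(1000))
    return train_set, val_set, test_set
```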
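
Similarly, the Experiment Setup row can be read as the following hedged PyTorch sketch. The AdamW hyperparameters, the learning-rate grid, the constant schedule, the 5 epochs, and the per-GPU batch size of 8 come from the quoted text; the model, data loader, and loss computation are placeholder assumptions rather than the authors' implementation.

```python
# Hedged sketch of the reported training configuration.
# betas, eps, the learning-rate grid, epochs, and batch size come from the paper;
# `model`, `train_set`, and the loss computation are placeholders.
import torch
from torch.utils.data import DataLoader

LEARNING_RATE_GRID = [1e-1, 1e-2, 1e-3, 1e-4, 1e-5, 1e-6]
NUM_EPOCHS = 5
BATCH_SIZE_PER_GPU = 8

def train_one_setting(model, train_set, lr, collate_fn=None):
    loader = DataLoader(train_set, batch_size=BATCH_SIZE_PER_GPU,
                        shuffle=True, collate_fn=collate_fn)
    # AdamW with the reported hyperparameters; no scheduler is attached,
    # matching the constant learning rate schedule.
    optimizer = torch.optim.AdamW(model.parameters(), lr=lr,
                                  betas=(0.9, 0.999), eps=1e-8)
    for _ in range(NUM_EPOCHS):
        for batch in loader:
            loss = model(**batch).loss  # assumes a Hugging Face-style model output
            loss.backward()
            optimizer.step()
            optimizer.zero_grad()
    return model

# The grid search would train one model per learning rate in LEARNING_RATE_GRID
# and keep the best setting by validation performance (validation loop omitted).
```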