Reusing Pretrained Models by Multi-linear Operators for Efficient Training

Authors: Yu Pan, Ye Yuan, Yichun Yin, Zenglin Xu, Lifeng Shang, Xin Jiang, Qun Liu

NeurIPS 2023 | Conference PDF | Archive PDF | Plain Text | LLM Run Details

Reproducibility Variable | Result | LLM Response
Research Type | Experimental | Experiments demonstrate that our method can save 76% computational costs on DeiT-base transferred from DeiT-small, which outperforms bert2BERT by +12.0% and LiGO by +20.7%, respectively. In this section, we design a set of experiments to validate the proposed Mango.
Researcher Affiliation | Collaboration | Harbin Institute of Technology Shenzhen, Shenzhen, Guangdong, China; Pengcheng Laboratory, Shenzhen, China; Peking University, Beijing, China; Huawei Noah's Ark Lab, Shenzhen, Guangdong, China
Pseudocode | No | The paper describes 'Procedures of Applying Mango' in Section 3.2 as numbered steps in prose, but it does not provide a formally structured pseudocode or algorithm block.
Open Source Code | No | The paper does not provide any statement or link indicating that its source code is publicly available.
Open Datasets | Yes | We use three tiny vision Transformers (ViTs) [11], i.e., DeiT-T-A, DeiT-T-B, and DeiT-T-C, for growing to DeiT-S [46] on ImageNet [9]... The dataset is the concatenation of English Wikipedia and Toronto Book Corpus [71]... We show the effectiveness of Mango on SQuAD and GLUE benchmark as in Table 3. To investigate the influence of Mango on transferring ability, we also conduct an experiment on downstream tasks, including CIFAR10 [26], CIFAR100 [26], Flowers [31], Cars [25], and ChestXRay8 [54].
Dataset Splits | No | The paper does not explicitly specify dataset split percentages (e.g., 80/10/10) or the methodology used to form training, validation, and test sets.
Hardware Specification | No | The paper does not provide specific hardware details (e.g., GPU/CPU models, memory) used for running its experiments; it only mentions training the models in general terms.
Software Dependencies | No | The paper mentions optimizers such as 'Adam' and 'AdamW' but does not specify version numbers for any software frameworks, libraries, or tools used (e.g., PyTorch 1.9).
Experiment Setup | Yes | We train Mango operators for 100 steps... We use Adam with learning rate 1e-3 and weight decay 1e-2 for 300-epoch optimization. The batch size is 1024. The training epoch is 40. The batch size is 768. ... The optimizer is set to AdamW. The learning rate is 1e-4 and the weight decay is 1e-2. The training epoch is 35. The batch size is 512.
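
The optimizer settings quoted in the Experiment Setup row are concrete enough to restate as code. The following is a minimal sketch, assuming PyTorch (the paper does not name its framework or any version); the model is a placeholder, and the pairing of each optimizer with a particular experiment follows the order of the quoted settings rather than any statement in the paper.

import torch

# Placeholder model; the paper grows small pretrained Transformers
# (e.g., DeiT-small) into larger targets such as DeiT-base.
model = torch.nn.Linear(384, 768)

# Quoted setting: Adam, learning rate 1e-3, weight decay 1e-2,
# 300 epochs, batch size 1024.
adam_opt = torch.optim.Adam(model.parameters(), lr=1e-3, weight_decay=1e-2)

# Quoted setting: AdamW, learning rate 1e-4, weight decay 1e-2,
# 35 epochs, batch size 512.
adamw_opt = torch.optim.AdamW(model.parameters(), lr=1e-4, weight_decay=1e-2)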