Train Faster, Perform Better: Modular Adaptive Training in Over-Parameterized Models

Authors: Yubin Shi, Yixuan Chen, Mingzhi Dong, Xiaochen Yang, Dongsheng Li, Yujiang Wang, Robert Dick, Qin Lv, Yingying Zhao, Fan Yang, Tun Lu, Ning Gu, Li Shang

NeurIPS 2023 | Conference PDF | Archive PDF | Plain Text | LLM Run Details

Reproducibility Variable | Result | LLM Response
Research Type | Experimental | Experiments show that MAT nearly halves the computational cost of model training while outperforming baselines in accuracy.
Researcher Affiliation | Collaboration | 1 Shanghai Key Laboratory of Data Science, School of Computer Science, Fudan University, China; 2 School of Mathematics and Statistics, University of Glasgow; 3 Microsoft Research Asia, Shanghai, China; 4 Department of Engineering Science, University of Oxford; 5 Department of Electrical Engineering and Computer Science, University of Michigan; 6 Department of Computer Science, University of Colorado Boulder; 7 School of Microelectronics, Fudan University
Pseudocode | Yes | Algorithm 1: Modular Adaptive Training (see the illustrative sketch after this table).
Open Source Code | No | The paper refers to the GitHub repositories of third-party tools it used (e.g., 'https://github.com/pnnl/torchntk' and 'https://github.com/microsoft/DeepSpeed'), but it does not provide a link to the authors' own source code for the methodology described in the paper.
Open Datasets | Yes | Following the basic setup of Liu et al. (2019), we train BERT from scratch on the masked language modeling (MLM) task on WikiText-2 (Merity et al., 2016). ... train Switch-Transformers using the vanilla, Multirate, and Switch-Rand training methods on WikiText-103 (Merity et al., 2016). ... We take the classic convolutional network VGG16 as an example, which is over-parameterized for the CIFAR-10 dataset.
Dataset Splits | No | The paper mentions 'validation loss' and 'validation perplexity' in the results, implying the use of a validation set. However, it does not provide explicit details about the specific training/validation/test split ratios, sample counts, or a citation to a predefined split used in their experiments for reproducibility.
Hardware Specification | Yes | All experiments are conducted on 8 NVIDIA GeForce RTX 3090 GPUs.
Software Dependencies | No | The paper mentions leveraging 'the implementation of Engel et al. (2022)' (torchntk) and measuring FLOPs using 'the DeepSpeed Profiler', but it does not specify version numbers for these or any other software dependencies (a profiler sketch follows the table).
Experiment Setup | Yes | All experiments are conducted on 8 NVIDIA GeForce RTX 3090 GPUs. For further experimental details, please refer to the Appendix. ... Table 4: Hyperparameters configuration in BERT and Switch-Transformer.
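
The Pseudocode row refers to the paper's Algorithm 1 (Modular Adaptive Training). Below is a minimal, hedged sketch of the general idea of module-adaptive training: each step, a per-module score decides which modules get updated. The score used here (mean squared gradient) is only a stand-in proxy, not the paper's modular-NTK eigenvalue criterion, and the paper's compute savings come from skipping backward computation for unselected modules, which this simplified version does not implement. Function names, the threshold, and the toy model are illustrative assumptions, not the authors' released code.

import torch
import torch.nn as nn

def modular_adaptive_step(modules, batch, loss_fn, optimizer, threshold):
    """One training step that skips parameter updates for low-scoring modules."""
    inputs, targets = batch

    # Forward pass: the model is treated here as a plain sequential stack of modules.
    x = inputs
    for m in modules:
        x = m(x)
    loss = loss_fn(x, targets)

    optimizer.zero_grad()
    loss.backward()

    # Score each module; modules below the threshold are frozen for this step.
    # The score below (mean squared gradient) is a stand-in proxy; the paper's
    # Algorithm 1 derives its criterion from the principal eigenvalue of the
    # modular neural tangent kernel (mNTK), which is not reproduced here.
    for m in modules:
        grads = [p.grad for p in m.parameters() if p.grad is not None]
        if not grads:
            continue
        score = torch.stack([g.pow(2).mean() for g in grads]).mean()
        if score < threshold:
            for p in m.parameters():
                p.grad = None  # PyTorch optimizers skip parameters with no gradient

    optimizer.step()
    return loss.item()

# Example usage with a toy two-module MLP (all shapes and values are illustrative).
modules = nn.ModuleList([nn.Sequential(nn.Linear(16, 32), nn.ReLU()),
                         nn.Linear(32, 4)])
optimizer = torch.optim.SGD(modules.parameters(), lr=0.1)
batch = (torch.randn(8, 16), torch.randint(0, 4, (8,)))
modular_adaptive_step(modules, batch, nn.CrossEntropyLoss(), optimizer, threshold=1e-4)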
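
The Software Dependencies row notes that FLOPs were measured with the DeepSpeed Profiler without a stated version. The snippet below is a minimal sketch of how such a measurement can be taken with DeepSpeed's FLOPs profiler; the VGG16 model and CIFAR-10-sized input shape are assumptions chosen to mirror the paper's VGG16/CIFAR-10 experiment, and argument names or defaults may differ across DeepSpeed versions.

import torchvision.models as models
from deepspeed.profiling.flops_profiler import get_model_profile

# Profile a single forward pass of VGG16 on a CIFAR-10-sized input.
model = models.vgg16()
flops, macs, params = get_model_profile(
    model=model,
    input_shape=(1, 3, 32, 32),  # assumed batch of one CIFAR-10 image
    print_profile=True,
    detailed=False,
    as_string=True,
)
print(f"FLOPs: {flops}, MACs: {macs}, Params: {params}")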