Do We Actually Need Dense Over-Parameterization? In-Time Over-Parameterization in Sparse Training

Authors: Shiwei Liu, Lu Yin, Decebal Constantin Mocanu, Mykola Pechenizkiy

ICML 2021

Reproducibility Variable | Result | LLM Response
Research Type | Experimental | We present a series of experiments to support our conjecture and achieve the state-of-the-art sparse training performance with ResNet-50 on ImageNet. More impressively, ITOP achieves dominant performance over the overparameterization-based sparse methods at extreme sparsities. When trained with ResNet-34 on CIFAR-100, ITOP can match the performance of the dense model at an extreme sparsity of 98%.
Researcher Affiliation | Academia | 1 Department of Mathematics and Computer Science, Eindhoven University of Technology, 5600 MB Eindhoven, the Netherlands; 2 Faculty of Electrical Engineering, Mathematics and Computer Science, University of Twente, 7522 NB Enschede, the Netherlands.
Pseudocode | No | No explicit pseudocode or algorithm blocks were found in the paper.
Open Source Code | Yes | https://github.com/Shiweiliuiiiiiii/In-Time-Over-Parameterization
Open Datasets | Yes | We study Multi-layer Perceptron (MLP) on CIFAR-10, VGG-16 on CIFAR-10, ResNet-34 on CIFAR-100, and ResNet-50 on ImageNet.
Dataset Splits | No | The paper mentions 'minimum validation loss' in Section 3.1 but does not provide specific details on the dataset splits (e.g., percentages or sample counts) used for training, validation, or testing in the main text. It refers to Appendix A for experimental details, but these details are not explicitly present in the main body.
Hardware Specification | No | The paper does not provide specific details about the hardware used for running the experiments, such as GPU models, CPU types, or memory specifications.
Software Dependencies | No | The paper states 'We use PyTorch as our library.' in Section 3.2, but does not specify its version or any other software dependencies with version numbers.
Experiment Setup | Yes | We train MLP, VGG-16, and ResNet-34 with various T and report the test accuracy. ... We train MLP, VGG-16, and ResNet-34 for an extended training time with a large T. We safely choose T as 1500 for MLPs, 2000 for VGG-16, and 1000 for ResNet-34... In addition to the training time, the anchor points of the learning rate schedule are also scaled by the same factor. ... More precisely, we choose an update interval T of 4000, a batch size of 64, and an initial pruning rate of 0.5...
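To make the Open Datasets and Experiment Setup rows concrete, below is a minimal sketch of the kind of dynamic sparse training loop those quotes describe: CIFAR-10 loaded through torchvision, a topology-update interval T, a batch size of 64, and an initial pruning rate of 0.5. This is not the authors' released implementation (that lives in the GitHub repository linked above); the SET-style prune-and-grow step, the stand-in MLP, the sparsity level, the scheduler milestones, and helper names such as `prune_and_grow` and `masks` are illustrative assumptions.

```python
# Hedged sketch only: quoted values are T=4000, batch size 64, pruning rate 0.5;
# everything else (model, sparsity, schedule, mask init) is an assumption.
import torch
import torch.nn as nn
import torch.nn.functional as F
from torchvision import datasets, transforms

T = 4000            # topology-update interval in iterations (quoted value)
BATCH_SIZE = 64     # quoted value
PRUNE_RATE = 0.5    # initial pruning (drop/regrow) rate (quoted value)
SPARSITY = 0.9      # assumed overall sparsity level for this sketch

train_set = datasets.CIFAR10("data", train=True, download=True,
                             transform=transforms.ToTensor())
loader = torch.utils.data.DataLoader(train_set, batch_size=BATCH_SIZE, shuffle=True)

# Tiny MLP stand-in for the paper's models (MLP / VGG-16 / ResNet-34 / ResNet-50).
model = nn.Sequential(nn.Flatten(), nn.Linear(3 * 32 * 32, 256),
                      nn.ReLU(), nn.Linear(256, 10))

# One random binary mask per weight matrix (uniform random sparsity is an
# assumption made here for brevity; the paper uses structured sparse inits).
masks = {n: (torch.rand_like(p) > SPARSITY).float()
         for n, p in model.named_parameters() if p.dim() > 1}

optimizer = torch.optim.SGD(model.parameters(), lr=0.1, momentum=0.9)
# The paper scales the learning-rate anchor points with the extended training
# time; stepping a MultiStepLR per iteration is one way to express that.
scheduler = torch.optim.lr_scheduler.MultiStepLR(optimizer,
                                                 milestones=[30000, 45000],
                                                 gamma=0.1)

def prune_and_grow(param, mask, rate):
    """SET-style update (illustrative): drop the smallest-magnitude active
    weights and regrow the same number of connections at random positions."""
    active = mask.bool()
    n_drop = int(rate * active.sum().item())
    if n_drop == 0:
        return mask
    # Drop: smallest-magnitude currently active weights.
    scores = param.abs().masked_fill(~active, float("inf")).flatten()
    drop_idx = torch.topk(scores, n_drop, largest=False).indices
    new_mask = mask.flatten().clone()
    new_mask[drop_idx] = 0.0
    # Grow: random currently inactive positions.
    inactive_idx = (new_mask == 0).nonzero(as_tuple=True)[0]
    grow_idx = inactive_idx[torch.randperm(inactive_idx.numel())[:n_drop]]
    new_mask[grow_idx] = 1.0
    return new_mask.view_as(mask)

# Real runs extend over many epochs (the paper's "extended training time");
# a single pass is shown here to keep the sketch short.
step = 0
for x, y in loader:
    loss = F.cross_entropy(model(x), y)
    optimizer.zero_grad()
    loss.backward()
    optimizer.step()
    scheduler.step()
    with torch.no_grad():
        for n, p in model.named_parameters():
            if n in masks:
                # Periodically explore new connectivity, which drives the
                # in-time over-parameterization rate studied in the paper.
                if step > 0 and step % T == 0:
                    masks[n] = prune_and_grow(p, masks[n], PRUNE_RATE)
                p.mul_(masks[n])  # keep the weights sparse after every step
    step += 1
```

The design choice worth noting is that the mask update runs every T optimizer steps rather than every step, which is exactly the hyperparameter the quoted setup varies (T = 1000 to 4000 depending on the model).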