Do We Actually Need Dense Over-Parameterization? In-Time Over-Parameterization in Sparse Training
Authors: Shiwei Liu, Lu Yin, Decebal Constantin Mocanu, Mykola Pechenizkiy
ICML 2021 | Conference PDF | Archive PDF | Plain Text | LLM Run Details
| Reproducibility Variable | Result | LLM Response |
|---|---|---|
| Research Type | Experimental | We present a series of experiments to support our conjecture and achieve the state-of-the-art sparse training performance with ResNet-50 on ImageNet. More impressively, ITOP achieves dominant performance over the overparameterization-based sparse methods at extreme sparsities. When trained with ResNet-34 on CIFAR-100, ITOP can match the performance of the dense model at an extreme sparsity of 98%. |
| Researcher Affiliation | Academia | 1) Department of Mathematics and Computer Science, Eindhoven University of Technology, 5600 MB Eindhoven, the Netherlands; 2) Faculty of Electrical Engineering, Mathematics and Computer Science, University of Twente, 7522 NB Enschede, the Netherlands. |
| Pseudocode | No | No explicit pseudocode or algorithm blocks were found in the paper. |
| Open Source Code | Yes | https://github.com/Shiweiliuiiiiiii/In-Time-Over-Parameterization |
| Open Datasets | Yes | We study Multi-layer Perceptron (MLP) on CIFAR-10, VGG-16 on CIFAR-10, ResNet-34 on CIFAR-100, and ResNet-50 on ImageNet. |
| Dataset Splits | No | The paper mentions a 'minimum validation loss' in Section 3.1 but does not provide specific details on the dataset splits (e.g., percentages or sample counts) used for training, validation, or testing in the main text. It refers to Appendix A for experimental details, but these details are not present in the main body. |
| Hardware Specification | No | The paper does not provide specific details about the hardware used for running the experiments, such as GPU models, CPU types, or memory specifications. |
| Software Dependencies | No | The paper states 'We use PyTorch as our library.' in Section 3.2, but does not specify its version or any other software dependencies with version numbers. |
| Experiment Setup | Yes | We train MLP, VGG-16, and ResNet-34 with various T and report the test accuracy. ... We train MLP, VGG-16, and ResNet-34 for an extended training time with a large T. We safely choose T as 1500 for MLPs, 2000 for VGG-16, and 1000 for ResNet-34... In addition to the training time, the anchor points of the learning rate schedule are also scaled by the same factor. ... More precisely, we choose an update interval T of 4000, a batch size of 64, and an initial pruning rate of 0.5... (see the configuration sketch below the table). |
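
The Experiment Setup row describes a dynamic sparse training schedule: every T batches a fraction of the active weights is pruned and the same number of inactive connections is regrown, and when training is extended the learning-rate anchor points are scaled by the same factor. Below is a minimal sketch of such an update step under these assumptions, using magnitude-based pruning with random (SET-style) regrowth; the function names `prune_and_regrow` and `scale_lr_milestones` and all variable names are illustrative, not taken from the authors' released code.

```python
import torch


def prune_and_regrow(weight: torch.Tensor, mask: torch.Tensor, prune_rate: float) -> torch.Tensor:
    """Drop the smallest-magnitude active weights and regrow the same number of
    inactive connections at random (SET-style growth), keeping the parameter
    count fixed while new weights are explored over time."""
    active = mask.bool()
    n_prune = int(prune_rate * active.sum().item())
    if n_prune == 0:
        return mask

    # Prune: zero out the n_prune active weights with the smallest magnitude.
    magnitudes = weight.abs().masked_fill(~active, float("inf"))
    drop_idx = torch.topk(magnitudes.flatten(), n_prune, largest=False).indices
    new_mask = mask.clone().flatten()
    new_mask[drop_idx] = 0

    # Regrow: activate n_prune randomly chosen inactive positions. In practice
    # the regrown weights are re-initialized (e.g. to zero) before training resumes.
    inactive_idx = (new_mask == 0).nonzero(as_tuple=True)[0]
    grow_idx = inactive_idx[torch.randperm(inactive_idx.numel())[:n_prune]]
    new_mask[grow_idx] = 1
    return new_mask.view_as(mask)


def scale_lr_milestones(milestones, factor):
    """When training time is extended by `factor`, the learning-rate decay
    anchor points are scaled by the same factor, as stated in the setup."""
    return [int(m * factor) for m in milestones]


# Illustrative use of the reported hyperparameters: update interval T = 4000
# batches, initial pruning rate 0.5 (typically decayed over training).
T, prune_rate = 4000, 0.5
```

The sketch is only meant to make the reported hyperparameters concrete; growth criteria other than random selection (e.g., gradient-based growth) would fit the same interface.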