Masks, Signs, And Learning Rate Rewinding

Authors: Advait Harshal Gadhikar, Rebekka Burkholz

ICLR 2024

| Reproducibility Variable | Result | LLM Response |
| --- | --- | --- |
| Research Type | Experimental | 'To this end, we conduct experiments that disentangle the effect of mask learning and parameter optimization and how both benefit from overparameterization.' and 'In support of this hypothesis, we prove in a simplified single hidden neuron setting that LRR succeeds in more cases than IMP, as it can escape initially problematic sign configurations.' (The IMP/LRR distinction is sketched in code after this table.) |
| Researcher Affiliation | Academia | Advait Gadhikar & Rebekka Burkholz, CISPA Helmholtz Center for Information Security, Saarbrücken, Germany, {advait.gadhikar, burkholz}@cispa.de |
| Pseudocode | No | The paper does not contain any clearly labeled pseudocode or algorithm blocks. |
| Open Source Code | No | The paper states 'Our code is built on the work of (Kusupati et al., 2020).' but does not provide a direct link or an explicit statement about the availability of the authors' own source code for the described methodology. |
| Open Datasets | Yes | 'To this end, we perform experiments on CIFAR10, CIFAR100 (Krizhevsky, 2009) and Tiny ImageNet (Le & Yang, 2015) with ResNet18 or ResNet50 with IMP and LRR that start from the same initializations.' |
| Dataset Splits | No | The paper refers to 'training data points' and a 'training set' but does not provide specific train/validation/test splits, percentages, or sample counts, nor does it reference standard dataset splits. |
| Hardware Specification | Yes | 'All experiments are performed on a Nvidia A100 GPU.' |
| Software Dependencies | No | The paper mentions 'Optimizer SGD' and 'Init Kaiming Normal', and states 'Our code is built on the work of (Kusupati et al., 2020).', but it does not give version numbers for software dependencies such as Python, PyTorch, or CUDA. |
| Experiment Setup | Yes | 'Table 1 details our experimental setup. In each pruning iteration, we keep 80% of the currently remaining parameters of highest magnitude (Frankle & Carbin, 2019).' Table 1: Dataset: CIFAR10, CIFAR100, Tiny ImageNet, ImageNet; Model: ResNet18, ResNet50; Epochs: 150; LR: 0.1; Scheduler: cosine-warmup; Batch Size: 256; Warmup Epochs: 50; Optimizer: SGD; Weight Decay: 1e-4; Momentum: 0.9; Init: Kaiming Normal. (The pruning step and training setup are sketched in code after this table.) |