Reparameterization through Spatial Gradient Scaling
Authors: Alexander Detkov, Mohammad Salameh, Muhammad Fetrat Qharabagh, Jialin Zhang, Robin Luwei, Shangling Jui, Di Niu
ICLR 2023
| Reproducibility Variable | Result | LLM Response |
|---|---|---|
| Research Type | Experimental | Experiments on CIFAR-10, CIFAR-100, and ImageNet show that without searching for reparameterized structures, our proposed scaling method outperforms the state-of-the-art reparameterization strategies at a lower computational cost. The code is available at https://github.com/Ascend-Research/Reparameterization. |
| Researcher Affiliation | Collaboration | Alexander Detkov¹, Mohammad Salameh², Muhammad Fetrat Qharabagh¹*, Jialin Zhang³, Wei Lui², Shangling Jui³, Di Niu¹ (¹University of Alberta, ²Huawei Technologies, ³Huawei Kirin Solutions) |
| Pseudocode | Yes | An overview of the SGS framework is given as pseudo-code in Appendix A.5, and details can be found in the corresponding open-source code. |
| Open Source Code | Yes | The code is available at https://github.com/Ascend-Research/Reparameterization. |
| Open Datasets | Yes | Experiments on CIFAR-10, CIFAR-100, and ImageNet show that without searching for reparameterized structures, our proposed scaling method outperforms the state-of-the-art reparameterization strategies at a lower computational cost. |
| Dataset Splits | Yes | We search for k on CIFAR100 and use the optimal for experiments on CIFAR10 and ImageNet. We perform a grid search on CIFAR100 and VGG-16 over k ∈ {2, 3, 4, 5, 6, 7} using 20% of the training set for validation. |
| Hardware Specification | Yes | Training is done on a single NVIDIA Tesla V100 GPU. ... on 8 NVIDIA Tesla V100 GPUs. |
| Software Dependencies | No | The paper mentions 'PyTorch defaults' for optimizer settings but does not specify version numbers for PyTorch or any other software libraries or dependencies used in the experiments. |
| Experiment Setup | Yes | We train VGG-16 on CIFAR-{10,100} for 600 epochs with a batch size of 128, cosine annealing scheduler with an initial learning rate of 0.1, and SGD optimizer with momentum 0.9 and weight decay 1e-4. We update our spatial gradient scalings every 30 epochs using 20 random batches from the training set. We add a 1 epoch warm-up period at the start of training before generating our gradient scalings. |
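The Dataset Splits row above quotes a 20% hold-out of the CIFAR-100 training set used to validate the grid search over the kernel size k. The following is a minimal sketch of such a split with PyTorch/torchvision; the random seed and the exact splitting mechanism are assumptions, since the paper does not publish the split code, and `k_grid` simply lists the values reported as searched.

```python
import torch
from torch.utils.data import random_split
from torchvision import datasets, transforms

# CIFAR-100 training set; 20% is held out for validating the grid search over k.
transform = transforms.ToTensor()
full_train = datasets.CIFAR100(root="./data", train=True, download=True,
                               transform=transform)

val_size = int(0.2 * len(full_train))          # 10,000 of 50,000 images
train_size = len(full_train) - val_size
train_set, val_set = random_split(
    full_train, [train_size, val_size],
    generator=torch.Generator().manual_seed(0))  # fixed seed is an assumption

k_grid = [2, 3, 4, 5, 6, 7]  # kernel sizes searched on CIFAR-100 with VGG-16
```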
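The Experiment Setup row above lists the reported hyperparameters for VGG-16 on CIFAR-{10,100}. The sketch below reconstructs that training schedule under stated assumptions: torchvision's `vgg16` stands in for the paper's CIFAR variant, the data augmentation is the standard CIFAR recipe (not given in the quote), and the spatial gradient scaling update itself is left as a placeholder comment, since that procedure is the paper's own method and lives in the linked repository.

```python
import torch
import torch.nn as nn
from torch.utils.data import DataLoader
from torchvision import datasets, transforms, models

# Hyperparameters reported for VGG-16 on CIFAR-{10,100}.
EPOCHS, BATCH_SIZE = 600, 128
LR, MOMENTUM, WEIGHT_DECAY = 0.1, 0.9, 1e-4
SCALING_UPDATE_PERIOD, WARMUP_EPOCHS = 30, 1   # SGS update schedule from the paper

device = "cuda" if torch.cuda.is_available() else "cpu"

transform = transforms.Compose([
    transforms.RandomCrop(32, padding=4),      # standard CIFAR augmentation (assumed)
    transforms.RandomHorizontalFlip(),
    transforms.ToTensor(),
])
train_set = datasets.CIFAR10(root="./data", train=True, download=True,
                             transform=transform)
train_loader = DataLoader(train_set, batch_size=BATCH_SIZE, shuffle=True,
                          num_workers=4)

# torchvision's VGG-16 as a stand-in; the paper uses its own CIFAR variant.
model = models.vgg16(num_classes=10).to(device)

criterion = nn.CrossEntropyLoss()
optimizer = torch.optim.SGD(model.parameters(), lr=LR, momentum=MOMENTUM,
                            weight_decay=WEIGHT_DECAY)
scheduler = torch.optim.lr_scheduler.CosineAnnealingLR(optimizer, T_max=EPOCHS)

for epoch in range(EPOCHS):
    if epoch >= WARMUP_EPOCHS and (epoch - WARMUP_EPOCHS) % SCALING_UPDATE_PERIOD == 0:
        # Placeholder: here the SGS framework would recompute its spatial
        # gradient scalings from 20 random training batches (see the paper's
        # open-source code for the actual procedure).
        pass
    model.train()
    for images, labels in train_loader:
        images, labels = images.to(device), labels.to(device)
        optimizer.zero_grad()
        loss = criterion(model(images), labels)
        loss.backward()
        optimizer.step()
    scheduler.step()
```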