Local Regularizer Improves Generalization

Authors: Yikai Zhang, Hui Qu, Dimitris Metaxas, Chao Chen

AAAI 2020, pp. 6861-6868

Reproducibility Variable | Result | LLM Response
Research Type | Experimental | "Our theoretical results are supported by experiments. We observe consistently better generalization performance of LRSGD-R and LRSGD-C over SGD on different neural net architectures." Also, from Section 5 (Experiments): "We empirically show the generalization power of LRSGD-R and LRSGD-C. We show that they generalize better than SGD for different network architectures."
Researcher Affiliation | Academia | Yikai Zhang (1), Hui Qu (1*), Dimitris Metaxas (1), Chao Chen (2); (1) Department of Computer Science, Rutgers University; (2) Department of Biomedical Informatics, Stony Brook University; {yz422, hui.qu, dnm}@cs.rutgers.edu, chao.chen.cchen@gmail.com
Pseudocode | Yes | Algorithm 1 (SGD) and Algorithm 2 (LRSGD).
Open Source Code | Yes | The code is available at https://github.com/huiqu18/LRSGD.
Open Datasets | Yes | All experiments are based on the CIFAR10 dataset, which consists of 10 classes of 32×32 color images, with 6k images per class (Krizhevsky and Hinton 2009).
Dataset Splits | No | They are split into train and test sets with 50k and 10k images, respectively. The paper does not explicitly state a validation split.
Hardware Specification | No | The paper does not provide specific hardware details such as GPU/CPU models or cloud instance types used for the experiments.
Software Dependencies | No | The paper does not provide specific software dependency versions (e.g., Python, PyTorch, TensorFlow, or other library versions).
Experiment Setup | Yes | The momentum and weight decay parameters of SGD are set to 0.9 and 0.0001. The number of iterations is 13.7e4 (350 epochs), the batch size is 128, and the learning rate is α = 0.1 initially, decayed by a factor of 10 at iterations 5.8e4 and 9.8e4 (epochs 150 and 250). For LRSGD-R, we set γ = 0.1 and λt = 0.01/α. For LRSGD-C, Mt = 10 and λt = 0.01/α.
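
As a concrete reading of the Experiment Setup row above, the following is a minimal training-configuration sketch, assuming a PyTorch-style setup (the paper does not state its framework or library versions). The network architecture and data transforms are placeholder choices, and the LRSGD-R / LRSGD-C local-regularization updates themselves are not reproduced here; only the plain SGD baseline hyperparameters quoted above are used.

import torch
import torchvision
import torchvision.transforms as transforms

# CIFAR-10: 50k training and 10k test images, 10 classes of 32x32 color images.
transform = transforms.ToTensor()
train_set = torchvision.datasets.CIFAR10(root="./data", train=True,
                                         download=True, transform=transform)
test_set = torchvision.datasets.CIFAR10(root="./data", train=False,
                                        download=True, transform=transform)
train_loader = torch.utils.data.DataLoader(train_set, batch_size=128, shuffle=True)

# Placeholder architecture; the paper evaluates several network architectures.
model = torchvision.models.resnet18(num_classes=10)
criterion = torch.nn.CrossEntropyLoss()

# SGD baseline: momentum 0.9, weight decay 0.0001, initial learning rate 0.1.
optimizer = torch.optim.SGD(model.parameters(), lr=0.1,
                            momentum=0.9, weight_decay=1e-4)
# Learning rate decayed by a factor of 10 at epochs 150 and 250
# (roughly iterations 5.8e4 and 9.8e4 at batch size 128), 350 epochs in total.
scheduler = torch.optim.lr_scheduler.MultiStepLR(optimizer,
                                                 milestones=[150, 250], gamma=0.1)

for epoch in range(350):
    for images, labels in train_loader:
        optimizer.zero_grad()
        loss = criterion(model(images), labels)
        loss.backward()
        optimizer.step()  # LRSGD-R / LRSGD-C would modify this update with a local regularizer
    scheduler.step()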