Penalizing Gradient Norm for Efficiently Improving Generalization in Deep Learning

Authors: Yang Zhao, Hao Zhang, Xiuyuan Hu

ICML 2022

Reproducibility Variable | Result | LLM Response
Research Type | Experimental | "In our experiments, we confirm that when using our methods, generalization performance of various models could be improved on different datasets."
Researcher Affiliation | Academia | "Department of Electronic Engineering, Tsinghua University."
Pseudocode | Yes | "Algorithm 1: Optimization Scheme of Penalizing Gradient Norm"
Open Source Code | Yes | "Code is available at https://github.com/zhaoyang-0204/gnp."
Open Datasets | Yes | "In our experiments, we apply extensive model architectures on CIFAR-{10, 100} and ImageNet datasets, respectively."
Dataset Splits | No | The paper mentions grid searches for hyperparameters but does not explicitly describe a dedicated validation split for reproducibility; it discusses training with random seeds and reporting performance on the test sets.
Hardware Specification | Yes | "All the experiments are deployed using the JAX framework on the NVIDIA DGX Station A100."
Software Dependencies | No | The paper mentions the "JAX framework" but does not give a version number for JAX or for any other software dependency.
Experiment Setup | Yes | "We would adopt a greedy strategy to reduce tuning cost during implementation, ... We would next perform a grid search on the scaler r over the set {0.01, 0.02, 0.05, 0.1, 0.2}. ... After determining the best value of r, we would moreover perform a grid search on the balance coefficient α in the range 0.1 to 0.9 at an interval of 0.1."
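To make the hyperparameters in the table above concrete: the paper penalizes the gradient norm of the loss, and its optimization scheme is commonly approximated by mixing the plain gradient with the gradient taken at a point perturbed along the normalized gradient direction, (1 − α)·∇L(θ) + α·∇L(θ + r·∇L/‖∇L‖). The sketch below is a minimal illustration of that first-order approximation, not the paper's actual JAX implementation; the helper name `gnp_grad` and the toy quadratic loss are assumptions made for demonstration.

```python
import numpy as np

def gnp_grad(loss_grad, theta, r=0.05, alpha=0.5):
    """Approximate update direction for a gradient-norm-penalized objective.

    Mixes the plain gradient with the gradient at a point perturbed along
    the normalized gradient direction:
        (1 - alpha) * g(theta) + alpha * g(theta + r * g / ||g||)
    Here `r` is the scaler and `alpha` the balance coefficient that the
    paper tunes by grid search ({0.01, ..., 0.2} and 0.1-0.9 respectively).
    """
    g = loss_grad(theta)
    g_norm = np.linalg.norm(g)
    if g_norm == 0.0:
        return g  # at a stationary point the perturbation is undefined
    g_perturbed = loss_grad(theta + r * g / g_norm)
    return (1.0 - alpha) * g + alpha * g_perturbed

# Toy quadratic loss L(theta) = 0.5 * theta^T A theta, whose gradient
# A @ theta is known in closed form (illustrative only; the paper trains
# deep networks on CIFAR and ImageNet).
A = np.array([[3.0, 0.0], [0.0, 1.0]])
loss_grad = lambda th: A @ th

theta = np.array([1.0, 2.0])
update_direction = gnp_grad(loss_grad, theta, r=0.05, alpha=0.5)
```

With α = 0 this recovers the plain gradient, so the penalty can be blended in gradually; the extra cost per step is a single additional gradient evaluation at the perturbed point.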