Notice: The reproducibility variables underlying each score are classified using an automated LLM-based pipeline, validated against a manually labeled dataset. LLM-based classification introduces uncertainty and potential bias; scores should be interpreted as estimates. Full accuracy metrics and methodology are described in Coakley et al. (K. L. Coakley, T. Snelleman, H. Hoos, and O. E. Gundersen, "The embrace of open science: An analysis of a decade of AI research and 56 800 conference papers," Under Review, 2026).
Penalizing Gradient Norm for Efficiently Improving Generalization in Deep Learning
Authors: Yang Zhao, Hao Zhang, Xiuyuan Hu
ICML 2022 | Venue PDF | LLM Run Details
| Reproducibility Variable | Result | LLM Response |
|---|---|---|
| Research Type | Experimental | In our experiments, we confirm that when using our methods, generalization performance of various models could be improved on different datasets. |
| Researcher Affiliation | Academia | Department of Electronic Engineering, Tsinghua University. |
| Pseudocode | Yes | Algorithm 1 Optimization Scheme of Penalizing Gradient Norm |
| Open Source Code | Yes | Code is available at https://github.com/zhaoyang-0204/gnp. |
| Open Datasets | Yes | In our experiments, we apply extensive model architectures on Cifar-{10, 100} datasets and ImageNet datasets, respectively. |
| Dataset Splits | No | The paper mentions grid searches for hyperparameters but does not explicitly describe a dedicated validation dataset split for reproducibility. It discusses training with random seeds and reporting performance on testing sets. |
| Hardware Specification | Yes | all the experiments are deployed using the JAX framework on the NVIDIA DGX Station A100. |
| Software Dependencies | No | The paper mentions "JAX framework" but does not specify a version number for JAX or any other software dependencies with their respective versions. |
| Experiment Setup | Yes | We would adopt a greedy strategy to reduce tuning cost during implementation, ... We would next perform a grid search on the scaler r over the set {0.01, 0.02, 0.05, 0.1, 0.2}. ... After determining the best value of r, we would moreover perform a grid search on the balance coefficient α in the range 0.1 to 0.9 at an interval of 0.1. |
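
The greedy tuning strategy quoted in the Experiment Setup row can be sketched as follows. This is a minimal illustration rather than the authors' code: `evaluate` is a hypothetical callback that trains a model and returns a score where higher is better, `toy_score` is a synthetic stand-in, and `initial_alpha` is an assumption, since the quoted text does not state which α is held fixed while searching over r.

```python
from typing import Callable, Sequence, Tuple

# Search grids quoted in the "Experiment Setup" row.
R_GRID = (0.01, 0.02, 0.05, 0.1, 0.2)
ALPHA_GRID = tuple(round(0.1 * k, 1) for k in range(1, 10))  # 0.1, 0.2, ..., 0.9


def greedy_search(
    evaluate: Callable[[float, float], float],
    initial_alpha: float,
    r_grid: Sequence[float] = R_GRID,
    alpha_grid: Sequence[float] = ALPHA_GRID,
) -> Tuple[float, float]:
    """Tune r first with alpha held fixed, then tune alpha with the best r.

    `evaluate(r, alpha)` is a hypothetical callback returning a score where
    higher is better (for example, validation accuracy). `initial_alpha` is
    an assumption: the quoted text does not say which alpha is used during
    the search over r.
    """
    # Stage 1: grid search over the scaler r.
    best_r = max(r_grid, key=lambda r: evaluate(r, initial_alpha))
    # Stage 2: grid search over the balance coefficient alpha, with r fixed.
    best_alpha = max(alpha_grid, key=lambda a: evaluate(best_r, a))
    return best_r, best_alpha


def toy_score(r: float, alpha: float) -> float:
    # Synthetic objective peaked at r=0.05, alpha=0.7; not a result from the paper.
    return -abs(r - 0.05) - abs(alpha - 0.7)


if __name__ == "__main__":
    print(greedy_search(toy_score, initial_alpha=0.5))
```

With these grids the two-stage search needs 5 + 9 = 14 evaluations, compared with 5 × 9 = 45 for an exhaustive search over every (r, α) pair, which is the tuning-cost reduction the quoted passage refers to.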