Notice: The reproducibility variables underlying each score are classified using an automated LLM-based pipeline, validated against a manually labeled dataset. LLM-based classification introduces uncertainty and potential bias; scores should be interpreted as estimates. Full accuracy metrics and methodology are described in Coakley et al.: K. L. Coakley, T. Snelleman, H. Hoos, and O. E. Gundersen, "The embrace of open science: An analysis of a decade of AI research and 56 800 conference papers," Under Review, 2026.

GradAug: A New Regularization Method for Deep Neural Networks

Authors: Taojiannan Yang, Sijie Zhu, Chen Chen

NeurIPS 2020

| Reproducibility Variable | Result | LLM Response |
|---|---|---|
| Research Type | Experimental | We conduct a comprehensive set of experiments to evaluate the proposed regularization method. Using a simple random scale transformation, GradAug can improve the ImageNet Top-1 accuracy of ResNet-50 from 76.32% to 78.79%, which is a new state-of-the-art accuracy. By leveraging a more powerful data augmentation technique CutMix [13], we can further push the accuracy to 79.67%. |
| Researcher Affiliation | Academia | Taojiannan Yang, Sijie Zhu, Chen Chen; University of North Carolina at Charlotte; EMAIL |
| Pseudocode | Yes | The Pytorch-style pseudo-code of GradAug is presented in Algorithm 1. |
| Open Source Code | Yes | Code is available at https://github.com/taoyang1122/GradAug |
| Open Datasets | Yes | ImageNet [27] dataset contains 1.2 million training images and 50,000 validation images in 1000 categories. We also evaluate GradAug on Cifar-100 dataset [29]. The dataset has 50,000 images for training and 10,000 images for testing in 100 categories. |
| Dataset Splits | Yes | ImageNet [27] dataset contains 1.2 million training images and 50,000 validation images in 1000 categories. |
| Hardware Specification | Yes | The training cost is measured on an 8 × 1080Ti GPU server with a batch size of 512. |
| Software Dependencies | No | The paper mentions 'Pytorch-style pseudo-code' and 'MMDetection toolbox [33]' but does not provide specific version numbers for these or any other software dependencies. |
| Experiment Setup | Yes | On ResNet-50, we train the model for 120 epochs with a batch size of 512. The initial learning rate is 0.2 with cosine decay schedule. We sample n = 3 sub-networks in each training iteration and the width lower bound is α = 0.9. For simplicity, we only use random scale transformation for sub-networks. That is, the input images are randomly resized to one of {224 × 224, 192 × 192, 160 × 160, 128 × 128}. |
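
The experiment setup quoted above can be illustrated with a minimal Python sketch. It is not the authors' implementation: the function names are hypothetical, the cosine schedule assumes no warmup or restarts, and drawing each sub-network width uniformly from [α, 1.0] is an assumption that the quoted text does not state. Only the numeric values (120 epochs, initial learning rate 0.2, n = 3, α = 0.9, and the four input resolutions) come from the reported setup.

```python
import math
import random

# Values taken from the reported experiment setup.
EPOCHS = 120
BASE_LR = 0.2
NUM_SUBNETS = 3            # n = 3 sub-networks per training iteration
WIDTH_LOWER_BOUND = 0.9    # alpha
RESOLUTIONS = [224, 192, 160, 128]  # square input sizes for sub-networks


def cosine_lr(epoch, total_epochs=EPOCHS, base_lr=BASE_LR):
    """Cosine decay from base_lr down to 0.

    Assumption: no warmup and no restarts; the summary only says
    'cosine decay schedule'.
    """
    return 0.5 * base_lr * (1.0 + math.cos(math.pi * epoch / total_epochs))


def sample_subnet_configs(rng, n=NUM_SUBNETS, alpha=WIDTH_LOWER_BOUND,
                          resolutions=RESOLUTIONS):
    """Draw one (width_ratio, input_size) pair per sub-network.

    Assumption: width ratios are sampled uniformly from [alpha, 1.0].
    The input size is picked at random from the listed resolutions,
    which mirrors the random scale transformation.
    """
    return [(rng.uniform(alpha, 1.0), rng.choice(resolutions))
            for _ in range(n)]


if __name__ == "__main__":
    rng = random.Random(0)  # fixed seed so the sketch is repeatable
    for epoch in (0, 60, 120):
        print(f"epoch {epoch:3d}: lr = {cosine_lr(epoch):.4f}")
    for width, size in sample_subnet_configs(rng):
        print(f"sub-network width ratio {width:.3f}, input {size} x {size}")
```

In an actual training loop, each sampled pair would configure one forward and backward pass of a sub-network on a resized copy of the batch; that part depends on the model code and is omitted here.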