Stochastic Collapse: How Gradient Noise Attracts SGD Dynamics Towards Simpler Subnetworks

Authors: Feng Chen, Daniel Kunin, Atsushi Yamamura, Surya Ganguli

NeurIPS 2023 | Conference PDF | Archive PDF | Plain Text | LLM Run Details

Reproducibility Variable | Result | LLM Response
Research Type | Experimental | We observe empirically the existence of attractive invariant sets in trained deep neural networks, implying that SGD dynamics often collapses to simple subnetworks with either vanishing or redundant neurons. We further demonstrate how this simplifying process of stochastic collapse benefits generalization in a linear teacher-student framework.
Researcher Affiliation | Academia | Feng Chen, Daniel Kunin, Atsushi Yamamura, Surya Ganguli, Stanford University, {fengc,kunin,atsushi3,sganguli}@stanford.edu
Pseudocode | No | The paper does not contain any structured pseudocode or clearly labeled algorithm blocks.
Open Source Code | Yes | The code to reproduce the experiments in the main paper can be found at https://github.com/ccffccffcc/stochastic_collapse.
Open Datasets | Yes | We carried out all the deep learning experiments with VGG-16 [47] and ResNet-18 [48], training on the CIFAR-10 and CIFAR-100 datasets respectively [49].
Dataset Splits | No | The paper mentions training steps and evaluation but does not specify explicit train/validation/test dataset splits with percentages, sample counts, or references to predefined splits.
Hardware Specification | Yes | Our experiments were run on the Google Cloud Platform (with 4 or 8 NVIDIA A100 (40GB) GPUs). The initial code development occurred on a local cluster equipped with 10 NVIDIA TITAN X GPUs.
Software Dependencies | No | The paper mentions using SGD as an optimizer but does not provide specific version numbers for any software dependencies, libraries, or frameworks (e.g., Python, PyTorch, CUDA).
Experiment Setup | Yes | For all our training, we applied standard data augmentation and used SGD (with momentum = 0.9 and weight decay of 0.0005) as the optimizer. We trained VGG-16 for 10^5 steps on CIFAR-10 with a learning rate of 0.1 and ResNet-18 for 10^6 steps on CIFAR-100 with a learning rate of 0.02.
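
The quoted experiment setup maps onto a short training-configuration sketch, shown below as a minimal PyTorch illustration rather than the authors' actual code (their repository is linked above). The datasets, optimizer hyperparameters (SGD with momentum 0.9 and weight decay 0.0005), learning rates, and step counts are taken from the quote; the model constructor, the exact augmentation pipeline, and the batch size of 128 are assumptions made for illustration.

import torch
import torchvision
import torchvision.transforms as T

# "Standard data augmentation" is assumed to mean random crop + horizontal flip,
# the usual CIFAR recipe; normalization is omitted to keep the sketch short.
train_transform = T.Compose([
    T.RandomCrop(32, padding=4),
    T.RandomHorizontalFlip(),
    T.ToTensor(),
])

# CIFAR-10 for the VGG-16 run; swap in CIFAR100 / ResNet-18 for the other setting.
train_set = torchvision.datasets.CIFAR10(
    root="./data", train=True, download=True, transform=train_transform
)
# Batch size is an assumption; the paper excerpt does not state it.
train_loader = torch.utils.data.DataLoader(train_set, batch_size=128, shuffle=True)

# Placeholder model: the paper uses VGG-16 / ResNet-18 variants adapted to CIFAR.
model = torchvision.models.vgg16(num_classes=10)

# Optimizer hyperparameters as stated in the quoted setup.
optimizer = torch.optim.SGD(
    model.parameters(),
    lr=0.1,            # 0.02 for the ResNet-18 / CIFAR-100 runs
    momentum=0.9,
    weight_decay=5e-4,
)
criterion = torch.nn.CrossEntropyLoss()

# Train for a fixed number of SGD steps (10^5 for VGG-16, 10^6 for ResNet-18),
# matching the paper's step-based (rather than epoch-based) reporting.
num_steps, step = 10**5, 0
while step < num_steps:
    for inputs, targets in train_loader:
        optimizer.zero_grad()
        loss = criterion(model(inputs), targets)
        loss.backward()
        optimizer.step()
        step += 1
        if step >= num_steps:
            break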