Understanding Why Neural Networks Generalize Well Through GSNR of Parameters
Authors: Jinlong Liu, Yunzhi Bai, Guoqing Jiang, Ting Chen, Huayan Wang
ICLR 2020 | Conference PDF | Archive PDF | Plain Text | LLM Run Details
| Reproducibility Variable | Result | LLM Response |
|---|---|---|
| Research Type | Experimental | In the remainder of this paper we first analyze the relation between GSNR and generalization (Section 2). We then show how the training dynamics lead to large GSNR of model parameters experimentally and analytically in Section 3. We performed the above estimations on MNIST with a simple CNN structure consisting of 2 Conv-ReLU-Max Pooling blocks and 2 fully-connected layers. First, to estimate eq. (24) with M = 10, we randomly sample 10 training sets of size n and a test set of size 10,000. (A sketch of this architecture and the GSNR estimate appears below the table.) |
| Researcher Affiliation | Industry | Jinlong Liu¹, Guo-qing Jiang¹, Yunzhi Bai¹, Ting Chen², and Huayan Wang¹. ¹Ytech, KWAI incorporation ({liujinlong,jiangguoqing,baiyunzhi,wanghuayan}@kuaishou.com); ²Samsung Research China Beijing (SRC-B) (ting11.chen@samsung.com) |
| Pseudocode | No | The paper includes mathematical equations and derivations, but it does not provide any pseudocode or clearly labeled algorithm blocks. |
| Open Source Code | No | The paper does not contain any statements or links indicating that open-source code for the methodology described is available. |
| Open Datasets | Yes | We performed the above estimations on MNIST with a simple CNN structure... We also conducted the same experiment on CIFAR10 (Appendix A.2) and a toy dataset (Appendix A.3) and observed the same behavior. |
| Dataset Splits | No | The paper mentions 'training set' and 'test set' and their sizes (e.g., training set $D = \{(x_1, y_1), \ldots, (x_n, y_n)\} \in Z^n$ and test set $D' = \{(x'_1, y'_1), \ldots, (x'_{n'}, y'_{n'})\} \in Z^{n'}$ in Section 2.2; 'The training set and test set sizes are 200 and 10,000, respectively' in Section 3.2), but it does not specify a validation set or a training/validation/test split. |
| Hardware Specification | No | The paper does not specify any details about the hardware (e.g., CPU, GPU models, memory) used to run the experiments. |
| Software Dependencies | No | The paper mentions using 'gradient descent training' and different network structures (CNN, MLP), but it does not specify any particular software libraries, frameworks, or their version numbers (e.g., PyTorch, TensorFlow, scikit-learn, Python version). |
| Experiment Setup | Yes | To cover different conditions, we (1) choose $n \in \{1000, 2000, 4000, 6000, 8000, 10000, 15000\}$, respectively; (2) inject noise by randomly changing the labels with probability $p_{\mathrm{random}} \in \{0.0, 0.1, 0.2, 0.3, 0.5\}$; (3) change the model structure by varying the number of channels in the layers, $ch \in \{6, 8, 10, 12, 14, 16, 18, 20\}$. See Appendix A for more details of the setup. We use gradient descent training (not SGD), with a small learning rate of 0.001. (A sketch of this training setup appears below the table.) |
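
The Research Type row quotes a simple CNN (2 Conv-ReLU-Max Pooling blocks followed by 2 fully-connected layers on MNIST) and an estimate of per-parameter GSNR, defined in the paper as the squared mean of the per-sample gradient divided by its variance. Below is a minimal sketch of both, assuming PyTorch (the paper names no framework); the kernel sizes, hidden width of 64, and the `SimpleCNN` / `gsnr` names are illustrative assumptions, not the authors' code.

```python
import torch
import torch.nn as nn

class SimpleCNN(nn.Module):
    """2 Conv-ReLU-MaxPool blocks followed by 2 fully-connected layers (MNIST, 1x28x28 inputs)."""
    def __init__(self, ch: int = 8, num_classes: int = 10):
        super().__init__()
        self.features = nn.Sequential(
            nn.Conv2d(1, ch, kernel_size=3, padding=1), nn.ReLU(), nn.MaxPool2d(2),
            nn.Conv2d(ch, ch, kernel_size=3, padding=1), nn.ReLU(), nn.MaxPool2d(2),
        )
        self.classifier = nn.Sequential(
            nn.Flatten(),
            nn.Linear(ch * 7 * 7, 64), nn.ReLU(),  # hidden width 64 is an assumption
            nn.Linear(64, num_classes),
        )

    def forward(self, x: torch.Tensor) -> torch.Tensor:
        return self.classifier(self.features(x))

def gsnr(model: nn.Module, loss_fn, xs: torch.Tensor, ys: torch.Tensor) -> torch.Tensor:
    """Estimate GSNR r(theta_j) = mean(g_j)^2 / var(g_j), where g_j is the
    per-sample gradient of the loss w.r.t. parameter j."""
    per_sample_grads = []
    for x, y in zip(xs, ys):
        model.zero_grad()
        loss_fn(model(x.unsqueeze(0)), y.unsqueeze(0)).backward()
        flat = torch.cat([p.grad.flatten() for p in model.parameters()])
        per_sample_grads.append(flat.detach().clone())
    g = torch.stack(per_sample_grads)             # shape: (num_samples, num_params)
    mean, var = g.mean(dim=0), g.var(dim=0, unbiased=False)
    return mean.pow(2) / (var + 1e-12)            # epsilon guards against zero variance
```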
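
The Experiment Setup row describes full-batch gradient descent (not SGD) with a learning rate of 0.001, label noise injected with probability p_random, and a grid over training-set size and channel count. A minimal sketch of that setup follows, again assuming PyTorch; `inject_label_noise`, `train_full_batch`, and the step count of 1000 are illustrative assumptions rather than details reported in the paper.

```python
import torch
import torch.nn as nn

def inject_label_noise(labels: torch.Tensor, p_random: float, num_classes: int = 10) -> torch.Tensor:
    """Randomly change each label to a uniformly drawn class with probability p_random."""
    mask = torch.rand(len(labels)) < p_random
    noisy = labels.clone()
    noisy[mask] = torch.randint(0, num_classes, (int(mask.sum().item()),))
    return noisy

def train_full_batch(model: nn.Module, x: torch.Tensor, y: torch.Tensor,
                     lr: float = 0.001, steps: int = 1000) -> nn.Module:
    """Full-batch gradient descent (not SGD): one update per pass over the whole training set."""
    opt = torch.optim.SGD(model.parameters(), lr=lr)  # plain GD when fed the full batch
    loss_fn = nn.CrossEntropyLoss()
    for _ in range(steps):
        opt.zero_grad()
        loss_fn(model(x), y).backward()
        opt.step()
    return model

# Hyperparameter grid quoted in the setup row.
TRAIN_SIZES = [1000, 2000, 4000, 6000, 8000, 10000, 15000]
NOISE_PROBS = [0.0, 0.1, 0.2, 0.3, 0.5]
CHANNELS    = [6, 8, 10, 12, 14, 16, 18, 20]
```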