Notice: The reproducibility variables underlying each score are classified using an automated LLM-based pipeline, validated against a manually labeled dataset. LLM-based classification introduces uncertainty and potential bias; scores should be interpreted as estimates. Full accuracy metrics and methodology are described in [1].
Understanding Why Neural Networks Generalize Well Through GSNR of Parameters
Authors: Jinlong Liu, Yunzhi Bai, Guoqing Jiang, Ting Chen, Huayan Wang
ICLR 2020 | Venue PDF | LLM Run Details
| Reproducibility Variable | Result | LLM Response |
|---|---|---|
| Research Type | Experimental | In the remainder of this paper we first analyze the relation between GSNR and generalization (Section 2). We then show how the training dynamics lead to large GSNR of model parameters experimentally and analytically in Section 3. We performed the above estimations on MNIST with a simple CNN structure consisting of 2 Conv-ReLU-Max-Pooling blocks and 2 fully-connected layers. First, to estimate eq. (24) with M = 10, we randomly sample 10 training sets with size n and a test set with size 10,000. |
| Researcher Affiliation | Industry | Jinlong Liu1, Guo-qing Jiang1, Yunzhi Bai1, Ting Chen2, and Huayan Wang1 — 1 Ytech, KWAI Inc. (EMAIL); 2 Samsung Research China Beijing (SRC-B) (EMAIL) |
| Pseudocode | No | The paper includes mathematical equations and derivations, but it does not provide any pseudocode or clearly labeled algorithm blocks. |
| Open Source Code | No | The paper does not contain any statements or links indicating that open-source code for the methodology described is available. |
| Open Datasets | Yes | We performed the above estimations on MNIST with a simple CNN structure... We also conducted the same experiment on CIFAR10 (Appendix A.2) and a toy dataset (Appendix A.3) and observed the same behavior. |
| Dataset Splits | No | The paper mentions 'training set' and 'test set' and their sizes (e.g., 'training set D = {(x_1, y_1), ..., (x_n, y_n)} ∈ Z^n' and 'test set D′ = {(x′_1, y′_1), ..., (x′_{n′}, y′_{n′})} ∈ Z^{n′}' in Section 2.2; 'The training set and test set sizes are 200 and 10,000, respectively' in Section 3.2), but it does not specify a validation set or a training/validation/test split. |
| Hardware Specification | No | The paper does not specify any details about the hardware (e.g., CPU, GPU models, memory) used to run the experiments. |
| Software Dependencies | No | The paper mentions using 'gradient descent training' and different network structures (CNN, MLP), but it does not specify any particular software libraries, frameworks, or their version numbers (e.g., PyTorch, TensorFlow, scikit-learn, Python version). |
| Experiment Setup | Yes | To cover different conditions, we (1) choose n ∈ {1000, 2000, 4000, 6000, 8000, 10000, 15000}, respectively; (2) inject noise by randomly changing the labels with probability p_random ∈ {0.0, 0.1, 0.2, 0.3, 0.5}; (3) change the model structure by varying the number of channels in the layers, ch ∈ {6, 8, 10, 12, 14, 16, 18, 20}. See Appendix A for more details of the setup. We use gradient descent training (not SGD), with a small learning rate of 0.001. |
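For readers unfamiliar with the quantity the paper studies: the GSNR of a parameter is the squared mean of its per-sample gradient divided by the variance of that gradient across training samples. Below is a minimal sketch of estimating GSNR from a matrix of per-sample gradients. The toy linear-regression model and all function names are illustrative assumptions, not the paper's code.

```python
import numpy as np

def gsnr(per_sample_grads, eps=1e-12):
    """Estimate GSNR per parameter.

    per_sample_grads: array of shape (n_samples, n_params), one gradient
    row per training example. GSNR_j = mean_j**2 / var_j over samples;
    eps guards against division by zero for zero-variance parameters.
    """
    mean = per_sample_grads.mean(axis=0)
    var = per_sample_grads.var(axis=0)
    return mean ** 2 / (var + eps)

# Toy example: a linear model y = w.x with squared loss on synthetic
# data; the per-sample gradient is grad_i = (w.x_i - y_i) * x_i.
rng = np.random.default_rng(0)
n, d = 200, 5
X = rng.normal(size=(n, d))
w_true = rng.normal(size=d)
y = X @ w_true + 0.1 * rng.normal(size=n)

w = np.zeros(d)                        # parameters at initialization
grads = (X @ w - y)[:, None] * X       # shape (n, d)
print(gsnr(grads))                     # one GSNR estimate per parameter
```

A parameter whose per-sample gradients all agree (low variance, nonzero mean) gets a large GSNR; one whose gradients cancel across samples gets a GSNR near zero, which is the regime the paper links to poor generalization.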
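The label-noise condition in the setup row ("randomly changing the labels with probability p_random") can be sketched as follows. This is an assumption about the mechanism: the paper does not specify whether a flipped label is drawn uniformly over all classes or only over the other classes, and the function name here is hypothetical.

```python
import numpy as np

def inject_label_noise(labels, p, num_classes=10, seed=0):
    """Replace each label with a uniformly random class with probability p.

    labels: 1-D integer array of class labels in [0, num_classes).
    Returns a new array; the input is not modified.
    """
    rng = np.random.default_rng(seed)
    noisy = labels.copy()
    flip = rng.random(len(labels)) < p          # which entries to corrupt
    noisy[flip] = rng.integers(0, num_classes, size=int(flip.sum()))
    return noisy

# Example: corrupt 30% of MNIST-style labels, matching p_random = 0.3.
clean = np.arange(10) % 10
print(inject_label_noise(clean, p=0.3))
```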