Analyzing Generalization of Neural Networks through Loss Path Kernels

Authors: Yilan Chen, Wei Huang, Hao Wang, Charlotte Loh, Akash Srivastava, Lam Nguyen, Lily Weng

NeurIPS 2023

Reproducibility Variable | Result | LLM Response
Research Type | Experimental | Numerical experiments demonstrate that our bound maintains a strong correlation with the true generalization error of NNs trained with gradient descent (GD) (see Figure 1 and 2 in Sec. 6 & 7).
Researcher Affiliation | Collaboration | Yilan Chen (UCSD CSE, yilan@ucsd.edu); Wei Huang (RIKEN AIP, wei.huang.vr@riken.jp); Hao Wang (MIT-IBM Watson AI Lab, hao@ibm.com); Charlotte Loh (MIT EECS, cloh@mit.edu); Akash Srivastava (MIT-IBM Watson AI Lab, akash.srivastava@ibm.com); Lam M. Nguyen (IBM Research, LamNguyen.MLTD@ibm.com); Tsui-Wei Weng (UCSD HDSI, lweng@ucsd.edu)
Pseudocode | No | The paper describes methodologies and processes but does not include any clearly labeled pseudocode or algorithm blocks.
Open Source Code | No | The paper mentions using and citing third-party open-source tools like 'torchdiffeq [12]. URL https://github.com/rtqichen/torchdiffeq.' and 'Neural Tangents [45]'. However, it does not provide an unambiguous statement or a link to the open-source code for the specific methodology or contributions described in this paper.
Open Datasets | Yes | We use a logistic loss to train a two-layer NN with 100 hidden nodes for binary classification on MNIST 1 and 7 [31] by full-batch gradient flow and compute its generalization bound. [...] We analyze the correlation between Gene(w, S) and the true generalization error by randomly sampling 100 NN architectures from NAS-Bench-201 [19].
Dataset Splits | No | The paper mentions using 'n = 1000 training samples' for certain calculations and estimating bounds with '20 training sets S', but it does not specify explicit train/validation/test dataset splits (e.g., percentages or exact counts for each split) needed for reproduction.
Hardware Specification | Yes | Experiments are implemented with PyTorch [46] on 24G A5000 and 32G V100 GPUs.
Software Dependencies | No | The paper mentions using 'PyTorch [46]' and 'torchdiffeq [12]' but does not provide specific version numbers for these or any other software dependencies needed for replication.
Experiment Setup | Yes | We use a logistic loss to train a two-layer NN with 100 hidden nodes for binary classification on MNIST 1 and 7 [31] by full-batch gradient flow... The NN is initialized using the NTK parameterization [28]. We use the Softplus activation function, defined as Softplus(x) = (1/β) ln(1 + e^(βx)). This function is continuously differentiable and serves as a smooth approximation to the ReLU activation function. In our experiments, we set β = 10. [...] We train the NN using GD with a finite learning rate η = 10... U_sgd is estimated with a batch of data of size 600.
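
The Open Datasets and Dataset Splits entries above refer to binary classification on MNIST digits 1 and 7 with n = 1000 training samples drawn repeatedly. Since the paper releases no code, the following is only a minimal sketch of how such a subset could be prepared with torchvision; the function name, the +1/-1 label convention, and the subsampling step are assumptions, not the authors' implementation.

```python
# Hypothetical data preparation for the MNIST 1-vs-7 task (not from the paper).
import torch
from torchvision import datasets

def load_mnist_1_vs_7(root="./data", n_train=1000, seed=0):
    """Keep only digits 1 and 7 from MNIST and relabel them to +1 / -1."""
    full = datasets.MNIST(root, train=True, download=True)
    mask = (full.targets == 1) | (full.targets == 7)
    images = full.data[mask].float().div(255.0).reshape(-1, 28 * 28)
    labels = (full.targets[mask] == 1).float() * 2 - 1  # digit 1 -> +1, digit 7 -> -1
    # Draw one training set S of size n (the paper reports n = 1000 samples);
    # repeating with different seeds yields the "20 training sets S" quoted above.
    g = torch.Generator().manual_seed(seed)
    idx = torch.randperm(labels.numel(), generator=g)[:n_train]
    return images[idx], labels[idx]
```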
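The Experiment Setup entry quotes a two-layer NN with 100 hidden nodes, NTK parameterization, a Softplus activation with β = 10, and a logistic loss. A minimal PyTorch sketch consistent with that description is given below; the class name, the exact 1/sqrt(fan-in) scaling convention, and the logistic-loss helper are assumptions based on the standard NTK parameterization, not released code.

```python
# Sketch of a two-layer network in NTK parameterization (assumed details).
import math
import torch
import torch.nn as nn
import torch.nn.functional as F

class TwoLayerNTK(nn.Module):
    """Width-100 two-layer net: weights drawn from N(0, 1) and each layer
    rescaled by 1/sqrt(fan_in), with Softplus(beta=10) as a smooth ReLU."""
    def __init__(self, d_in=784, width=100, beta=10.0):
        super().__init__()
        self.W1 = nn.Parameter(torch.randn(width, d_in))
        self.W2 = nn.Parameter(torch.randn(1, width))
        self.act = nn.Softplus(beta=beta)
        self.d_in, self.width = d_in, width

    def forward(self, x):
        h = self.act(x @ self.W1.t() / math.sqrt(self.d_in))
        return (h @ self.W2.t() / math.sqrt(self.width)).squeeze(-1)

def logistic_loss(f_x, y):
    """Logistic loss for labels in {+1, -1}: mean of log(1 + exp(-y * f(x)))."""
    return F.softplus(-y * f_x).mean()
```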
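The paper trains by full-batch gradient flow and cites torchdiffeq, but again without releasing code. The sketch below shows one way the flow dw/dt = -grad L(w) could be integrated with torchdiffeq's odeint, reusing the logistic_loss helper from the previous sketch; the function names, time horizon, and solver choice are assumptions.

```python
# Hypothetical gradient-flow training via an ODE solver (assumed setup).
import torch
from torchdiffeq import odeint  # https://github.com/rtqichen/torchdiffeq

def train_by_gradient_flow(model, X, y, t_final=100.0, n_eval=50):
    """Integrate dw/dt = -grad L(w) and return the parameter trajectory."""
    w0 = torch.nn.utils.parameters_to_vector(model.parameters()).detach()

    def velocity(t, w):
        # Write the flat parameter vector into the model, then return -grad L(w).
        torch.nn.utils.vector_to_parameters(w.detach(), model.parameters())
        with torch.enable_grad():
            loss = logistic_loss(model(X), y)
            grads = torch.autograd.grad(loss, list(model.parameters()))
        return -torch.cat([g.reshape(-1) for g in grads])

    t = torch.linspace(0.0, t_final, n_eval)
    # w_path[k] holds the parameters at time t[k]; the loss path kernel is
    # defined along this training trajectory.
    w_path = odeint(velocity, w0, t, method="dopri5")
    return t, w_path
```

Under these assumptions, a call such as t, w_path = train_by_gradient_flow(TwoLayerNTK(), *load_mnist_1_vs_7()) would produce the trajectory on which the bound Gene(w, S) is evaluated.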
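Finally, the Research Type entry reports a strong correlation between the computed bound and the true generalization error across 100 sampled NAS-Bench-201 architectures. A small post-processing sketch for such a comparison follows; the specific statistics (Kendall's tau and Pearson's r) are assumptions about how the correlation might be quantified, not a claim about the paper's exact metric.

```python
# Hypothetical correlation check between bound values and measured gaps.
import numpy as np
from scipy import stats

def correlation_with_true_gap(bound_values, true_gaps):
    """bound_values, true_gaps: one entry per sampled architecture."""
    bound_values = np.asarray(bound_values)
    true_gaps = np.asarray(true_gaps)
    tau, _ = stats.kendalltau(bound_values, true_gaps)  # rank correlation
    r = np.corrcoef(bound_values, true_gaps)[0, 1]      # linear correlation
    return {"kendall_tau": tau, "pearson_r": r}
```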