Analyzing Generalization of Neural Networks through Loss Path Kernels
Authors: Yilan Chen, Wei Huang, Hao Wang, Charlotte Loh, Akash Srivastava, Lam Nguyen, Lily Weng
NeurIPS 2023
| Reproducibility Variable | Result | LLM Response |
|---|---|---|
| Research Type | Experimental | Numerical experiments demonstrate that our bound maintains a strong correlation with the true generalization error of NNs trained with gradient descent (GD) (see Figures 1 and 2 in Sec. 6 & 7). |
| Researcher Affiliation | Collaboration | Yilan Chen (UCSD CSE, yilan@ucsd.edu); Wei Huang (RIKEN AIP, wei.huang.vr@riken.jp); Hao Wang (MIT-IBM Watson AI Lab, hao@ibm.com); Charlotte Loh (MIT EECS, cloh@mit.edu); Akash Srivastava (MIT-IBM Watson AI Lab, akash.srivastava@ibm.com); Lam M. Nguyen (IBM Research, LamNguyen.MLTD@ibm.com); Tsui-Wei Weng (UCSD HDSI, lweng@ucsd.edu) |
| Pseudocode | No | The paper describes methodologies and processes but does not include any clearly labeled pseudocode or algorithm blocks. |
| Open Source Code | No | The paper mentions using and citing third-party open-source tools like 'torchdiffeq [12]. URL https://github.com/rtqichen/torchdiffeq.' and 'Neural Tangents [45]'. However, it does not provide an unambiguous statement or a link to the open-source code for the specific methodology or contributions described in this paper. |
| Open Datasets | Yes | We use a logistic loss to train a two-layer NN with 100 hidden nodes for binary classification on MNIST 1 and 7 [31] by full-batch gradient flow and compute its generalization bound. [...] We analyze the correlation between Gene(w, S) and the true generalization error by randomly sampling 100 NN architectures from NAS-Bench-201 [19]. (A hedged data-loading sketch follows the table.) |
| Dataset Splits | No | The paper mentions using 'n = 1000 training samples' for certain calculations and estimating bounds with '20 training sets S', but it does not specify explicit train/validation/test dataset splits (e.g., percentages or exact counts for each split) needed for reproduction. |
| Hardware Specification | Yes | Experiments are implemented with PyTorch [46] on 24G A5000 and 32G V100 GPUs. |
| Software Dependencies | No | The paper mentions using 'PyTorch [46]' and 'torchdiffeq [12]' but does not provide specific version numbers for these or any other software dependencies needed for replication. |
| Experiment Setup | Yes | We use a logistic loss to train a two-layer NN with 100 hidden nodes for binary classification on MNIST 1 and 7 [31] by full-batch gradient flow... The NN is initialized using the NTK parameterization [28]. We use the Softplus activation function, defined as Softplus(x) = (1/β) ln(1 + e^{βx}). This function is continuously differentiable and serves as a smooth approximation to the ReLU activation function. In our experiments, we set β = 10. [...] We train the NN using GD with a finite learning rate η = 10... U_sgd is estimated with a batch of data of size 600. (A hedged training sketch follows the table.) |
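To make the quoted dataset description concrete, here is a minimal data-loading sketch. It assumes torchvision's `datasets.MNIST` loader, a ±1 label convention, and flattening to 784 features, and it borrows `n_train = 1000` from the "n = 1000 training samples" quote in the Dataset Splits row; none of these choices beyond the 1-vs-7 subset are confirmed by the paper.

```python
# Hedged sketch: build the MNIST 1-vs-7 binary task quoted in the report.
# torchvision loading, +/-1 labels, and 784-dim flattening are assumptions.
import torch
from torchvision import datasets

def load_mnist_1_vs_7(n_train=1000, root="./data"):
    """Return n_train flattened digit-1/digit-7 images with +/-1 labels."""
    full = datasets.MNIST(root, train=True, download=True)
    mask = (full.targets == 1) | (full.targets == 7)
    images = full.data[mask].float().div(255.0).reshape(-1, 784)
    labels = torch.where(full.targets[mask] == 1,
                         torch.tensor(1.0), torch.tensor(-1.0))
    return images[:n_train], labels[:n_train]
```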
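The Experiment Setup row can be sketched in the same spirit: a width-100 two-layer network under the NTK parameterization with a β = 10 Softplus activation, trained on the logistic loss by full-batch gradient descent. The class and function names, the N(0,1)-initialization-plus-1/√(fan-in) scaling convention, and the number of GD steps are assumptions for illustration; only the width, activation, loss, and the quoted η = 10 come from the table.

```python
# Hedged sketch of the quoted two-layer NN and full-batch GD training.
import math
import torch
import torch.nn as nn
import torch.nn.functional as F

class TwoLayerNTK(nn.Module):
    """Width-100 two-layer NN under an assumed NTK parameterization convention."""
    def __init__(self, d_in=784, width=100, beta=10.0):
        super().__init__()
        self.w1 = nn.Parameter(torch.randn(width, d_in))  # N(0,1) init
        self.w2 = nn.Parameter(torch.randn(1, width))     # N(0,1) init
        self.d_in, self.width, self.beta = d_in, width, beta

    def forward(self, x):
        # Softplus(z) = (1/beta) ln(1 + exp(beta z)), here with beta = 10
        h = F.softplus(x @ self.w1.t() / math.sqrt(self.d_in), beta=self.beta)
        return (h @ self.w2.t() / math.sqrt(self.width)).squeeze(-1)

def train_full_batch_gd(model, x, y, lr=10.0, steps=1000):
    """Full-batch GD on the logistic loss log(1 + exp(-y * f(x)))."""
    opt = torch.optim.SGD(model.parameters(), lr=lr)
    for _ in range(steps):
        opt.zero_grad()
        loss = F.softplus(-y * model(x)).mean()  # logistic loss (beta = 1)
        loss.backward()
        opt.step()
    return model
```

With the loader above, `train_full_batch_gd(TwoLayerNTK(), *load_mnist_1_vs_7())` reproduces the rough shape of the quoted setup; estimating the true generalization error would additionally require a held-out 1-vs-7 test set, which, as the Dataset Splits row notes, the paper does not fully specify.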