Optimization and Bayes: A Trade-off for Overparameterized Neural Networks
Authors: Zhengmian Hu, Heng Huang
NeurIPS 2023
| Reproducibility Variable | Result | LLM Response |
|---|---|---|
| Research Type | Experimental | We first illustrate that Hessian trace doesn’t vanish for overparameterized network and our analysis induces an efficient estimation of this value. Next, we verify our theoretical finding by comparing the dynamics of an overparameterized network in function space and parameter space. Finally, we demonstrate the interpolation of sampling and optimization. |
| Researcher Affiliation | Academia | Zhengmian Hu, Heng Huang Department of Computer Science University of Maryland College Park, MD 20740 |
| Pseudocode | No | The paper does not contain any clearly labeled pseudocode or algorithm blocks. |
| Open Source Code | No | The paper does not provide an explicit statement or link to open-source code for the methodology described. |
| Open Datasets | Yes | We consider one-shot learning on Fashion-MNIST [75]. (A sketch of this one-shot split appears below the table.) |
| Dataset Splits | No | The paper does not specify general training, validation, and test dataset splits for reproducibility. It only mentions a specific 'one-shot learning' setup where 'one sample for each class as training dataset' is selected, without defining the overall dataset partitioning. |
| Hardware Specification | No | The paper does not provide specific details about the hardware (e.g., GPU/CPU models, memory, or specific computing platforms) used for conducting the experiments. |
| Software Dependencies | No | The paper does not provide specific version numbers for any software libraries, frameworks, or dependencies used in the experiments. |
| Experiment Setup | Yes | In Section 8.3, for experiments on Fashion-MNIST, the paper states: 'We use a single hidden layer network with width being 1024 and softplus activation. We use loss l(y, t) = 1/(1 + exp(yt)) and surrogate loss l_s(y, t) = log(1 + exp(-yt)) for gradient descent. For Gibbs measure, we fix λ = 180. The entropy change is approximately evaluated by integrating Eq. (9) with finite step size and fixed Θ(d). We train 10^5 independent networks'. In Section 8.2, it mentions: 'For dynamics in parameter space, we run SGD with finite step size 0.01 and mini-batch size 1.' (Sketches of this configuration appear below the table.) |
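
The "Open Datasets" row quotes a one-shot setup on Fashion-MNIST with one training sample per class. The sketch below shows one way such a split could be constructed; it is illustrative only, and the use of torchvision, the `one_shot_split` helper, and the random per-class selection are assumptions rather than details from the paper.

```python
# Hypothetical sketch: build a one-shot training set (one example per class)
# from Fashion-MNIST. torchvision is an assumed dependency; the paper does
# not state which libraries were used.
import torch
from torchvision import datasets, transforms

def one_shot_split(dataset, num_classes=10, seed=0):
    """Pick one index per class at random to form the one-shot training set."""
    g = torch.Generator().manual_seed(seed)
    labels = torch.as_tensor(dataset.targets)
    indices = []
    for c in range(num_classes):
        candidates = (labels == c).nonzero(as_tuple=True)[0]
        pick = candidates[torch.randint(len(candidates), (1,), generator=g)]
        indices.append(pick.item())
    return indices

full_train = datasets.FashionMNIST(root="./data", train=True, download=True,
                                   transform=transforms.ToTensor())
test_set = datasets.FashionMNIST(root="./data", train=False, download=True,
                                 transform=transforms.ToTensor())

one_shot_train = torch.utils.data.Subset(full_train, one_shot_split(full_train))
```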
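
The "Experiment Setup" row describes a single hidden layer of width 1024 with softplus activation, the loss l(y, t) = 1/(1 + exp(yt)) with logistic surrogate l_s(y, t) = log(1 + exp(-yt)) for gradient descent, and SGD with step size 0.01 and mini-batch size 1. A minimal PyTorch sketch of that configuration follows; the framework, the flattened 28x28 input, the scalar-output/binary-label convention y ∈ {-1, +1}, and the training step are assumptions for illustration, and the Gibbs-measure and entropy-change parts of the experiment are not reproduced.

```python
# Hypothetical sketch of the quoted training configuration. PyTorch and the
# y in {-1, +1} label convention are assumptions, not statements from the paper.
import torch
import torch.nn as nn
import torch.nn.functional as F

WIDTH = 1024        # hidden width quoted in the paper
STEP_SIZE = 0.01    # SGD step size quoted in the paper
IN_DIM = 28 * 28    # assumed flattened Fashion-MNIST input

# Single hidden layer with softplus activation; scalar output t = f(x).
model = nn.Sequential(
    nn.Flatten(),
    nn.Linear(IN_DIM, WIDTH),
    nn.Softplus(),
    nn.Linear(WIDTH, 1),
)

def loss_fn(y, t):
    """Evaluation loss l(y, t) = 1 / (1 + exp(y t))."""
    return torch.sigmoid(-y * t)

def surrogate_loss(y, t):
    """Surrogate loss l_s(y, t) = log(1 + exp(-y t)) used for gradient descent."""
    return F.softplus(-y * t)

optimizer = torch.optim.SGD(model.parameters(), lr=STEP_SIZE)

def sgd_step(x, y):
    """One SGD update on a single (x, y) pair, i.e. mini-batch size 1."""
    optimizer.zero_grad()
    t = model(x).squeeze()
    surrogate_loss(y, t).mean().backward()
    optimizer.step()
```

Softplus serves double duty here: as the hidden activation named in the quote, and through `F.softplus(-y * t)` to express the logistic surrogate, since softplus(x) = log(1 + exp(x)).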