Notice: The reproducibility variables underlying each score are classified using an automated LLM-based pipeline, validated against a manually labeled dataset. LLM-based classification introduces uncertainty and potential bias; scores should be interpreted as estimates. Full accuracy metrics and methodology are described in [1].
Optimization and Bayes: A Trade-off for Overparameterized Neural Networks
Authors: Zhengmian Hu, Heng Huang
NeurIPS 2023
| Reproducibility Variable | Result | LLM Response |
|---|---|---|
| Research Type | Experimental | We first illustrate that Hessian trace doesn't vanish for overparameterized networks and our analysis induces an efficient estimation of this value. Next, we verify our theoretical finding by comparing the dynamics of an overparameterized network in function space and parameter space. Finally, we demonstrate the interpolation of sampling and optimization. |
| Researcher Affiliation | Academia | Zhengmian Hu, Heng Huang Department of Computer Science University of Maryland College Park, MD 20740 |
| Pseudocode | No | The paper does not contain any clearly labeled pseudocode or algorithm blocks. |
| Open Source Code | No | The paper does not provide an explicit statement or link to open-source code for the methodology described. |
| Open Datasets | Yes | We consider one-shot learning on Fashion-MNIST [75]. |
| Dataset Splits | No | The paper does not specify general training, validation, and test dataset splits for reproducibility. It only mentions a specific 'one-shot learning' setup where 'one sample for each class as training dataset' is selected, without defining the overall dataset partitioning. |
| Hardware Specification | No | The paper does not provide specific details about the hardware (e.g., GPU/CPU models, memory, or specific computing platforms) used for conducting the experiments. |
| Software Dependencies | No | The paper does not provide specific version numbers for any software libraries, frameworks, or dependencies used in the experiments. |
| Experiment Setup | Yes | In Section 8.3, for experiments on Fashion-MNIST, the paper states: 'We use a single hidden layer network with width being 1024 and softplus activation. We use loss l(y, t) = 1/(1 + exp(yt)) and surrogate loss ls(y, t) = log(1 + exp(−yt)) for gradient descent. For Gibbs measure, we fix λ = 180. The entropy change is approximately evaluated by integrating Eq. (9) with finite step size and fixed Δ(d). We train 10^5 independent networks'. In Section 8.2, it mentions: 'For dynamics in parameter space, we run SGD with finite step size 0.01 and mini-batch size 1.' |
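The quoted setup (single hidden layer of width 1024, softplus activation, logistic surrogate loss, SGD with step size 0.01 and mini-batch size 1) can be sketched as follows. This is a minimal NumPy illustration, not the authors' code: the random toy data standing in for Fashion-MNIST, the input dimension, the initialization scale, and the number of steps are all assumptions.

```python
import numpy as np

rng = np.random.default_rng(0)

width = 1024       # hidden width from Section 8.3
d = 784            # assumption: Fashion-MNIST input dimension
step_size = 0.01   # SGD step size from Section 8.2
n_steps = 1000     # assumption: not specified in the quote

# Single hidden layer with softplus activation (random init is an assumption).
W1 = rng.normal(0.0, 1.0 / np.sqrt(d), (width, d))
w2 = rng.normal(0.0, 1.0 / np.sqrt(width), width)

def softplus(z):
    # Numerically stable softplus: log(1 + exp(z)).
    return np.log1p(np.exp(-np.abs(z))) + np.maximum(z, 0.0)

def forward(x):
    return w2 @ softplus(W1 @ x)

def surrogate_loss(y, t):
    # l_s(y, t) = log(1 + exp(-y t)), the surrogate loss from the quote.
    return np.log1p(np.exp(-y * t))

def surrogate_grad(y, t):
    # d/dt log(1 + exp(-y t)) = -y / (1 + exp(y t))
    return -y / (1.0 + np.exp(y * t))

# Toy stand-in for the one-shot setup: one sample per class, labels +/-1.
X = rng.normal(size=(2, d))
Y = np.array([1.0, -1.0])

loss_before = np.mean([surrogate_loss(y, forward(x)) for x, y in zip(X, Y)])

for _ in range(n_steps):
    i = rng.integers(len(X))              # mini-batch size 1
    x, y = X[i], Y[i]
    z = W1 @ x
    h = softplus(z)
    g = surrogate_grad(y, w2 @ h)
    sig = 1.0 / (1.0 + np.exp(-z))        # softplus'(z) = sigmoid(z)
    # Compute both gradients before updating either layer.
    grad_w2 = g * h
    grad_W1 = g * np.outer(w2 * sig, x)
    w2 -= step_size * grad_w2
    W1 -= step_size * grad_W1

loss_after = np.mean([surrogate_loss(y, forward(x)) for x, y in zip(X, Y)])
```

On the toy data the surrogate loss decreases monotonically per sample, since each SGD step moves y·t in the positive direction.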