Optimization and Bayes: A Trade-off for Overparameterized Neural Networks

Authors: Zhengmian Hu, Heng Huang

NeurIPS 2023 | Conference PDF | Archive PDF | Plain Text | LLM Run Details

Reproducibility Variable | Result | LLM Response
Research Type | Experimental | We first illustrate that Hessian trace doesn't vanish for overparameterized network and our analysis induces an efficient estimation of this value. Next, we verify our theoretical finding by comparing the dynamics of an overparameterized network in function space and parameter space. Finally, we demonstrate the interpolation of sampling and optimization. (A generic Hessian-trace estimation sketch follows the table.)
Researcher Affiliation | Academia | Zhengmian Hu, Heng Huang, Department of Computer Science, University of Maryland, College Park, MD 20740
Pseudocode | No | The paper does not contain any clearly labeled pseudocode or algorithm blocks.
Open Source Code | No | The paper does not provide an explicit statement or link to open-source code for the methodology described.
Open Datasets | Yes | We consider one-shot learning on Fashion-MNIST [75].
Dataset Splits | No | The paper does not specify general training, validation, and test dataset splits for reproducibility. It only mentions a specific 'one-shot learning' setup where 'one sample for each class as training dataset' is selected, without defining the overall dataset partitioning.
Hardware Specification | No | The paper does not provide specific details about the hardware (e.g., GPU/CPU models, memory, or specific computing platforms) used for conducting the experiments.
Software Dependencies | No | The paper does not provide specific version numbers for any software libraries, frameworks, or dependencies used in the experiments.
Experiment Setup | Yes | In Section 8.3, for experiments on Fashion-MNIST, the paper states: 'We use a single hidden layer network with width being 1024 and softplus activation. We use loss l(y, t) = 1/(1 + exp(yt)) and surrogate loss ls(y, t) = log(1 + exp(-yt)) for gradient descent. For Gibbs measure, we fix λ = 180. The entropy change is approximately evaluated by integrating Eq. (9) with finite step size and fixed Θ(d). We train 10^5 independent networks.' In Section 8.2, it mentions: 'For dynamics in parameter space, we run SGD with finite step size 0.01 and mini-batch size 1.' (A hedged code sketch of this setup follows the table.)
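
The experiment-setup quote above fixes the architecture, losses, and optimizer hyperparameters but not the data pipeline or output encoding. The following PyTorch sketch is a minimal reconstruction from the quoted text only; the input dimension of 784, the scalar output with labels y in {-1, +1}, and the preprocessing are assumptions for illustration, not details taken from the paper.

```python
# Minimal sketch of the quoted setup (Sections 8.2-8.3): single hidden layer
# of width 1024 with softplus activation, logistic surrogate loss, and SGD
# with step size 0.01 and mini-batch size 1. Input dimension and the scalar
# output with y in {-1, +1} are assumptions, not taken from the paper.
import torch
import torch.nn as nn
import torch.nn.functional as F

WIDTH = 1024        # hidden width quoted in the paper
STEP_SIZE = 0.01    # SGD step size quoted in Section 8.2
D_IN = 28 * 28      # assumed: flattened Fashion-MNIST images

net = nn.Sequential(
    nn.Linear(D_IN, WIDTH),
    nn.Softplus(),
    nn.Linear(WIDTH, 1),
)

def loss_eval(y, t):
    # Evaluation loss l(y, t) = 1 / (1 + exp(y t)) from the quote.
    return 1.0 / (1.0 + torch.exp(y * t))

def loss_surrogate(y, t):
    # Surrogate loss l_s(y, t) = log(1 + exp(-y t)); softplus(-y t) is the
    # numerically stable form of the same expression.
    return F.softplus(-y * t)

opt = torch.optim.SGD(net.parameters(), lr=STEP_SIZE)

def sgd_step(x, y):
    # One SGD update with mini-batch size 1, as quoted in Section 8.2.
    opt.zero_grad()
    t = net(x.view(1, -1)).squeeze()
    loss_surrogate(y, t).backward()
    opt.step()
    return loss_eval(y, t.detach()).item()
```

The Gibbs-measure experiment (λ = 180) and the 10^5 independent networks are not reproduced here, since the quote does not specify how the Gibbs measure is sampled.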
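
The Research Type row quotes the paper's claim that the Hessian trace does not vanish for overparameterized networks and that the analysis yields an efficient estimator of this quantity. The paper's own estimator is not reproduced in the quote; the sketch below is a standard Hutchinson-style stochastic trace estimator, shown only as one common way to compute such a value, not as the paper's method.

```python
# Generic Hutchinson-style estimator of tr(H), where H is the Hessian of a
# scalar loss with respect to the model parameters. Standard technique shown
# for context only; it is not the estimator derived in the paper.
import torch

def hessian_trace(loss, params, num_samples=100):
    # tr(H) = E[v^T H v] for Rademacher vectors v; Hessian-vector products
    # are obtained by differentiating the gradient a second time.
    params = [p for p in params if p.requires_grad]
    grads = torch.autograd.grad(loss, params, create_graph=True)
    total = 0.0
    for _ in range(num_samples):
        vs = [torch.bernoulli(torch.full_like(p, 0.5)) * 2.0 - 1.0 for p in params]
        grad_dot_v = sum((g * v).sum() for g, v in zip(grads, vs))
        hvps = torch.autograd.grad(grad_dot_v, params, retain_graph=True)
        total += sum((hvp * v).sum().item() for hvp, v in zip(hvps, vs))
    return total / num_samples
```

A typical call would be `hessian_trace(loss, net.parameters())` after computing the training loss on a batch; more samples reduce the variance of the estimate.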