Relative Flatness and Generalization

Authors: Henning Petzka, Michael Kamp, Linara Adilova, Cristian Sminchisescu, Mario Boley

NeurIPS 2021

Reproducibility Variable | Result | LLM Response

Research Type | Experimental
"We empirically validate the assumptions and consequences of the theoretical results derived above. For that, we first show on a synthetic example that the empirical correlation between flatness and generalization decreases if labels are not locally constant, up to the point where they are no longer correlated. We then show that the novel relative flatness measure correlates strongly with generalization, also in the presence of reparameterizations. Finally, we show in a synthetic experiment that while representativeness cannot be computed without knowing the true data distribution, it can in practice be approximated."

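The correlation claims above can be checked generically once flatness values and generalization gaps have been measured for a collection of trained models. A minimal sketch using SciPy, with stand-in numbers; the quoted text does not say which correlation statistic the paper reports, so both a linear and a rank statistic are shown:

```python
import numpy as np
from scipy.stats import kendalltau, pearsonr

# Stand-in measurements: one flatness value and one generalization gap per model.
flatness = np.array([2.1, 5.3, 1.7, 8.4, 3.9])
gen_gap = np.array([0.03, 0.09, 0.02, 0.15, 0.06])

print("Pearson r:   %.3f" % pearsonr(flatness, gen_gap)[0])
print("Kendall tau: %.3f" % kendalltau(flatness, gen_gap)[0])
```
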
Researcher Affiliation | Collaboration
Henning Petzka: Lund University, Sweden (henning.petzka@math.lth.se)
Michael Kamp: CISPA Helmholtz Center for Information Security, Germany, and Monash University, Australia (michael.kamp@monash.edu)
Linara Adilova: Ruhr University Bochum, Germany, and Fraunhofer IAIS
Cristian Sminchisescu: Lund University, Sweden, and Google Research, Switzerland
Mario Boley: Monash University, Australia

Pseudocode | No
"The paper describes methods and derivations mathematically and in prose, but does not include any explicitly labeled pseudocode or algorithm blocks."

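Since no pseudocode is given, the following is a minimal sketch of the paper's trace-based relative flatness measure, kappa = sum_{s,s'} <w_s, w_{s'}> Tr(H_{s,s'}), for a single layer with weight matrix W. It materializes the full Hessian, so it is only feasible for small layers, and it is an illustrative reconstruction rather than the authors' implementation (see their repository for the real one):

```python
import torch

def relative_flatness(loss_of_W, W):
    """Sketch: kappa = sum_{s,s'} <w_s, w_{s'}> * Tr(H_{s,s'}) for one layer.

    W is the (m, d) weight matrix of the chosen layer; loss_of_W maps W to the
    scalar empirical loss. Exact full-Hessian version, small layers only.
    """
    H = torch.autograd.functional.hessian(loss_of_W, W)  # shape (m, d, m, d)
    G = W @ W.T                                          # Gram matrix <w_s, w_s'>
    tr_blocks = torch.einsum("sjtj->st", H)              # Tr(H_{s,s'}) per block pair
    return (G * tr_blocks).sum()

# Toy usage with a stand-in quartic loss on a 3x4 layer.
W = torch.randn(3, 4)
print(relative_flatness(lambda w: (w ** 2).sum() ** 2 / 2, W).item())
```
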
Open Source Code | Yes
"Code is available at https://github.com/kampmichael/relativeFlatnessGeneralization."

Open Datasets | Yes
"We compare this correlation to the classical Hessian-based flatness measures using the trace of the loss-Hessian, the Fisher-Rao norm [21], the PAC-Bayes flatness measure that performed best in the extensive study of Jiang et al. [14], and the L2-norm of the weights. The results in Fig. 5 show that indeed relative flatness has higher correlation than all the competing measures. Of these measures, only the Fisher-Rao norm is reparameterization invariant, but it shows the weakest correlation in the experiment. In Appdx. C we show how reparameterizations of the network significantly reduce the correlation for non-reparameterization-invariant measures." (Tested on CIFAR-10 [18])

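Of the baseline measures quoted above, the trace of the loss Hessian is the one most directly tied to flatness; for networks of realistic size it is typically estimated stochastically rather than computed exactly. A minimal Hutchinson-style sketch in PyTorch, offered as an illustration rather than the authors' procedure:

```python
import torch

def hessian_trace(loss, params, n_samples=100):
    """Hutchinson estimator: Tr(H) = E[v^T H v] for Rademacher vectors v."""
    grads = torch.autograd.grad(loss, params, create_graph=True)
    estimate = 0.0
    for _ in range(n_samples):
        vs = [torch.randint_like(g, high=2) * 2.0 - 1.0 for g in grads]  # +/-1 entries
        hvs = torch.autograd.grad(grads, params, grad_outputs=vs, retain_graph=True)
        estimate += sum((v * hv).sum().item() for v, hv in zip(vs, hvs))
    return estimate / n_samples

# Toy usage: the Hessian of a quadratic sum of squares is 2*I, so Tr(H) = 2 * numel.
p = torch.randn(5, requires_grad=True)
print(hessian_trace((p ** 2).sum(), [p]))  # approx 10
```
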
Dataset Splits | No
"The paper mentions splitting data into training and test sets ('multiple random splits of S into a training set Strain and a test set Stest') but does not provide explicit details on validation splits, specific percentages, or absolute sample counts for each split."

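Repeated random splitting of this kind can be reproduced generically. A sketch using scikit-learn (which the paper references), where the 80/20 fraction and the number of repetitions are placeholders, since neither is stated:

```python
import numpy as np
from sklearn.model_selection import train_test_split

X = np.random.randn(1000, 20)             # stand-in features
y = np.random.randint(0, 10, size=1000)   # stand-in labels

for seed in range(10):  # repetition count: assumed, not stated in the paper
    X_train, X_test, y_train, y_test = train_test_split(
        X, y, test_size=0.2, random_state=seed  # 80/20 split: assumed
    )
    # train on (X_train, y_train); measure the generalization gap on (X_test, y_test)
```
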
Hardware Specification | No
"The paper does not provide specific hardware details (e.g., GPU/CPU models, memory, or cloud instance types) used for running the experiments."

Software Dependencies | No
"The paper mentions software like PyTorch [28] and scikit-learn [29] in its references but does not specify version numbers for these or any other ancillary software components used in the experiments."

Experiment Setup | No
"The paper states that experiments were conducted using 'different learning setups, such as initialization, learning rate, batch size, and optimization algorithm', and that 'Details on the experiments are provided in Appdx. C'. However, specific concrete hyperparameter values or detailed configurations are not explicitly listed in the main text."

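The varied setup dimensions named in the quote lend themselves to a simple grid sweep. The following sketch enumerates such a grid in Python; every concrete value is a placeholder, since the paper defers the real configurations to its Appendix C:

```python
from itertools import product

# Placeholder grid over the dimensions the paper says it varied; none of
# these values are taken from the paper.
grid = {
    "init": ["xavier", "kaiming"],
    "lr": [0.1, 0.01, 0.001],
    "batch_size": [32, 128, 512],
    "optimizer": ["sgd", "adam"],
}

for values in product(*grid.values()):
    config = dict(zip(grid.keys(), values))
    # train_and_evaluate(config)  # hypothetical driver for one training run
    print(config)
```
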