Notice: The reproducibility variables underlying each score are classified using an automated LLM-based pipeline, validated against a manually labeled dataset. LLM-based classification introduces uncertainty and potential bias; scores should be interpreted as estimates. Full accuracy metrics and methodology are described in [1].

On the Importance of Gaussianizing Representations

Authors: Daniel Eftekhari, Vardan Papyan

ICML 2025 | Venue PDF | LLM Run Details

Reproducibility Variable | Result | LLM Response
Research Type | Experimental | Our experiments comprehensively demonstrate the effectiveness of normality normalization, in regards to its generalization performance on an array of widely used model and dataset combinations, its strong performance across various common factors of variation such as model width, depth, and training minibatch size, its suitability for usage wherever existing normalization layers are conventionally used, and as a means to improving model robustness to random perturbations.
Researcher Affiliation | Academia | 1Department of Computer Science, University of Toronto, Toronto, Canada 2Vector Institute, Toronto, Canada 3Department of Mathematics, University of Toronto, Toronto, Canada. Correspondence to: Daniel Eftekhari <EMAIL>.
Pseudocode | Yes | Algorithm 1 provides a summary of normality normalization.
Open Source Code | Yes | Code is made available at https://github.com/DanielEftekhari/normality-normalization.
Open Datasets | Yes | The datasets we used were CIFAR10, CIFAR100 (Krizhevsky, 2009), STL10 (Coates et al., 2011), SVHN (Netzer et al., 2011), Caltech101 (Li et al., 2022), Tiny ImageNet (Le & Yang, 2015), Food101 (Bossard et al., 2014), and ImageNet (Deng et al., 2009).
Dataset Splits | Yes | For the Caltech101 dataset, each run used a random 90%/10% allocation to obtain the training and validation splits respectively. ... For the experiments involving the SVHN dataset, models were trained from random initialization for 200 epochs, with a factor of 10 reduction in learning rate at each 60-epoch interval, and a minibatch size of 32.
Hardware Specification | Yes | Values are obtained using an NVIDIA V100 GPU.
Software Dependencies | No | The paper mentions PyTorch and the AdamW optimizer but does not provide specific version numbers for these software components. For example, it states: "We trained our models using the PyTorch (Paszke et al., 2019) machine learning framework."
Experiment Setup | Yes | In all of our experiments involving the ResNet18, ResNet34, and Wide ResNet architectures, stochastic gradient descent (SGD) with learning rate 0.1, weight decay 5 × 10⁻⁴, momentum 0.9, and minibatch size 128 was used. ... The AdamW optimizer (Kingma & Ba, 2015; Loshchilov & Hutter, 2019) with learning rate 1 × 10⁻³, weight decay 5 × 10⁻², (β₁, β₂) = (0.9, 0.999), ϵ = 1 × 10⁻⁸ was used. A noise factor of ξ = 1.0 was used, as preliminary experiments demonstrated increases typically resulted in training instability.
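For quick reference, the optimizer settings and the SVHN learning-rate schedule quoted in the Experiment Setup and Dataset Splits rows can be sketched in Python. The dict layout and the helper name `svhn_lr` are illustrative conveniences, not from the paper; in PyTorch these values would typically be passed to `torch.optim.SGD` and `torch.optim.AdamW`.

```python
# Hyperparameters as quoted in the table above; the dict layout and the
# helper name `svhn_lr` are illustrative, not from the paper.

# SGD settings reported for the ResNet18/ResNet34/Wide ResNet experiments.
SGD_CONFIG = {
    "lr": 0.1,
    "weight_decay": 5e-4,
    "momentum": 0.9,
    "minibatch_size": 128,
}

# AdamW settings quoted in the Experiment Setup row.
ADAMW_CONFIG = {
    "lr": 1e-3,
    "weight_decay": 5e-2,
    "betas": (0.9, 0.999),
    "eps": 1e-8,
}

def svhn_lr(epoch: int, base_lr: float = 0.1) -> float:
    """Learning rate under the SVHN schedule quoted above: a factor-of-10
    reduction at each 60-epoch interval over 200 epochs of training."""
    return base_lr * 0.1 ** (epoch // 60)
```

In PyTorch itself, the same step schedule would usually be expressed with `torch.optim.lr_scheduler.StepLR(optimizer, step_size=60, gamma=0.1)`.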