Separation and Concentration in Deep Networks

Authors: John Zarka, Florentin Guth, Stéphane Mallat

ICLR 2021

Reproducibility Variable | Result | LLM Response
Research Type | Experimental | Numerical experiments demonstrate that deep neural network classifiers progressively separate class distributions around their means, achieving linear separability on the training set and increasing the Fisher discriminant ratio. The paper explains this mechanism with two types of operators. It proves that a rectifier without biases applied to sign-invariant tight frames can separate class means and increase Fisher ratios. Conversely, a soft-thresholding on tight frames can reduce within-class variabilities while preserving class means. Variance reduction bounds are proved for Gaussian mixture models. For image classification, separation of class means can be achieved with rectified wavelet tight frames that are not learned; this defines a scattering transform. Learning 1x1 convolutional tight frames along scattering channels and applying a soft-thresholding reduces within-class variabilities. The resulting scattering network reaches the classification accuracy of ResNet-18 on CIFAR-10 and ImageNet, with fewer layers and no learned biases.
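The two operators at the heart of the abstract can be sketched in a few lines of NumPy. This is a hypothetical illustration, not the authors' implementation: soft-thresholding shrinks coefficient amplitudes toward zero, while a bias-free rectifier applied to a sign-invariant frame (obtained here by stacking F and -F) separates positive and negative parts without losing information.

```python
import numpy as np

def soft_threshold(u, lam):
    # rho_lambda(u) = sign(u) * max(|u| - lambda, 0): shrinks amplitudes by lambda
    return np.sign(u) * np.maximum(np.abs(u) - lam, 0.0)

def rectified_frame(F, x):
    # Bias-free ReLU on the sign-invariant frame [F; -F]: since
    # ReLU(v) - ReLU(-v) = v, the coefficients Fx remain recoverable.
    Fx = F @ x
    return np.maximum(np.concatenate([Fx, -Fx]), 0.0)
```

Here `F`, `x`, and `lam` are placeholder names; in the paper F is a tight frame and the threshold is set near 1.5 times the coefficient standard deviation.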
Researcher Affiliation | Academia | John Zarka and Florentin Guth: Département d'informatique de l'ENS, ENS, CNRS, PSL University, Paris, France ({john.zarka,florentin.guth}@ens.fr). Stéphane Mallat: Collège de France, Paris, France, and Flatiron Institute, New York, USA.
Pseudocode | No | The paper does not contain any structured pseudocode or algorithm blocks.
Open Source Code | Yes | Code to reproduce all experiments of the paper is available at https://github.com/j-zarka/separation_concentration_deepnets.
Open Datasets | Yes | Image classification is first evaluated on the MNIST (LeCun et al., 2010) and CIFAR-10 (Krizhevsky, 2009) image datasets.
Dataset Splits | No | The paper mentions using "training data" but does not specify exact proportions or counts for training, validation, or test splits. It relies on standard datasets (MNIST, CIFAR-10, ImageNet) whose predefined splits are not explicitly stated in the paper.
Hardware Specification | No | The paper thanks "the Scientific Computing Core at the Flatiron Institute for the use of their computing resources" but does not provide specific hardware details (e.g., CPU or GPU models, memory).
Software Dependencies | No | The paper cites "Kymatio: Scattering transforms in Python" in the acknowledgments/references, implying its use, but specifies no version numbers for Kymatio or other key dependencies such as Python, PyTorch, or TensorFlow.
Experiment Setup | Yes | The parameters W, F, and b are optimized with stochastic gradient descent minimizing a logistic cross-entropy loss on the output. To impose F^T F = Id, following the optimization of Parseval networks (Cisse et al., 2017), after each gradient update of all network parameters a second gradient step is inserted to minimize (alpha/2) ||F^T F - Id||^2. After every Parseval step, each tight-frame row f_m is kept at constant norm ||f_m|| = sqrt(d/p) by a spherical projection: f_m <- sqrt(d/p) f_m / ||f_m||. The tight frame F is a convolution on patches of size k x k with a stride of k/2, where k = 14 for MNIST and k = 8 for CIFAR. A soft-thresholding rho_lambda(u) = sign(u) max(|u| - lambda, 0) shrinks the amplitude of u by lambda; a nearly optimal threshold is lambda = 1.5 sigma. The frame variance sigma^2 is rescaled by standardizing the input x so that it has zero mean and unit variance per coefficient, which gives lambda = 1.5 sqrt(d/p). The number of scales J depends on the image size: J = 3 for MNIST and CIFAR, and J = 4 for ImageNet, yielding K = 217, 651, and 1251 channels respectively.
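The Parseval regularization step described above can be sketched as follows. This is a minimal NumPy sketch under the assumption that F is stored as a p x d matrix with rows f_m, and with the gradient's constant factors absorbed into alpha; it is not the authors' code.

```python
import numpy as np

def parseval_step(F, alpha):
    """One extra gradient step toward F^T F = Id, followed by the
    spherical projection fixing each row norm to sqrt(d/p)."""
    p, d = F.shape
    # Gradient step on (alpha/2) * ||F^T F - Id||_F^2
    # (multiplicative constants absorbed into alpha)
    F = F - alpha * F @ (F.T @ F - np.eye(d))
    # Spherical projection: f_m <- sqrt(d/p) * f_m / ||f_m||
    norms = np.linalg.norm(F, axis=1, keepdims=True)
    return np.sqrt(d / p) * F / norms
```

Iterating this step after each SGD update drives F toward a tight frame whose rows all have norm sqrt(d/p), matching the constraint described in the setup.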