The Singular Values of Convolutional Layers

Authors: Hanie Sedghi, Vineet Gupta, Philip M. Long

ICLR 2019

Reproducibility Variable | Result | LLM Response
Research Type | Experimental | "We characterize the singular values of the linear transformation associated with a standard 2D multi-channel convolutional layer, enabling their efficient computation. This characterization also leads to an algorithm for projecting a convolutional layer onto an operator-norm ball. We show that this is an effective regularizer; for example, it improves the test error of a deep residual network using batch normalization on CIFAR-10 from 6.2% to 5.3%. Timing tests, reported in Section 4.1, confirm that this characterization speeds up the computation of singular values by multiple orders of magnitude, making it usable in practice." (A numerical sanity check of this characterization appears after the table.)
Researcher Affiliation | Industry | Hanie Sedghi, Vineet Gupta, and Philip M. Long; Google Brain, Mountain View, CA 94043.
Pseudocode | Yes | The introduction gives the NumPy function

    import numpy as np

    def SingularValues(kernel, input_shape):
        # Per-frequency 2D FFTs of each filter slice, padded to the input size.
        transforms = np.fft.fft2(kernel, input_shape, axes=[0, 1])
        return np.linalg.svd(transforms, compute_uv=False)

and Appendix A gives a full Python function Clip_Operator_Norm(...). (A hedged reconstruction of that clipping step appears after the table.)
Open Source Code | No | The paper provides code snippets and mentions in-house TensorFlow and NumPy implementations, but it neither states that the code is released nor links to a code repository.
Open Datasets | Yes | "it improves the test error of a deep residual network using batch normalization on CIFAR-10 from 6.2% to 5.3%."
Dataset Splits | No | The paper uses CIFAR-10 and discusses training parameters and learning-rate schedules, but it does not give explicit train/validation/test splits or a validation-set size/ratio.
Hardware Specification | No | The paper mentions that the TensorFlow implementation runs "much faster on a GPU" and that "clipping norms by our method on a GPU was about 25% faster", but it does not specify which GPU model was used.
Software Dependencies | No | The paper mentions NumPy and TensorFlow implementations but does not give version numbers for either.
Experiment Setup | Yes | "This network reached a test error rate of 6.2% after 250 epochs, using a learning rate schedule determined by a grid search... We then evaluated an algorithm that, every 100 steps, clipped the norms of the convolutional layers to various different values between 0.1 and 3.0. We tried all combinations of the following hyperparameters: (a) the norm of the ball projected onto (no projection, 0.5, 1.0, 1.5, 2.0); (b) the initial learning rate (0.001, 0.003, 0.01, 0.03, 0.1); (c) the minibatch size (32, 64); (d) the number of epochs per decay of the learning rate (1, 2, 3)." (A sketch enumerating this grid appears after the table.)
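
To make the characterization concrete, here is a minimal single-channel sanity check. It is our own construction rather than code from the paper, and it assumes circular (wrap-around) padding, as in the paper's analysis. It materializes the circular-convolution operator explicitly and confirms that its singular values match the output of SingularValues:

    import numpy as np

    def SingularValues(kernel, input_shape):
        transforms = np.fft.fft2(kernel, input_shape, axes=[0, 1])
        return np.linalg.svd(transforms, compute_uv=False)

    n, k = 8, 3                                   # n x n input, k x k kernel
    rng = np.random.default_rng(0)
    kernel = rng.standard_normal((k, k))

    # The paper's method, applied to a 1-input/1-output-channel kernel.
    sv_fft = np.sort(SingularValues(kernel[:, :, None, None], (n, n)).ravel())

    # Direct method: build the circular-convolution matrix one column at a
    # time (column i is the response to the i-th standard-basis image), then
    # take its SVD.
    K = np.fft.fft2(kernel, (n, n))
    A = np.zeros((n * n, n * n))
    for i in range(n * n):
        e = np.zeros((n, n))
        e.flat[i] = 1.0
        A[:, i] = np.fft.ifft2(np.fft.fft2(e) * K).real.ravel()

    sv_direct = np.sort(np.linalg.svd(A, compute_uv=False))
    print(np.allclose(sv_fft, sv_direct))         # True

For multi-channel kernels the same identity holds per frequency: the operator's singular values are the union, over all n^2 frequencies, of the singular values of the small channel-mixing matrices that SingularValues assembles.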
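
Appendix A's Clip_Operator_Norm is referenced in the table but not reproduced. The sketch below is our hedged reconstruction of the projection step from the paper's description (the function name, signature, and cropping detail are ours, not necessarily the paper's exact code): clip the singular values of every per-frequency matrix, transform back, and crop to the original kernel support. The crop is an approximation, since the exact projection can have full n x n spatial support.

    import numpy as np

    def clip_operator_norm(kernel, input_shape, clip_to):
        # kernel: (k, k, c_in, c_out) filter bank; input_shape: (n, n).
        transforms = np.fft.fft2(kernel, input_shape, axes=[0, 1])
        U, D, Vh = np.linalg.svd(transforms, compute_uv=True,
                                 full_matrices=False)
        D = np.minimum(D, clip_to)                # clip at every frequency
        clipped = np.matmul(U * D[..., None, :], Vh)
        spatial = np.fft.ifft2(clipped, axes=[0, 1]).real
        return spatial[:kernel.shape[0], :kernel.shape[1]]  # crop to k x k

A call such as clip_operator_norm(kernel, (32, 32), 1.0) would project a layer toward the operator-norm ball of radius 1.0; per the experiment-setup row, the paper applies its projection every 100 training steps.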
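
Finally, the hyperparameter sweep quoted in the experiment-setup row is a plain Cartesian grid. A minimal enumeration sketch follows; train_and_eval is a hypothetical placeholder, not a function from the paper:

    import itertools

    grid = {
        "clip_to": [None, 0.5, 1.0, 1.5, 2.0],    # None = no projection
        "initial_lr": [0.001, 0.003, 0.01, 0.03, 0.1],
        "batch_size": [32, 64],
        "epochs_per_lr_decay": [1, 2, 3],
    }

    for values in itertools.product(*grid.values()):
        config = dict(zip(grid, values))
        print(config)
        # test_error = train_and_eval(**config)   # hypothetical trainer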