The Singular Values of Convolutional Layers
Authors: Hanie Sedghi, Vineet Gupta, Philip M. Long
ICLR 2019
| Reproducibility Variable | Result | LLM Response |
|---|---|---|
| Research Type | Experimental | We characterize the singular values of the linear transformation associated with a standard 2D multi-channel convolutional layer, enabling their efficient computation. This characterization also leads to an algorithm for projecting a convolutional layer onto an operator-norm ball. We show that this is an effective regularizer; for example, it improves the test error of a deep residual network using batch normalization on CIFAR-10 from 6.2% to 5.3%. Timing tests, reported in Section 4.1, confirm that this characterization speeds up the computation of singular values by multiple orders of magnitude, making it usable in practice. |
| Researcher Affiliation | Industry | Hanie Sedghi, Vineet Gupta and Philip M. Long Google Brain Mountain View, CA 94043 |
| Pseudocode | Yes | The paper gives the NumPy function `def SingularValues(kernel, input_shape): transforms = np.fft.fft2(kernel, input_shape, axes=[0, 1]); return np.linalg.svd(transforms, compute_uv=False)` in the introduction, and a full Python function `Clip_Operator_Norm(...)` in Appendix A (both are expanded into runnable sketches after this table). |
| Open Source Code | No | The paper provides code snippets and mentions its own TensorFlow and NumPy implementations, but it does not explicitly state that the code is released, nor does it link to a code repository. |
| Open Datasets | Yes | it improves the test error of a deep residual network using batch normalization on CIFAR-10 from 6.2% to 5.3%. |
| Dataset Splits | No | The paper uses CIFAR-10 and discusses training parameters and learning rate schedules but does not explicitly provide training/validation/test dataset splits or mention a specific validation set size/ratio. |
| Hardware Specification | No | The paper mentions that the TensorFlow implementation runs 'much faster on a GPU' and that 'clipping norms by our method on a GPU was about 25% faster', but it does not specify the model or type of GPU used. |
| Software Dependencies | No | The paper mentions the use of 'NumPy' and 'TensorFlow' implementations but does not provide specific version numbers for these software components. |
| Experiment Setup | Yes | This network reached a test error rate of 6.2% after 250 epochs, using a learning rate schedule determined by a grid search... We then evaluated an algorithm that, every 100 steps, clipped the norms of the convolutional layers to various different values between 0.1 and 3.0. We tried all combinations of the following hyperparameters: (a) the norm of the ball projected onto (no projection, 0.5, 1.0, 1.5, 2.0); (b) the initial learning rate (0.001, 0.003, 0.01, 0.03, 0.1); (c) the minibatch size (32, 64); (d) the number of epochs per decay of the learning rate (1, 2, 3). The full grid is enumerated in the sketch below. |
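
For convenience, here is the paper's introductory snippet made self-contained: the function body is verbatim from the Pseudocode row above, while the import, comments, and example shapes (a 3×3 kernel mapping 16 to 32 channels on 28×28 inputs) are our additions for illustration only.

```python
import numpy as np

def SingularValues(kernel, input_shape):
    # 2D FFT of each (in-channel, out-channel) filter slice, zero-padded to
    # the spatial input size; axes [0, 1] are the kernel's height and width.
    transforms = np.fft.fft2(kernel, input_shape, axes=[0, 1])
    # Batched SVD of the channel matrix at every frequency; the union of the
    # per-frequency singular values gives the singular values of the layer.
    return np.linalg.svd(transforms, compute_uv=False)

# Hypothetical shapes: 3x3 kernel, 16 input channels, 32 output channels,
# applied to 28x28 feature maps.
kernel = np.random.randn(3, 3, 16, 32)
svals = SingularValues(kernel, (28, 28))
print(svals.shape)  # (28, 28, 16): min(16, 32) singular values per frequency
print(svals.max())  # the operator norm of the convolutional layer
```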
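The report does not reproduce the Appendix A function, so the following is only a minimal NumPy sketch of the projection step the paper describes: cap each per-frequency singular value at the target norm, reassemble the transfer matrices, and return to the spatial domain. The name `Clip_Operator_Norm`, the argument order, and the final truncation to the original kernel support are assumptions, not the paper's exact code.

```python
import numpy as np

def Clip_Operator_Norm(kernel, input_shape, clip_to):
    # Per-frequency transfer matrices of the layer, as in SingularValues.
    transforms = np.fft.fft2(kernel, input_shape, axes=[0, 1])
    U, D, Vh = np.linalg.svd(transforms, full_matrices=False)
    # Project onto the operator-norm ball: cap every singular value.
    D_clipped = np.minimum(D, clip_to)
    # Rebuild the clipped transfer matrices and invert the FFT.
    clipped = np.matmul(U, D_clipped[..., None] * Vh)
    spatial = np.fft.ifft2(clipped, axes=[0, 1]).real
    # Truncate back to the original k x k support (an approximation; the
    # exact projection need not be representable by a k x k kernel).
    return spatial[:kernel.shape[0], :kernel.shape[1]]
```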
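As a sanity check on the Experiment Setup row, this hypothetical snippet enumerates the stated hyperparameter grid; the listed values come from the quote above, and everything else is illustrative.

```python
from itertools import product

# Hyperparameter grid from the Experiment Setup row; None = no projection.
norms = [None, 0.5, 1.0, 1.5, 2.0]
initial_lrs = [0.001, 0.003, 0.01, 0.03, 0.1]
batch_sizes = [32, 64]
epochs_per_decay = [1, 2, 3]

grid = list(product(norms, initial_lrs, batch_sizes, epochs_per_decay))
print(len(grid))  # 150 combinations in total
```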