Spectral Representations for Convolutional Neural Networks

Authors: Oren Rippel, Jasper Snoek, Ryan P. Adams

NeurIPS 2015 | Conference PDF | Archive PDF | Plain Text | LLM Run Details

Reproducibility Variable | Result | LLM Response
Research Type | Experimental | We demonstrate the effectiveness of this reparametrization on a number of CNN optimization tasks, converging 2-5 times faster than the standard spatial representation. We test spectral pooling on different classification tasks. We run all experiments on code optimized for the Xeon Phi coprocessor.
Researcher Affiliation | Collaboration | Oren Rippel, Department of Mathematics, Massachusetts Institute of Technology (rippel@math.mit.edu); Jasper Snoek, Twitter and Harvard SEAS (jsnoek@seas.harvard.edu); Ryan P. Adams, Twitter and Harvard SEAS (rpa@seas.harvard.edu)
Pseudocode | Yes | Algorithm 1 (Spectral pooling). Input: map x ∈ ℝ^(M×N), output size H×W. Output: pooled map x̂ ∈ ℝ^(H×W). 1: y ← F(x); 2: ŷ ← CropSpectrum(y, H×W); 3: ŷ ← TreatCornerCases(ŷ); 4: x̂ ← F⁻¹(ŷ). Algorithm 2 (Spectral pooling back-propagation). Input: gradient w.r.t. output ∇_x̂ R. Output: gradient w.r.t. input ∇_x R. 1: ẑ ← F(∇_x̂ R); 2: ẑ ← RemoveRedundancy(ẑ); 3: z ← PadSpectrum(ẑ, M×N); 4: z ← RecoverMap(z); 5: ∇_x R ← F⁻¹(z). (A NumPy sketch of both algorithms is given after the table.)
Open Source Code | No | The paper does not provide any explicit statements or links to open-source code for the described methodology.
Open Datasets | Yes | We test the information retainment properties of spectral pooling on the validation set of ImageNet (Russakovsky et al., 2015). We test spectral pooling on different classification tasks... These settings allow us to attain classification rates of 8.6% on CIFAR-10 and 31.6% on CIFAR-100.
Dataset Splits | No | The paper mentions using the validation set of ImageNet and the CIFAR-10/100 benchmarks, but it does not describe explicit training/validation/test split sizes or procedures.
Hardware Specification | Yes | We ran all experiments on code optimized for the Xeon Phi coprocessor.
Software Dependencies | No | The paper mentions using Spearmint (Snoek et al., 2015) for Bayesian optimization and Adam (Kingma & Ba, 2015) as an optimizer, but it does not specify version numbers for these or any other software components.
Experiment Setup | Yes | We hyperparametrize and optimize the following CNN architecture: (C^(3×3)_(96+32m) → SP_(⌊γH_m⌋×⌊γH_m⌋))_(m=1..M) → C^(1×1)_(96+32M) → C^(1×1)_(10/100) → GA → Softmax (Eq. 5 in the paper). We perform hyperparameter optimization on the dimensionality decay rate γ ∈ [0.25, 0.85], number of layers M ∈ {1, ..., 15}, resolution randomization hyperparameters α, β ∈ [0, 0.8], weight decay rate in [10⁻⁵, 10⁻²], momentum in [1 − 0.1^0.5, 1 − 0.1^2], and initial learning rate in [0.1^4, 0.1^1]. We train each model for 150 epochs and anneal the learning rate by a factor of 10 at epochs 100 and 140. (The search space is restated as a code sketch after the table.)
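
Both pooling algorithms reduce to a handful of FFT calls. Below is a minimal single-channel NumPy sketch of the same idea, assuming orthonormal FFT scaling and ignoring constant scale factors; the conjugate-symmetry corner cases are handled crudely by taking the real part after the inverse transform rather than by explicit TreatCornerCases/RemoveRedundancy steps, and the function names are illustrative only (the paper's own code is not released).

```python
import numpy as np

def spectral_pool(x, H, W):
    """Sketch of Algorithm 1 for a single real-valued M x N map:
    keep the H x W lowest-frequency block of the spectrum and invert."""
    M, N = x.shape
    y = np.fft.fftshift(np.fft.fft2(x, norm="ortho"))   # low frequencies at the centre
    top, left = (M - H) // 2, (N - W) // 2
    y_crop = y[top:top + H, left:left + W]               # CropSpectrum
    x_pool = np.fft.ifft2(np.fft.ifftshift(y_crop), norm="ortho")
    # Taking the real part stands in for TreatCornerCases: it drops the small
    # imaginary residue caused by breaking exact conjugate symmetry.
    return np.real(x_pool)

def spectral_pool_grad(grad_out, M, N):
    """Sketch of Algorithm 2: cropping in frequency space is linear, and its
    adjoint is zero-padding, so the input gradient is obtained by padding the
    spectrum of the output gradient back to M x N and transforming back."""
    H, W = grad_out.shape
    z = np.fft.fftshift(np.fft.fft2(grad_out, norm="ortho"))
    z_pad = np.zeros((M, N), dtype=complex)               # PadSpectrum
    top, left = (M - H) // 2, (N - W) // 2
    z_pad[top:top + H, left:left + W] = z
    return np.real(np.fft.ifft2(np.fft.ifftshift(z_pad), norm="ortho"))
```

For example, spectral_pool(x, 16, 16) maps a 32×32 feature map to 16×16 while retaining only its lowest frequencies.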
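
For reference, the hyperparameter ranges quoted in the Experiment Setup row can be restated as a plain dictionary. This is only an illustration of the search space: the key names are invented here, and this is not Spearmint's actual configuration format.

```python
# Illustrative restatement of the hyperparameter search space quoted above.
# Key names are invented for this sketch; values are (low, high) bounds.
search_space = {
    "dimensionality_decay_gamma": (0.25, 0.85),     # spectral-pooling output shrinkage per layer
    "num_layers_M": (1, 15),                        # integer-valued
    "resolution_randomization_alpha": (0.0, 0.8),
    "resolution_randomization_beta": (0.0, 0.8),
    "weight_decay": (1e-5, 1e-2),
    "momentum": (1 - 0.1 ** 0.5, 1 - 0.1 ** 2),     # approx. (0.68, 0.99)
    "initial_learning_rate": (0.1 ** 4, 0.1 ** 1),  # 1e-4 to 1e-1
}
```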