End-to-End Kernel Learning with Supervised Convolutional Kernel Networks

Authors: Julien Mairal

NeurIPS 2016

Reproducibility Variable | Result | LLM Response
Research Type | Experimental | We show that our method achieves reasonably competitive performance for image classification on some standard deep learning datasets such as CIFAR-10 and SVHN, and also for image super-resolution, demonstrating the applicability of our approach to a large variety of image-related tasks.
Researcher Affiliation | Academia | Julien Mairal, Inria, julien.mairal@inria.fr; Thoth team, Inria Grenoble, Laboratoire Jean Kuntzmann, CNRS, Univ. Grenoble Alpes, France.
Pseudocode | No | The paper does not contain any structured pseudocode or algorithm blocks.
Open Source Code | No | To gain more scalability and start exploring these directions, we are currently working on a GPU implementation, which we plan to publicly release along with our current CPU implementation.
Open Datasets | Yes | We consider the datasets CIFAR-10 [12] and SVHN [19], which contain 32×32 images from 10 classes.
Dataset Splits | Yes | The regularization parameter λ and the number of epochs are set by first running the algorithm on an 80/20 validation split of the training set.
Hardware Specification | Yes | All experiments were conducted on 8-core and 10-core 2.4GHz Intel CPUs using C++ and Matlab.
Software Dependencies | No | The paper mentions "C++ and Matlab" but does not provide specific version numbers for these or any other software dependencies.
Experiment Setup | Yes | We evaluate the performance of a 9-layer network, designed with few hyper-parameters: for each layer, we learn 512 filters and choose the RBF kernels κj defined in (2) with initial parameters αj = 1/(0.5^2). Layers 1, 3, 5, 7, 9 use 3×3 patches and a subsampling pooling factor of 2, except for layer 9 where the factor is 3; layers 2, 4, 6, 8 use simply 1×1 patches and no subsampling. For CIFAR-10, the parameters αj are kept fixed during training, and for SVHN, they are updated in the same way as the filters. We use the squared hinge loss in a one vs all setting to perform multi-class classification (with shared filters Z between classes). The input of the network is pre-processed with the local whitening procedure described in [20]. We use the optimization heuristics from the previous section, notably the automatic learning rate scheme, and a gradient momentum with parameter 0.9, following [12]. The regularization parameter λ and the number of epochs are set by first running the algorithm on an 80/20 validation split of the training set. λ is chosen near the canonical parameter λ = 1/n, in the range 2^i/n, with i = −4, ..., 4, and the number of epochs is at most 100. The initial learning rate is 10 with a minibatch size of 128.
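
As a reading aid, the sketch below re-encodes the layer layout and optimization hyper-parameters quoted in the Experiment Setup row as a Python configuration. It is not the authors' C++/Matlab implementation (which was not released at the time of the paper); the names LayerConfig, make_sckn_config, and n_train are illustrative assumptions, and only the numeric values come from the quoted setup.

    # Hypothetical re-encoding of the quoted hyper-parameters; not the authors' code.
    from dataclasses import dataclass

    @dataclass
    class LayerConfig:
        patch_size: int      # spatial patch size (3 means 3x3 patches)
        filters: int         # number of learned filters per layer
        subsampling: int     # pooling/subsampling factor (1 = no subsampling)
        kernel_alpha: float  # initial RBF parameter alpha_j

    def make_sckn_config():
        """Build the 9-layer configuration described in the quoted setup."""
        alpha = 1.0 / (0.5 ** 2)       # initial alpha_j = 1/(0.5^2) for every layer
        layers = []
        for depth in range(1, 10):     # layers 1..9
            if depth % 2 == 1:         # layers 1, 3, 5, 7, 9: 3x3 patches with pooling
                pool = 3 if depth == 9 else 2
                layers.append(LayerConfig(patch_size=3, filters=512,
                                          subsampling=pool, kernel_alpha=alpha))
            else:                      # layers 2, 4, 6, 8: 1x1 patches, no subsampling
                layers.append(LayerConfig(patch_size=1, filters=512,
                                          subsampling=1, kernel_alpha=alpha))
        return layers

    # Optimization settings quoted above (squared hinge loss, one-vs-all).
    n_train = 50000  # assumed number of training examples n (full CIFAR-10 training set)
    optim = {
        "loss": "squared_hinge_one_vs_all",
        "momentum": 0.9,
        "initial_learning_rate": 10.0,
        "minibatch_size": 128,
        "max_epochs": 100,
        "validation_split": 0.2,  # 80/20 split of the training set for tuning
        # lambda searched near 1/n over the grid 2^i / n, i = -4, ..., 4
        "lambda_grid": [2.0 ** i / n_train for i in range(-4, 5)],
    }

    if __name__ == "__main__":
        for i, layer in enumerate(make_sckn_config(), start=1):
            print(f"layer {i}: {layer}")
        print(optim)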