End-to-End Kernel Learning with Supervised Convolutional Kernel Networks
Authors: Julien Mairal
NeurIPS 2016
| Reproducibility Variable | Result | LLM Response |
|---|---|---|
| Research Type | Experimental | We show that our method achieves reasonably competitive performance for image classification on some standard deep learning datasets such as CIFAR-10 and SVHN, and also for image super-resolution, demonstrating the applicability of our approach to a large variety of image-related tasks. |
| Researcher Affiliation | Academia | Julien Mairal (Inria, julien.mairal@inria.fr); Thoth team, Inria Grenoble, Laboratoire Jean Kuntzmann, CNRS, Univ. Grenoble Alpes, France. |
| Pseudocode | No | The paper does not contain any structured pseudocode or algorithm blocks. |
| Open Source Code | No | To gain more scalability and start exploring these directions, we are currently working on a GPU implementation, which we plan to publicly release along with our current CPU implementation. |
| Open Datasets | Yes | We consider the datasets CIFAR-10 [12] and SVHN [19], which contain 32×32 images from 10 classes. |
| Dataset Splits | Yes | The regularization parameter λ and the number of epochs are set by first running the algorithm on an 80/20 validation split of the training set. |
| Hardware Specification | Yes | All experiments were conducted on 8-core and 10-core 2.4GHz Intel CPUs using C++ and Matlab. |
| Software Dependencies | No | The paper mentions "C++ and Matlab" but does not provide specific version numbers for these or any other software dependencies. |
| Experiment Setup | Yes | We evaluate the performance of a 9-layer network, designed with few hyper-parameters: for each layer, we learn 512 filters and choose the RBF kernels κ_j defined in (2) with initial parameters α_j = 1/(0.5²). Layers 1, 3, 5, 7, 9 use 3×3 patches and a subsampling pooling factor of 2, except for layer 9 where the factor is 3; layers 2, 4, 6, 8 use simply 1×1 patches and no subsampling. For CIFAR-10, the parameters α_j are kept fixed during training, and for SVHN, they are updated in the same way as the filters. We use the squared hinge loss in a one-vs-all setting to perform multi-class classification (with shared filters Z between classes). The input of the network is pre-processed with the local whitening procedure described in [20]. We use the optimization heuristics from the previous section, notably the automatic learning rate scheme, and a gradient momentum with parameter 0.9, following [12]. The regularization parameter λ and the number of epochs are set by first running the algorithm on an 80/20 validation split of the training set. λ is chosen near the canonical parameter λ = 1/n, in the range 2^i/n with i = −4, ..., 4, and the number of epochs is at most 100. The initial learning rate is 10 with a minibatch size of 128. |
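
The layer pattern quoted in the Experiment Setup row can be summarized in a small configuration sketch: 512 filters per layer, 3×3 patches with subsampling factor 2 for odd layers (3 for layer 9), 1×1 patches without subsampling for even layers, and RBF parameter α_j = 1/(0.5²). The field names and the helper function below are assumptions made for illustration; they are not the authors' released interface.

```python
# Hypothetical sketch of the 9-layer SCKN hyper-parameters described above.
# Field names are assumptions, not the authors' configuration format.

def sckn_layers(n_layers=9, n_filters=512, sigma=0.5):
    """Per-layer hyper-parameters following the quoted description:
    odd layers use 3x3 patches with subsampling 2 (3 for the last layer),
    even layers use 1x1 patches with no subsampling."""
    layers = []
    for k in range(1, n_layers + 1):
        odd = (k % 2 == 1)
        layers.append({
            "layer": k,
            "filters": n_filters,
            "patch_size": 3 if odd else 1,
            # subsampling factor: 2 for odd layers, 3 for the final layer, 1 (none) for even layers
            "subsampling": (3 if k == n_layers else 2) if odd else 1,
            # RBF kernel parameter alpha_j = 1 / sigma^2 with sigma = 0.5
            "alpha": 1.0 / sigma ** 2,
        })
    return layers

if __name__ == "__main__":
    for cfg in sckn_layers():
        print(cfg)
```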
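The multi-class objective quoted above is a squared hinge loss in a one-vs-all setting. A minimal sketch of that loss, assuming per-class scores and binary targets in {−1, +1}, is given below; it is illustrative only and not the authors' implementation.

```python
# Minimal sketch of a squared hinge loss in a one-vs-all setting.

def squared_hinge_one_vs_all(scores, label):
    """scores: per-class scores f_c(x); label: index of the true class.
    Each class is a binary problem with target +1 for the true class and
    -1 otherwise, using max(0, 1 - y * score)^2."""
    loss = 0.0
    for c, s in enumerate(scores):
        y = 1.0 if c == label else -1.0
        loss += max(0.0, 1.0 - y * s) ** 2
    return loss

if __name__ == "__main__":
    print(squared_hinge_one_vs_all([0.3, -0.8, 1.2], label=2))
```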
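The model-selection protocol (λ searched over 2^i/n for i = −4, ..., 4 around the canonical value 1/n, with the number of epochs chosen on an 80/20 split of the training set) can be sketched as follows. The helper names and the random shuffling are hypothetical details; the paper does not specify how the split is drawn.

```python
# Hedged sketch of the regularization grid and 80/20 validation split described above.
import random

def lambda_grid(n_train):
    """Candidate regularization values 2^i / n for i in {-4, ..., 4}."""
    return [(2.0 ** i) / n_train for i in range(-4, 5)]

def validation_split(indices, train_frac=0.8, seed=0):
    """80/20 split of training indices used to select lambda and the number of epochs."""
    rng = random.Random(seed)
    idx = list(indices)
    rng.shuffle(idx)
    cut = int(train_frac * len(idx))
    return idx[:cut], idx[cut:]

if __name__ == "__main__":
    n = 50000  # CIFAR-10 training set size
    print(lambda_grid(n))
    train_idx, val_idx = validation_split(range(n))
    print(len(train_idx), len(val_idx))
```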