Learning Across Scales—Multiscale Methods for Convolution Neural Networks

Authors: Eldad Haber, Lars Ruthotto, Elliot Holtham, Seong-Hwan Jun

AAAI 2018 | Conference PDF | Archive PDF | Plain Text | LLM Run Details

Reproducibility Variable | Result | LLM Response
Research Type | Experimental | In Sec. 4 we demonstrate the potential of our methods using image classification benchmarks. The validation accuracy on their native resolution is around 98.28% and 98.18% for the coarse and fine scale network, respectively.
Researcher Affiliation | Collaboration | 1 Dept. of Earth and Ocean Science, University of British Columbia, Vancouver, Canada, eldadhaber@gmail.com; 2 Xtract Technologies, Vancouver, BC, Canada, elliot@xtract.tech; 3 Dept. of Mathematics and Computer Science, Emory University, Atlanta, GA, USA, lruthotto@emory.edu; 4 Dept. of Statistics, University of British Columbia, Vancouver, Canada, seong.jun@stat.ubc.ca
Pseudocode | Yes | Algorithm 1: Multigrid Prolongation (a prolongation sketch is given below the table).
Open Source Code | No | The paper does not include any explicit statements or links indicating that the source code for the described methodology is publicly available.
Open Datasets | Yes | "We consider the MNIST dataset" and "We select ten categories from the ImageNet dataset (Russakovsky et al. 2015)."
Dataset Splits | Yes | We randomly divide the datasets into a training set consisting of 50,000 images, and a validation set consisting of 10,000 images. (A sketch of such a split appears below the table.)
Hardware Specification | No | The paper discusses 'resource-limited systems' and the computational efficiency of the approach but does not provide specific details about the hardware (e.g., GPU/CPU models, memory) used for running the experiments.
Software Dependencies | No | The paper mentions methods like Block-Coordinate-Descent (BCD) and architectures like ResNet-34 but does not provide specific version numbers for any software libraries, frameworks, or programming languages used.
Experiment Setup | Yes | In all experiments, we choose a CNN with identical layers, tanh activation function, and a softmax classifier. For optimization, we use the following Block-Coordinate-Descent (BCD) method: each iteration consists of one Gauss-Newton step with a subsampled Hessian to update the forward propagation parameters and five inexact Newton steps to update the weights and biases of the classifier. To avoid overfitting and stabilize the process, we enforce spatial smoothness of the classification weights and smoothness across layers for the propagation parameters through derivative-based regularization, as also suggested by Haber and Ruthotto (2017). For each CNN, we estimate the parameters using 20 iterations of the BCD. (A schematic of this BCD loop appears below the table.)
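
The Pseudocode row points to the paper's Algorithm 1 (Multigrid Prolongation), which is not reproduced on this page. As a rough illustration only, the sketch below implements a standard bilinear multigrid prolongation of a 2-D image or feature map from a coarse grid to a fine grid of twice the resolution; the function name, the factor-of-two refinement, and the boundary handling are assumptions of this sketch, not the authors' algorithm.

```python
# Minimal sketch (not the paper's Algorithm 1): a standard multigrid
# prolongation that maps a coarse-grid 2-D array to a fine grid with
# twice the resolution by injection plus linear interpolation.
import numpy as np

def prolong_bilinear(coarse):
    """Prolongate an (n, m) coarse-grid array to a (2n, 2m) fine grid."""
    n, m = coarse.shape
    fine = np.zeros((2 * n, 2 * m))
    # Inject coarse values at the even-indexed fine-grid points.
    fine[::2, ::2] = coarse
    # Interpolate odd columns along each even row.
    fine[::2, 1:-1:2] = 0.5 * (fine[::2, :-2:2] + fine[::2, 2::2])
    fine[::2, -1] = fine[::2, -2]          # simple boundary copy
    # Interpolate odd rows from the (now complete) even rows.
    fine[1:-1:2, :] = 0.5 * (fine[:-2:2, :] + fine[2::2, :])
    fine[-1, :] = fine[-2, :]              # simple boundary copy
    return fine
```

In the paper's multiscale strategy, a network trained on coarse-resolution images provides the starting point for training at the fine resolution; a prolongation operator of this kind is what moves quantities between the two grids.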
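
The Dataset Splits row quotes a random 50,000 / 10,000 train/validation division. The snippet below is a minimal sketch of such a split under the assumption that the data are held as NumPy arrays; the seed and function name are illustrative and not taken from the paper.

```python
# Illustrative 50,000 / 10,000 random train/validation split; the seed
# and names are placeholders, not the paper's protocol.
import numpy as np

def random_split(images, labels, n_train=50_000, seed=0):
    rng = np.random.default_rng(seed)
    perm = rng.permutation(len(images))
    train_idx, val_idx = perm[:n_train], perm[n_train:]
    return ((images[train_idx], labels[train_idx]),
            (images[val_idx], labels[val_idx]))
```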
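
The Experiment Setup row describes alternating Block-Coordinate-Descent updates of the forward-propagation parameters and the classifier. The paper's subsampled Gauss-Newton step, inexact Newton steps, and derivative-based regularizers are not specified on this page, so the schematic below substitutes plain gradient steps and a generic quadratic penalty; it only illustrates the alternating structure (20 outer BCD iterations, 5 inner classifier steps), not the authors' solvers, and all names are placeholders.

```python
# Schematic BCD loop (PyTorch). Plain SGD steps stand in for the paper's
# subsampled Gauss-Newton / inexact Newton updates, and a quadratic
# penalty stands in for the derivative-based smoothness regularization.
import torch

def bcd_train(net, classifier, loss_fn, data_loader,
              n_outer=20, n_classifier_steps=5,
              lr_theta=1e-2, lr_w=1e-1, reg=1e-3):
    opt_theta = torch.optim.SGD(net.parameters(), lr=lr_theta)
    opt_w = torch.optim.SGD(classifier.parameters(), lr=lr_w)
    for _ in range(n_outer):                     # 20 BCD iterations
        for x, y in data_loader:
            # Block 1: one step on the forward-propagation parameters.
            opt_theta.zero_grad()
            penalty = reg * sum((p ** 2).sum() for p in net.parameters())
            loss = loss_fn(classifier(net(x)), y) + penalty
            loss.backward()
            opt_theta.step()
            # Block 2: several inner steps on the classifier weights only.
            features = net(x).detach()
            for _ in range(n_classifier_steps):
                opt_w.zero_grad()
                loss = loss_fn(classifier(features), y)
                loss.backward()
                opt_w.step()
```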