Learning Across Scales—Multiscale Methods for Convolution Neural Networks
Authors: Eldad Haber, Lars Ruthotto, Elliot Holtham, Seong-Hwan Jun
AAAI 2018 | Conference PDF | Archive PDF | Plain Text | LLM Run Details
| Reproducibility Variable | Result | LLM Response |
|---|---|---|
| Research Type | Experimental | In Sec. 4 we demonstrate the potential of our methods using image classification benchmarks. The validation accuracy on their native resolution is around 98.28% and 98.18% for the coarse and fine scale network, respectively. |
| Researcher Affiliation | Collaboration | 1) Dept. of Earth and Ocean Science, University of British Columbia, Vancouver, Canada, eldadhaber@gmail.com; 2) Xtract Technologies, Vancouver, BC, Canada, elliot@xtract.tech; 3) Dept. of Mathematics and Computer Science, Emory University, Atlanta, GA, USA, lruthotto@emory.edu; 4) Dept. of Statistics, University of British Columbia, Vancouver, Canada, seong.jun@stat.ubc.ca |
| Pseudocode | Yes | Algorithm 1 Multigrid Prolongation (a generic prolongation sketch is given after the table) |
| Open Source Code | No | The paper does not include any explicit statements or links indicating that the source code for the described methodology is publicly available. |
| Open Datasets | Yes | We consider the MNIST dataset, and we select ten categories from the ImageNet dataset (Russakovsky et al. 2015). |
| Dataset Splits | Yes | We randomly divide the datasets into a training set consisting of 50,000 images, and a validation set consisting of 10,000 images. |
| Hardware Specification | No | The paper discusses 'resource-limited systems' and the computational efficiency of the approach but does not provide specific details about the hardware (e.g., GPU/CPU models, memory) used for running the experiments. |
| Software Dependencies | No | The paper mentions methods like Block-Coordinate-Descent (BCD) and architectures like ResNet-34 but does not provide specific version numbers for any software libraries, frameworks, or programming languages used. |
| Experiment Setup | Yes | In all experiments, we choose a CNN with identical layers, tanh activation function, and a softmax classifier. For optimization, we use the following Block-Coordinate-Descent (BCD) method: Each iteration consists of one Gauss-Newton step with subsampled Hessian to update the forward propagation parameters and five inexact Newton steps to update the weights and biases of the classifier. To avoid overfitting and stabilize the process, we enforce spatial smoothness of the classification weights and smoothness across layers for the propagation parameters through derivative-based regularization, as also suggested by Haber and Ruthotto (2017). For each CNN, we estimate the parameters using 20 iterations of the BCD. (The alternating structure of this BCD scheme is sketched after the table.) |
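
The Pseudocode row cites Algorithm 1, "Multigrid Prolongation". The paper's exact algorithm is not reproduced here; the sketch below is a standard vertex-centered bilinear prolongation operator of the kind used in multigrid methods, i.e. the operation the algorithm's name refers to. The function name, grid sizes, and NumPy implementation are illustrative assumptions, not the authors' code.

```python
import numpy as np

def prolongate(uc):
    """Bilinear multigrid prolongation: map values on an (n, n) coarse
    vertex grid to the (2n-1, 2n-1) fine grid obtained by halving the
    mesh width (illustrative sketch, not the paper's Algorithm 1)."""
    n = uc.shape[0]
    uf = np.zeros((2 * n - 1, 2 * n - 1))
    uf[::2, ::2] = uc                                    # fine points coinciding with coarse points
    uf[1::2, ::2] = 0.5 * (uc[:-1, :] + uc[1:, :])       # new points between coarse rows
    uf[::2, 1::2] = 0.5 * (uc[:, :-1] + uc[:, 1:])       # new points between coarse columns
    uf[1::2, 1::2] = 0.25 * (uc[:-1, :-1] + uc[1:, :-1]  # new points at cell centers
                             + uc[:-1, 1:] + uc[1:, 1:])
    return uf

# Example: parameters defined on a coarse 15x15 grid are interpolated to a
# 29x29 grid; applying the map again reaches 57x57, and so on.
coarse = np.random.randn(15, 15)
fine = prolongate(coarse)
print(coarse.shape, "->", fine.shape)   # (15, 15) -> (29, 29)
```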
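
The Experiment Setup row describes a Block-Coordinate-Descent scheme that alternates one Gauss-Newton step (with subsampled Hessian) on the propagation parameters with five inexact Newton steps on the classifier, for 20 outer iterations. The sketch below only mirrors that alternating block structure on a toy model: the Gauss-Newton and inexact Newton solves are replaced by plain gradient steps, and the data, model, and step sizes are illustrative assumptions rather than the authors' setup.

```python
import numpy as np

rng = np.random.default_rng(0)

# Toy data: 200 samples, 10 features, 3 classes (illustrative only).
n_classes = 3
X = rng.standard_normal((200, 10))
y = rng.integers(0, n_classes, size=200)

theta = rng.standard_normal((10, 10)) * 0.1   # "propagation" parameters (toy linear layer)
W = np.zeros((10, n_classes))                 # classifier weights

def softmax(Z):
    Z = Z - Z.max(axis=1, keepdims=True)
    E = np.exp(Z)
    return E / E.sum(axis=1, keepdims=True)

def loss_and_grads(theta, W):
    F = np.tanh(X @ theta)                    # forward propagation with tanh activation
    P = softmax(F @ W)                        # softmax classifier
    Y = np.eye(n_classes)[y]
    loss = -np.mean(np.sum(Y * np.log(P + 1e-12), axis=1))
    R = (P - Y) / X.shape[0]                  # softmax cross-entropy residual
    dW = F.T @ R
    dtheta = X.T @ ((R @ W.T) * (1.0 - F**2)) # back-propagate through tanh
    return loss, dtheta, dW

# Block-coordinate descent: 20 outer iterations, as in the quoted setup; here
# each block is updated with simple gradient steps purely to show the
# alternating structure, not the paper's (Gauss-)Newton solvers.
for it in range(20):
    loss, dtheta, _ = loss_and_grads(theta, W)
    theta -= 0.5 * dtheta                     # stands in for one Gauss-Newton step
    for _ in range(5):                        # stands in for five inexact Newton steps
        _, _, dW = loss_and_grads(theta, W)
        W -= 0.5 * dW
    print(f"iter {it:2d}  loss {loss:.4f}")
```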