JFB: Jacobian-Free Backpropagation for Implicit Networks

Authors: Samy Wu Fung, Howard Heaton, Qiuwei Li, Daniel McKenzie, Stanley Osher, Wotao Yin

AAAI 2022

Reproducibility Variable | Result | LLM Response
Research Type | Experimental | Our experiments show implicit networks trained with JFB are competitive with feedforward networks and prior implicit networks given the same number of parameters. This section shows the effectiveness of JFB using PyTorch (Paszke et al. 2017). All networks are ResNet-based such that Assumption 0.3 holds. Table 1: Test accuracy of JFB-trained Implicit ResNet compared to Neural ODEs, Augmented NODEs, and MONs. (A sketch of a contraction-enforcing ResNet-style block appears after this table.)
Researcher Affiliation | Collaboration | Samy Wu Fung* (1), Howard Heaton* (2), Qiuwei Li (3), Daniel McKenzie (4), Stanley Osher (4), Wotao Yin (3). Affiliations: (1) Department of Applied Mathematics and Statistics, Colorado School of Mines; (2) Typal Research, Typal LLC; (3) Alibaba Group (US), DAMO Academy; (4) Department of Mathematics, University of California, Los Angeles
Pseudocode | Yes | Algorithm 1: Implicit Network with Fixed Point Iteration. (A minimal sketch of this fixed-point forward pass with JFB backpropagation follows the table.)
Open Source Code | Yes | All code can be found on GitHub: github.com/howardheaton/jacobian_free_backprop
Open Datasets | Yes | We train implicit networks on three benchmark image classification datasets licensed under CC-BY-SA: SVHN (Netzer et al. 2011), MNIST (LeCun, Cortes, and Burges 2010), and CIFAR-10 (Krizhevsky and Hinton 2009).
Dataset Splits | No | The paper uses benchmark datasets (SVHN, MNIST, CIFAR-10) that have well-defined standard splits, but the main text does not explicitly state training/validation/test split percentages or sample counts, deferring to the appendix for further details.
Hardware Specification | Yes | All experiments are run on a single NVIDIA TITAN X GPU with 12GB RAM.
Software Dependencies | No | This section shows the effectiveness of JFB using PyTorch (Paszke et al. 2017). The paper mentions PyTorch but does not specify its version number or any other software dependencies with versions.
Experiment Setup | No | The paper states that the networks are ResNet-based and that the conjugate gradient (CG) method is used with its maximum number of iterations set to the maximum depth of forward propagation (see the CG sketch below). However, it does not explicitly provide specific hyperparameters such as learning rate, batch size, or optimizer settings in the main text, deferring further details to the appendix.
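
The "ResNet-based such that Assumption 0.3 holds" remark in the Research Type row refers to the contraction requirement JFB places on the implicit layer's update map. Below is a minimal sketch of one common way to encourage such a contraction, assuming a convolutional update and using spectral normalization scaled by gamma < 1; the class name, channel layout, and normalization choice are illustrative assumptions, not the authors' exact architecture.

```python
import torch
import torch.nn as nn

class ContractiveBlock(nn.Module):
    """Illustrative ResNet-style update u -> T(u, x).

    Spectral normalization bounds the spectral norm of the u-branch
    convolution near 1; scaling by gamma < 1 then makes T a
    gamma-contraction in u (x enters additively, so it does not
    affect the Lipschitz constant in u).
    """
    def __init__(self, channels, gamma=0.9):
        super().__init__()
        self.conv_u = nn.utils.spectral_norm(
            nn.Conv2d(channels, channels, kernel_size=3, padding=1))
        self.conv_x = nn.Conv2d(channels, channels, kernel_size=3, padding=1)
        self.gamma = gamma

    def forward(self, u, x):
        # relu is 1-Lipschitz and conv_u has (approximately) unit
        # spectral norm, so the u-branch is gamma-Lipschitz in u.
        return self.gamma * torch.relu(self.conv_u(u)) + self.conv_x(x)
```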
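
For the Pseudocode row, here is a minimal sketch combining Algorithm 1's fixed-point iteration with JFB's backward pass, assuming the update T is a contraction in its first argument (e.g. the block above) and that u and x share a shape; JFBImplicitLayer, max_depth, and tol are hypothetical names. The key point is that the iteration runs under torch.no_grad() and gradients flow through only one final application of T, which amounts to replacing the (I - dT/du)^{-1} term of implicit differentiation with the identity.

```python
import torch
import torch.nn as nn

class JFBImplicitLayer(nn.Module):
    """Illustrative implicit layer with a JFB-style backward pass."""
    def __init__(self, T, max_depth=100, tol=1e-4):
        super().__init__()
        self.T = T                  # contractive update u -> T(u, x)
        self.max_depth = max_depth  # cap on fixed-point iterations
        self.tol = tol              # relative stopping tolerance

    def forward(self, x):
        # Fixed-point iteration u_{k+1} = T(u_k, x) without building
        # a computation graph (cf. Algorithm 1 in the paper).
        u = torch.zeros_like(x)
        with torch.no_grad():
            for _ in range(self.max_depth):
                u_next = self.T(u, x)
                if (u_next - u).norm() <= self.tol * (u.norm() + 1e-8):
                    u = u_next
                    break
                u = u_next
        # JFB: backpropagate through a single application of T at the
        # approximate fixed point.
        return self.T(u.detach(), x)
```

Training then proceeds with any standard optimizer, e.g. passing the output of JFBImplicitLayer(ContractiveBlock(channels)) through a classifier head and calling loss.backward() as usual.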
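
For context on the Experiment Setup row: Jacobian-based backpropagation, the baseline that JFB sidesteps, must solve a linear system involving (I - dT/du) at the fixed point, and the paper caps the CG iterations for that solve at the forward-propagation depth. Below is a generic CG sketch under the assumption that matvec applies a symmetric positive-definite operator (in practice the Jacobian system is typically symmetrized, e.g. via normal equations); conjugate_gradient, matvec, and max_iters are illustrative names, not the authors' implementation.

```python
import torch

def conjugate_gradient(matvec, b, max_iters, tol=1e-6):
    """Solve A v = b with plain CG, where matvec(v) returns A v and A
    is assumed symmetric positive definite. max_iters mirrors the
    paper's choice of capping CG at the forward-propagation depth."""
    v = torch.zeros_like(b)
    r = b.clone()                 # residual b - A v with v = 0
    p = r.clone()
    rs_old = torch.dot(r.flatten(), r.flatten())
    for _ in range(max_iters):
        Ap = matvec(p)
        alpha = rs_old / torch.dot(p.flatten(), Ap.flatten())
        v = v + alpha * p
        r = r - alpha * Ap
        rs_new = torch.dot(r.flatten(), r.flatten())
        if rs_new.sqrt() <= tol:  # residual small enough: converged
            break
        p = r + (rs_new / rs_old) * p
        rs_old = rs_new
    return v
```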