Invertible Residual Networks

Authors: Jens Behrmann, Will Grathwohl, Ricky T. Q. Chen, David Duvenaud, Joern-Henrik Jacobsen

ICML 2019 | Conference PDF | Archive PDF | Plain Text | LLM Run Details

Reproducibility Variable | Result | LLM Response
Research Type | Experimental | Our empirical evaluation shows that invertible ResNets perform competitively with both state-of-the-art image classifiers and flow-based generative models, something that has not been previously achieved with a single architecture.
Researcher Affiliation | Academia | University of Bremen, Center for Industrial Mathematics; Vector Institute and University of Toronto.
Pseudocode | Yes | Algorithm 1. Inverse of an i-ResNet layer via fixed-point iteration. ... Algorithm 2. Forward pass of an invertible ResNet with Lipschitz constraint and log-determinant approximation; SN denotes spectral normalization based on (2). (A minimal sketch of the fixed-point inversion is given below the table.)
Open Source Code | Yes | Official code release: https://github.com/jhjacobsen/invertible-resnet
Open Datasets | Yes | To compare the discriminative performance and invertibility of i-ResNets with standard ResNet architectures, we train both models on CIFAR10, CIFAR100, and MNIST.
Dataset Splits | No | The paper mentions using the CIFAR10, CIFAR100, and MNIST datasets but does not provide training/validation/test split percentages or sample counts, nor does it cite a standard split configuration for reproduction.
Hardware Specification | Yes | The runtime on 4 GeForce GTX 1080 GPUs with 1 spectral norm iteration was 0.5 sec for a forward and backward pass of a batch with 128 samples, while it took 0.2 sec without spectral normalization.
Software Dependencies | No | The paper mentions software components and methods like 'SGD with momentum', 'Adam or Adamax', 'ELU', 'softplus', and 'ReLU', but does not provide specific version numbers for any software dependencies.
Experiment Setup | Yes | The CIFAR and MNIST models have 54 and 21 residual blocks, respectively, and we use identical settings for all other hyperparameters. ... We are able to train i-ResNets using SGD with momentum and a learning rate of 0.1, whereas all versions of Glow we tested needed Adam or Adamax (Kingma & Ba, 2014) and much smaller learning rates to avoid divergence. ... To obtain the numerical inverse, we apply 100 fixed-point iterations (Equation (1)) for each block. ... Compared to the classification model, the log-determinant approximation with 5 series terms roughly increased the computation times by a factor of 4. (A sketch of this log-determinant estimator also appears below the table.)
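
Algorithm 1 referenced in the Pseudocode row inverts a residual block y = x + g(x) by fixed-point iteration. Below is a minimal PyTorch sketch of that idea, not the official implementation: `g` is assumed to be the residual branch (made contractive, e.g. by spectral normalization), and the function name and `n_iter` argument are placeholders.

```python
import torch

@torch.no_grad()  # the inversion itself needs no gradient tracking
def invert_residual_block(g, y, n_iter=100):
    """Approximate x such that y = x + g(x).

    The iteration x_{k+1} = y - g(x_k) converges whenever g is a
    contraction (Lipschitz constant < 1), which i-ResNets enforce
    through spectral normalization of the residual branch.
    """
    x = y.clone()              # start the iterate at the block output
    for _ in range(n_iter):
        x = y - g(x)           # fixed-point update
    return x
```

Because g is contractive, the iteration converges linearly at a rate set by its Lipschitz constant, which is why a fixed, generous iteration budget (100 per block in the paper's experiments) can be used instead of a tuned stopping criterion.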
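
The log-determinant approximation mentioned in the Experiment Setup row truncates the power series ln det(I + J_g) = sum_{k>=1} (-1)^(k+1) tr(J_g^k) / k and estimates each trace with a Hutchinson probe vector. The following is an illustrative, unofficial sketch assuming a batched input tensor and a PyTorch residual branch `g`; `log_det_estimate` and its arguments are hypothetical names, with `n_terms=5` matching the setting reported in the paper.

```python
import torch

def log_det_estimate(g, x, n_terms=5):
    """Truncated power-series estimate of ln det(I + J_g(x)) per sample.

    Each trace tr(J_g^k) is approximated by a single Hutchinson sample
    v^T J_g^k v, computed with repeated vector-Jacobian products.
    Assumes x has a leading batch dimension.
    """
    if not x.requires_grad:
        x = x.detach().requires_grad_(True)
    y = g(x)
    v = torch.randn_like(x)                      # Hutchinson probe
    w = v
    log_det = torch.zeros(x.shape[0], device=x.device)
    for k in range(1, n_terms + 1):
        # w <- J_g(x)^T w, so that (w * v) sums to v^T J_g^k v
        (w,) = torch.autograd.grad(
            y, x, grad_outputs=w, retain_graph=True, create_graph=True
        )
        log_det = log_det + (-1) ** (k + 1) * (w * v).flatten(1).sum(1) / k
    return log_det
```

The create_graph=True flag keeps the estimate differentiable so it can be used inside a maximum-likelihood training objective; for evaluation only, it could be dropped to save memory and time.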