Invertible Residual Networks
Authors: Jens Behrmann, Will Grathwohl, Ricky T. Q. Chen, David Duvenaud, Jörn-Henrik Jacobsen
ICML 2019
| Reproducibility Variable | Result | LLM Response |
|---|---|---|
| Research Type | Experimental | Our empirical evaluation shows that invertible ResNets perform competitively with both state-of-the-art image classifiers and flow-based generative models, something that has not been previously achieved with a single architecture. |
| Researcher Affiliation | Academia | University of Bremen, Center for Industrial Mathematics; Vector Institute and University of Toronto. |
| Pseudocode | Yes | Algorithm 1. Inverse of i-ResNet layer via fixed-point iteration. ... Algorithm 2. Forward pass of an invertible ResNet with Lipschitz constraint and log-determinant approximation, SN denotes spectral normalization based on (2). (Illustrative sketches of the fixed-point inversion, the spectral-norm step, and the log-determinant series appear below the table.) |
| Open Source Code | Yes | Official code release: https://github.com/jhjacobsen/invertible-resnet |
| Open Datasets | Yes | To compare the discriminative performance and invertibility of i-ResNets with standard ResNet architectures, we train both models on CIFAR10, CIFAR100, and MNIST. |
| Dataset Splits | No | The paper mentions using CIFAR10, CIFAR100, and MNIST datasets but does not explicitly provide specific training/validation/test split percentages, sample counts, or refer to a cited standard split configuration for reproduction. |
| Hardware Specification | Yes | The runtime on 4 GeForce GTX 1080 GPUs with 1 spectral norm iteration was 0.5 sec for a forward and backward pass of a batch with 128 samples, while it took 0.2 sec without spectral normalization. |
| Software Dependencies | No | The paper mentions software components and methods like 'SGD with momentum', 'Adam or Adamax', 'ELU', 'softplus', and 'ReLU', but does not provide specific version numbers for any software dependencies. |
| Experiment Setup | Yes | The CIFAR and MNIST models have 54 and 21 residual blocks, respectively, and we use identical settings for all other hyperparameters. ... We are able to train i-ResNets using SGD with momentum and a learning rate of 0.1, whereas all versions of Glow we tested needed Adam or Adamax (Kingma & Ba, 2014) and much smaller learning rates to avoid divergence. ... To obtain the numerical inverse, we apply 100 fixed point iterations (Equation (1)) for each block. ... Compared to the classification model, the log-determinant approximation with 5 series terms roughly increased the computation times by a factor of 4. (Both settings are illustrated in the sketches below.) |
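
Algorithm 1 is compact enough to sketch in code. The snippet below is a minimal PyTorch reconstruction of the fixed-point inversion, assuming `block` computes the residual function g with Lip(g) < 1; `invert_block` and its signature are names chosen here for illustration, not the authors' released API.

```python
import torch
import torch.nn as nn

def invert_block(block: nn.Module, y: torch.Tensor, n_iters: int = 100) -> torch.Tensor:
    """Invert y = x + g(x) by the fixed-point iteration of Algorithm 1.

    Convergence follows from the Banach fixed-point theorem whenever
    Lip(g) < 1, which the spectral-norm constraint enforces. The default
    n_iters=100 matches the setting quoted in the table.
    """
    x = y.clone()  # x_0 := y, the initialization used in Algorithm 1
    with torch.no_grad():
        for _ in range(n_iters):
            x = y - block(x)  # x_{k+1} := y - g(x_k)
    return x
```

Each iteration contracts the reconstruction error by a factor of Lip(g), so convergence is geometric; the 100 iterations quoted above are a conservative choice.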
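
The SN step of Algorithm 2 keeps Lip(g) below 1 by rescaling weights whose spectral norm exceeds a coefficient c < 1. The sketch below handles a flattened weight matrix; the paper applies the analogous power-iteration normalization to convolutional layers, and `spectral_rescale` and `coeff` are illustrative names, not values fixed by this table.

```python
import torch
import torch.nn.functional as F

def spectral_rescale(weight: torch.Tensor, coeff: float = 0.9,
                     n_power_iters: int = 1) -> torch.Tensor:
    """Estimate the largest singular value of `weight` by power iteration,
    then rescale so the spectral norm is at most `coeff` < 1 (the SN step
    of Algorithm 2, sketched for a dense weight)."""
    w = weight.reshape(weight.shape[0], -1)          # flatten kernels to 2-D
    u = F.normalize(torch.randn(w.shape[0]), dim=0)  # left singular vector guess
    for _ in range(n_power_iters):                   # 1 iteration, as timed above
        v = F.normalize(w.t() @ u, dim=0)
        u = F.normalize(w @ v, dim=0)
    sigma = torch.dot(u, w @ v)                      # spectral norm estimate
    factor = torch.clamp(sigma / coeff, min=1.0)     # rescale only if sigma > coeff
    return weight / factor
```

In practice the power-iteration vector `u` is stored in the module and reused across training steps, which is why a single iteration per forward pass (the setting timed in the hardware row) suffices.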
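
Algorithm 2 also estimates the change-of-variables term ln det(I + J_g) with the power series ln det(I + J_g) = Σ_{k≥1} (-1)^{k+1} tr(J_g^k)/k, replacing each trace with a Hutchinson sample v^T J_g^k v and truncating the series. A minimal sketch under those assumptions, using autograd vector-Jacobian products (`log_det_series` and its arguments are illustrative names):

```python
import torch

def log_det_series(g_x: torch.Tensor, x: torch.Tensor, n_terms: int = 5) -> torch.Tensor:
    """Truncated power-series estimate of ln det(I + J_g(x)).

    Each trace tr(J_g^k) is replaced by a single Hutchinson sample
    v^T J_g^k v. Requires that g_x = g(x) was computed from x after
    x.requires_grad_(True).
    """
    v = torch.randn_like(x)  # Hutchinson probe vector
    w = v
    log_det = torch.zeros((), dtype=x.dtype, device=x.device)
    for k in range(1, n_terms + 1):
        # vector-Jacobian product: w <- J_g^T w, so (w * v).sum() = v^T J_g^k v
        w = torch.autograd.grad(g_x, x, grad_outputs=w,
                                retain_graph=True, create_graph=True)[0]
        log_det = log_det + (-1) ** (k + 1) * (w * v).sum() / k
    return log_det
```

With `n_terms=5`, this corresponds to the 5-term setting that the authors report as roughly a 4x increase in computation time over the classification model.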