Approximation Capabilities of Neural ODEs and Invertible Residual Networks

Authors: Han Zhang, Xi Gao, Jacob Unterman, Tom Arodz

ICML 2020

Reproducibility Variable | Result | LLM Response
Research Type | Experimental | We tested whether an i-ResNet operating in one dimension can learn to perform the x → -x mapping, and whether adding one more dimension has an impact on the ability to learn the mapping. To this end, we constructed a network with five residual blocks. In each block, the residual mapping is a single linear transformation, that is, the residual block is x_{t+1} = x_t + Wx_t. We used the official i-ResNet PyTorch package (Behrmann et al., 2019) that relies on spectral normalization (Miyato et al., 2018) to limit the Lipschitz constant to less than unity. We trained the network on a set of 10,000 randomly generated values of x uniformly distributed in [-10, 10] for 100 epochs, and used an independent test set of 2,000 samples generated similarly. For the one-dimensional x → -x and the two-dimensional [x, 0] → [-x, 0] target mappings, we used MSE as the loss. Adding one extra dimension results in successful learning of the mapping, confirming Theorem 6. The test MSE on each output is below 10^-10; the network learned to negate x, and to bring the additional dimension back to null, allowing for invertibility of the model. For the i-ResNet operating in the original, one-dimensional space, learning is not successful (MSE of 33.39): the network learned a mapping x → cx for a small positive c, that is, the mapping closest to negation of x that can be achieved while keeping non-intersecting paths, experimentally confirming Corollary 4. (A minimal sketch of this setup appears after the table.)
Researcher Affiliation | Academia | Department of Computer Science, Virginia Commonwealth University, Richmond, Virginia, USA. Correspondence to: Tom Arodz <tarodz@vcu.edu>.
Pseudocode | No | The paper does not contain structured pseudocode or algorithm blocks.
Open Source Code | No | The paper mentions using "the official i-ResNet PyTorch package (Behrmann et al., 2019)" and the "torchdiffeq package (Chen et al., 2018)", but does not provide concrete access to its own source code for the methodology described.
Open Datasets | Yes | We used the CIFAR10 dataset (Krizhevsky, 2009) that consists of 32 x 32 RGB images, that is, each input image has dimensionality of p = 32 x 32 x 3.
Dataset Splits | Yes | We trained the network on a set of 10,000 randomly generated values of x uniformly distributed in [-10, 10] for 100 epochs, and used an independent test set of 2,000 samples generated similarly.
Hardware Specification | Yes | We used the torchdiffeq package (Chen et al., 2018) and trained on a single NVIDIA Tesla V100 GPU.
Software Dependencies | No | The paper mentions using "the official i-ResNet PyTorch package" and the "torchdiffeq package" but does not provide specific version numbers for these or other software dependencies.
Experiment Setup | Yes | In each block, the residual mapping is a single linear transformation, that is, the residual block is x_{t+1} = x_t + Wx_t. We trained the network on a set of 10,000 randomly generated values of x uniformly distributed in [-10, 10] for 100 epochs, and used an independent test set of 2,000 samples generated similarly. We used MSE as the loss. In designing the architecture of the neural network underlying the ODE we followed ANODE (Dupont et al., 2019). Briefly, the network is composed of three 2D convolutional layers. The first two convolutional layers use k filters, and the last one uses the number of input channels as the number of filters, to ensure that the dimensionalities of the input and output of the network match. The convolution stack is followed by a ReLU activation function. A linear layer, with softmax activation and cross-entropy loss, operates on top of the ODE block. (Sketches of both setups are given below.)
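
The negation experiment quoted above can be reproduced roughly as follows. This is a minimal sketch, not the authors' code: the class and function names (LinearIResNet, run), the optimizer, learning rate, and full-batch training are assumptions, and PyTorch's built-in spectral_norm (which drives the spectral norm of W toward 1) stands in for the official i-ResNet normalization that keeps the Lipschitz constant strictly below 1.

```python
import torch
import torch.nn as nn


class LinearIResNet(nn.Module):
    """Stack of linear residual blocks x_{t+1} = x_t + W x_t."""

    def __init__(self, dim: int, n_blocks: int = 5):
        super().__init__()
        # spectral_norm keeps the spectral norm of W near 1; the official
        # i-ResNet package scales it strictly below 1, which this approximates.
        self.blocks = nn.ModuleList(
            [nn.utils.spectral_norm(nn.Linear(dim, dim, bias=False))
             for _ in range(n_blocks)]
        )

    def forward(self, x):
        for block in self.blocks:
            x = x + block(x)  # residual block
        return x


def run(dim: int) -> float:
    # 10,000 training and 2,000 test points, uniform in [-10, 10];
    # for dim == 2 the extra coordinate is zero-padded, target is [-x, 0].
    x_train = torch.empty(10_000, 1).uniform_(-10, 10)
    x_test = torch.empty(2_000, 1).uniform_(-10, 10)
    if dim == 2:
        x_train = torch.cat([x_train, torch.zeros_like(x_train)], dim=1)
        x_test = torch.cat([x_test, torch.zeros_like(x_test)], dim=1)
    y_train, y_test = -x_train, -x_test  # negation target (zeros stay zero)

    model = LinearIResNet(dim)
    opt = torch.optim.Adam(model.parameters(), lr=1e-2)  # optimizer/lr are assumptions
    loss_fn = nn.MSELoss()
    for _ in range(100):  # 100 epochs; full-batch training is an assumption
        opt.zero_grad()
        loss_fn(model(x_train), y_train).backward()
        opt.step()
    return loss_fn(model(x_test), y_test).item()


# The paper reports that the 1-D mapping cannot be learned (it collapses toward
# x -> cx, Corollary 4), while the zero-padded 2-D mapping is learned (Theorem 6).
print("1-D test MSE:", run(1))
print("2-D test MSE:", run(2))
```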
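The ANODE-style ODE classifier described in the experiment-setup row can likewise be sketched with the torchdiffeq package. The filter count k, kernel sizes, padding, integration interval [0, 1], and class names (ConvODEFunc, ODEClassifier) are assumptions not stated in the quoted text; CrossEntropyLoss stands in for the explicit softmax layer, since it applies log-softmax internally.

```python
import torch
import torch.nn as nn
from torchdiffeq import odeint


class ConvODEFunc(nn.Module):
    """Dynamics f(t, x): three 2D conv layers, the last restoring the input
    channel count so input and output dimensionalities match, then ReLU."""

    def __init__(self, in_channels: int = 3, k: int = 64):
        super().__init__()
        self.net = nn.Sequential(
            nn.Conv2d(in_channels, k, kernel_size=3, padding=1),
            nn.Conv2d(k, k, kernel_size=3, padding=1),
            nn.Conv2d(k, in_channels, kernel_size=3, padding=1),
            nn.ReLU(),
        )

    def forward(self, t, x):
        return self.net(x)


class ODEClassifier(nn.Module):
    """ODE block integrated from t=0 to t=1, with a linear classifier on top."""

    def __init__(self, in_channels: int = 3, side: int = 32,
                 n_classes: int = 10, k: int = 64):
        super().__init__()
        self.odefunc = ConvODEFunc(in_channels, k)
        # p = 32 x 32 x 3 input features for CIFAR10
        self.fc = nn.Linear(in_channels * side * side, n_classes)

    def forward(self, x):
        t = torch.tensor([0.0, 1.0], device=x.device)
        x = odeint(self.odefunc, x, t)[-1]       # state at the final time
        return self.fc(x.flatten(start_dim=1))   # logits for cross-entropy loss


model = ODEClassifier()
images = torch.randn(4, 3, 32, 32)               # a CIFAR10-sized dummy batch
labels = torch.randint(0, 10, (4,))
loss = nn.CrossEntropyLoss()(model(images), labels)
```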