Scaling-up Diverse Orthogonal Convolutional Networks by a Paraunitary Framework

Authors: Jiahao Su, Wonmin Byeon, Furong Huang

ICML 2022

Reproducibility Variable | Result | LLM Response
Research Type | Experimental | In the experiments, we achieve the following goals. (1) We demonstrate in Section 6.1 that our separable complete factorization (SC-Fac) achieves precise orthogonality (up to machine precision), resulting in more accurate orthogonal designs than previous ones (Sedghi et al., 2019; Li et al., 2019b; Trockman & Kolter, 2021). (2) Despite the differences in preciseness, we show in Section 6.2 that different realizations of paraunitary systems have only a minor impact on the adversarial robustness of Lipschitz networks. (3) Due to the versatility of our convolutional layers and architectures, in Section 6.3, we explore the best strategy to scale Lipschitz networks to wider/deeper architectures. (4) In Appendix F, we further demonstrate a successful application of orthogonal convolutions in residual flows (Chen et al., 2019). Training details are provided in Appendix E.1. (A norm-preservation check that makes the machine-precision claim concrete is sketched after the table.)
Researcher Affiliation | Collaboration | 1 University of Maryland, College Park, MD, USA; 2 NVIDIA Research, NVIDIA Corporation, Santa Clara, CA, USA
Pseudocode | Yes | We include the pseudo-code for separable complete factorization (Section 2) in Algorithm 1 and diverse orthogonal convolutions (Section 3) in Algorithm 2. The pseudo-code in Algorithm 1 consists of three parts: (1) first, we obtain orthogonal matrices from skew-symmetric matrices using the matrix exponential; we use the GeoTorch library (Lezcano Casado, 2019) for the function matrix_exp in our implementation; (2) subsequently, we construct two 1D paraunitary systems using these orthogonal matrices; (3) lastly, we compose the two 1D paraunitary systems to obtain one 2D paraunitary system. The pseudo-code in Algorithm 2 consists of two parts: (1) first, we reshape each paraunitary system into an orthogonal convolution depending on the stride; and (2) second, we concatenate the orthogonal kernels for different groups and return the output. (The matrix-exponential step is illustrated in a sketch after the table.)
Open Source Code | Yes | Our code will be publicly available at https://github.com/umd-huang-lab/ortho-conv.
Open Datasets | Yes | We use the CIFAR-10 dataset for all our experiments. We normalize all input images to [0, 1], followed by standard augmentation, including random cropping and horizontal flipping. We use the Adam optimizer with a maximum learning rate of 10^-2, coupled with a piece-wise triangular learning rate scheduler. We initialize all our SC-Fac layers as permutation matrices: (1) we select the number of columns for each pair U^(ℓ), U^(−ℓ) uniformly from {1, …, T} at initialization (the number is fixed during training); (2) for ℓ > 0, we sample the entries in U^(ℓ) uniformly with respect to the Haar measure; (3) for ℓ < 0, we set U^(−ℓ) = Q U^(ℓ) according to Proposition D.1. (A standard recipe for Haar-uniform orthogonal sampling is sketched after the table.)
Dataset Splits | No | The paper states it uses the CIFAR-10 and MNIST datasets, which have standard splits, but it does not explicitly specify the training/validation/test splits (e.g., percentages or sample counts) used for reproducibility. (The standard torchvision splits are shown in a sketch after the table.)
Hardware Specification | Yes | Missing numbers in Figure 4 and Table 7 (Appendix E) are due to the large memory requirement (on a Tesla V100 32GB).
Software Dependencies | No | The paper mentions using the 'GeoTorch library (Lezcano Casado, 2019)' but does not provide specific version numbers for this library or any other software dependencies (e.g., Python, PyTorch, CUDA).
Experiment Setup | Yes | We use the Adam optimizer with a maximum learning rate of 10^-2, coupled with a piece-wise triangular learning rate scheduler. We initialize all our SC-Fac layers as permutation matrices: (1) we select the number of columns for each pair U^(ℓ), U^(−ℓ) uniformly from {1, …, T} at initialization (the number is fixed during training); (2) for ℓ > 0, we sample the entries in U^(ℓ) uniformly with respect to the Haar measure; (3) for ℓ < 0, we set U^(−ℓ) = Q U^(ℓ) according to Proposition D.1. For each model, we perform a grid search over margins ϵ0 ∈ {1×10^-3, 2×10^-3, 5×10^-3, 1×10^-2, 2×10^-2, 5×10^-2, 0.1, 0.2, 0.5} and report the best performance in terms of robust accuracy. (The grid search loop is sketched after the table.)
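
The following is a minimal sketch, not the paper's code, of the norm-preservation test behind the machine-precision orthogonality claim in Section 6.1: an orthogonal (paraunitary) convolution preserves the L2 norm of its input under circular padding. The random kernel below is only a placeholder for a kernel produced by SC-Fac and will not pass the check.

```python
import torch
import torch.nn.functional as F

def norm_preservation_gap(kernel: torch.Tensor, x: torch.Tensor) -> float:
    """Return | ||conv(x)||_2 - ||x||_2 | for a circularly padded 2D convolution."""
    pad = kernel.shape[-1] // 2
    # Circular padding turns the convolution into a block-circulant operator,
    # the setting in which a paraunitary kernel acts as an orthogonal matrix.
    x_pad = F.pad(x, (pad, pad, pad, pad), mode="circular")
    y = F.conv2d(x_pad, kernel)
    return abs(y.norm().item() - x.norm().item())

torch.manual_seed(0)
x = torch.randn(4, 16, 32, 32, dtype=torch.float64)
# Placeholder kernel; substitute one returned by the paper's Algorithms 1-2,
# for which the gap should be at machine precision.
random_kernel = torch.randn(16, 16, 3, 3, dtype=torch.float64)
print(norm_preservation_gap(random_kernel, x))  # large gap: this kernel is not orthogonal
```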
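
A sketch of step (1) of Algorithm 1: orthogonal matrices parametrized by skew-symmetric matrices through the matrix exponential. Here torch.matrix_exp stands in for the GeoTorch matrix_exp the paper cites; the exact names in the released code may differ.

```python
import torch

def orthogonal_from_unconstrained(a: torch.Tensor) -> torch.Tensor:
    """Map an unconstrained square matrix to an orthogonal matrix.

    exp(S) is orthogonal whenever S is skew-symmetric (S^T = -S), which is
    the parametrization used in step (1) of Algorithm 1.
    """
    skew = a - a.transpose(-2, -1)   # skew-symmetrize: S^T = -S
    return torch.matrix_exp(skew)    # matrix exponential of a skew-symmetric matrix

torch.manual_seed(0)
U = orthogonal_from_unconstrained(torch.randn(8, 8, dtype=torch.float64))
# Orthogonality holds up to machine precision (~1e-15 in float64).
print((U.T @ U - torch.eye(8, dtype=torch.float64)).abs().max().item())
```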
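
The Haar-uniform sampling of U^(ℓ) in the initialization can be realized with the standard QR-based recipe below (Mezzadri, 2007); this is a common construction, not necessarily the exact routine in the paper's code.

```python
import torch

def haar_orthonormal_columns(n: int, m: int) -> torch.Tensor:
    """Sample an n x m matrix with Haar-distributed orthonormal columns (m <= n)."""
    g = torch.randn(n, m, dtype=torch.float64)
    q, r = torch.linalg.qr(g)
    # Fixing column signs by sign(diag(R)) removes the bias introduced by
    # QR's sign convention, yielding an exactly Haar-distributed sample.
    return q * torch.sign(torch.diagonal(r))

U = haar_orthonormal_columns(16, 4)
print((U.T @ U - torch.eye(4, dtype=torch.float64)).abs().max().item())  # ~1e-16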
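
For reference, the standard torchvision splits that CIFAR-10 experiments typically rely on are loaded below (50,000 training and 10,000 test images); whether the authors carved a validation set out of the training split is not stated in the paper.

```python
import torchvision

# Standard CIFAR-10 splits as shipped by torchvision; the paper does not
# state any deviation from them, nor a validation carve-out.
train_set = torchvision.datasets.CIFAR10(root="./data", train=True, download=True)
test_set = torchvision.datasets.CIFAR10(root="./data", train=False, download=True)
print(len(train_set), len(test_set))  # 50000 10000
```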
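
Finally, a minimal sketch of the margin grid search from the experiment setup: train_and_eval is a hypothetical stand-in for training one Lipschitz network at margin ϵ0 and returning its robust accuracy; the paper does not publish this routine.

```python
# Margins from the paper's grid: eps0 in {1e-3, ..., 0.5}.
MARGINS = [1e-3, 2e-3, 5e-3, 1e-2, 2e-2, 5e-2, 0.1, 0.2, 0.5]

def grid_search(train_and_eval):
    """Run the margin sweep and return (best margin, best robust accuracy).

    `train_and_eval(eps0)` is a hypothetical callable that trains one model
    with margin eps0 and returns its robust accuracy on the test set.
    """
    results = {eps0: train_and_eval(eps0) for eps0 in MARGINS}
    best = max(results, key=results.get)
    return best, results[best]
```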