Scaling-up Diverse Orthogonal Convolutional Networks by a Paraunitary Framework
Authors: Jiahao Su, Wonmin Byeon, Furong Huang
ICML 2022 | Conference PDF | Archive PDF | Plain Text | LLM Run Details
| Reproducibility Variable | Result | LLM Response |
|---|---|---|
| Research Type | Experimental | In the experiments, we achieve the following goals. (1) We demonstrate in Section 6.1 that our separable complete factorization (SC-Fac) achieves precise orthogonality (up to machine precision), resulting in more accurate orthogonal designs than previous ones (Sedghi et al., 2019; Li et al., 2019b; Trockman & Kolter, 2021). (2) Despite the differences in preciseness, we show in Section 6.2 that different realizations of paraunitary systems only have a minor impact on the adversarial robustness of Lipschitz networks. (3) Due to the versatility of our convolutional layers and architectures, in Section 6.3, we explore the best strategy to scale Lipschitz networks to wider/deeper architectures. (4) In Appendix F, we further demonstrate a successful application of orthogonal convolutions in residual flows (Chen et al., 2019). Training details are provided in Appendix E.1. |
| Researcher Affiliation | Collaboration | 1 University of Maryland, College Park, MD USA 2 NVIDIA Research, NVIDIA Corporation, Santa Clara, CA USA |
| Pseudocode | Yes | We include the pseudo-code for separable complete factorization (Section 2) in Algorithm 1 and diverse orthogonal convolutions (Section 3) in Algorithm 2. The pseudo-code in Algorithm 1 consists of three parts: (1) First, we obtain orthogonal matrices from skew-symmetric matrices using the matrix exponential; we use the GeoTorch library (Lezcano Casado, 2019) for the function matrix_exp in our implementation; (2) Subsequently, we construct two 1D paraunitary systems using these orthogonal matrices; (3) Lastly, we compose the two 1D paraunitary systems to obtain one 2D paraunitary system. The pseudo-code in Algorithm 2 consists of two parts: (1) first, we reshape each paraunitary system into an orthogonal convolution depending on the stride; and (2) second, we concatenate the orthogonal kernels for different groups and return the output. (A hedged sketch of these steps appears as the first code block after this table.) |
| Open Source Code | Yes | Our code will be publicly available at https://github.com/umd-huang-lab/ortho-conv. |
| Open Datasets | Yes | We use the CIFAR-10 dataset for all our experiments. We normalize all input images to [0, 1] followed by standard augmentation, including random cropping and horizontal flipping. We use the Adam optimizer with a maximum learning rate of 10⁻² coupled with a piece-wise triangular learning rate scheduler. We initialize all our SC-Fac layers as permutation matrices: (1) we select the number of columns for each pair U^(ℓ), U^(−ℓ) uniformly from {1, ..., T} at initialization (the number is fixed during training); (2) for ℓ > 0, we sample the entries in U^(ℓ) uniformly with respect to the Haar measure; (3) for ℓ < 0, we set U^(ℓ) = QU^(−ℓ) according to Proposition D.1. |
| Dataset Splits | No | The paper states it uses CIFAR-10 and MNIST datasets, which have standard splits, but it does not explicitly specify the training/validation/test splits (e.g., percentages or sample counts) used for reproducibility. |
| Hardware Specification | Yes | Missing numbers in Figure 4 and Table 7 (Appendix E) are due to the large memory requirement (on Tesla V100 32G). |
| Software Dependencies | No | The paper mentions using the 'GeoTorch library (Lezcano Casado, 2019)' but does not provide specific version numbers for this library or any other software dependencies (e.g., Python, PyTorch, CUDA). |
| Experiment Setup | Yes | We use the Adam optimizer with a maximum learning rate of 10⁻² coupled with a piece-wise triangular learning rate scheduler. We initialize all our SC-Fac layers as permutation matrices: (1) we select the number of columns for each pair U^(ℓ), U^(−ℓ) uniformly from {1, ..., T} at initialization (the number is fixed during training); (2) for ℓ > 0, we sample the entries in U^(ℓ) uniformly with respect to the Haar measure; (3) for ℓ < 0, we set U^(ℓ) = QU^(−ℓ) according to Proposition D.1. For each model, we perform a grid search on different margins ε₀ ∈ {1×10⁻³, 2×10⁻³, 5×10⁻³, 1×10⁻², 2×10⁻², 5×10⁻², 0.1, 0.2, 0.5} and report the best performance in terms of robust accuracy. (A hedged sketch of the learning-rate schedule appears as the second code block after this table.) |
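
The Pseudocode row above outlines the three steps of Algorithm 1 (SC-Fac) and the reshaping/concatenation steps of Algorithm 2. The following is a minimal, hypothetical PyTorch sketch of those steps, not the authors' released code: an orthogonal matrix is produced from a skew-symmetric parameterization via `torch.matrix_exp` (the GeoTorch-style trick the paper cites), a 1D paraunitary FIR system is built from degree-one projection factors of the assumed form P + z⁻¹(I − P), and two 1D systems are composed into a separable 2D kernel. The function names, the factor form, and the dimensions are illustrative assumptions.

```python
import torch

def orthogonal_from_skew(A):
    """Map an unconstrained square matrix to an orthogonal matrix via the
    matrix exponential of its skew-symmetric part (GeoTorch-style)."""
    skew = A - A.transpose(-1, -2)
    return torch.matrix_exp(skew)

def paraunitary_1d(Q, projections):
    """Compose an orthogonal matrix Q with degree-one paraunitary factors
    P + z^{-1} (I - P); returns FIR taps of shape (num_taps, C, C)."""
    C = Q.shape[0]
    eye = torch.eye(C, dtype=Q.dtype)
    taps = Q.unsqueeze(0)                      # single-tap system H(z) = Q
    for P in projections:
        T = taps.shape[0]
        new_taps = torch.zeros(T + 1, C, C, dtype=Q.dtype)
        new_taps[:T] += taps @ P               # z^0 part of the factor
        new_taps[1:] += taps @ (eye - P)       # z^{-1} part of the factor
        taps = new_taps
    return taps

def separable_2d(taps_h, taps_w):
    """Separable 2D system H(z1, z2) = H1(z1) H2(z2): kernel[i, j] = H1[i] @ H2[j]."""
    return torch.einsum('iab,jbc->ijac', taps_h, taps_w)

# Example: a 3x3 orthogonal convolution kernel over C = 4 channels.
C = 4
Q = orthogonal_from_skew(torch.randn(C, C, dtype=torch.float64))
projections = []
for _ in range(2):                             # two factors -> 3 taps per dimension
    U = orthogonal_from_skew(torch.randn(C, C, dtype=torch.float64))[:, :2]
    projections.append(U @ U.T)                # rank-2 orthogonal projection
taps = paraunitary_1d(Q, projections)          # shape (3, C, C)
kernel_2d = separable_2d(taps, taps)           # shape (3, 3, C, C)

# Sanity check (zero-lag paraunitary condition): sum_t K[t] K[t]^T should be I.
print(torch.dist(torch.einsum('iab,icb->ac', taps, taps),
                 torch.eye(C, dtype=torch.float64)))
```

The printed distance should be near machine precision for float64, which is consistent with the row's claim of orthogonality up to machine precision; note that the zero-lag identity is only a necessary condition for paraunitarity, included here as a quick check.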
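
The Experiment Setup row reports Adam with a maximum learning rate of 10⁻² and a piece-wise triangular learning-rate scheduler. Below is a hypothetical sketch of such a schedule using `torch.optim.lr_scheduler.LambdaLR`; the model, the total step count, and the loop body are placeholders and are not taken from the paper.

```python
import torch

model = torch.nn.Linear(32, 10)                            # placeholder model
optimizer = torch.optim.Adam(model.parameters(), lr=1e-2)  # peak LR from the paper

total_steps = 10_000                                       # assumed; not from the paper

def triangular(step):
    """Linear warm-up to the peak LR at mid-training, then linear decay to zero."""
    half = total_steps / 2
    return step / half if step < half else max(0.0, (total_steps - step) / half)

scheduler = torch.optim.lr_scheduler.LambdaLR(optimizer, lr_lambda=triangular)

for step in range(total_steps):
    # ... forward/backward pass on a CIFAR-10 batch would go here ...
    optimizer.step()
    optimizer.zero_grad()
    scheduler.step()
```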