Backpropagation-Friendly Eigendecomposition

Authors: Wei Wang, Zheng Dang, Yinlin Hu, Pascal Fua, Mathieu Salzmann

NeurIPS 2019

Reproducibility Variable | Result | LLM Response
Research Type | Experimental | In all the experiments below, we use either ResNet18 or ResNet50 [19] as our backbone. We retain their original architectures but introduce an additional layer between the first convolutional layer and the first pooling layer. For both ZCA and PCA, the new layer computes the covariance matrix of the feature vectors, eigendecomposes it, and uses the eigenvalues and eigenvectors as described below. (A sketch of such a layer follows the table.)
Researcher Affiliation | Academia | Wei Wang (1), Zheng Dang (2), Yinlin Hu (1), Pascal Fua (1), and Mathieu Salzmann (1); (1) CVLab, EPFL, CH-1015 Lausanne, Switzerland, {first.last}@epfl.ch; (2) Xi'an Jiaotong University, China, {dangzheng713@stu.xjtu.edu.cn}
Pseudocode | Yes | Algorithm 1: Forward Pass of ZCA whitening in Practice.
Open Source Code | Yes | The code is available at https://github.com/WeiWangTrento/Power-Iteration-SVD.
Open Datasets | Yes | We first use CIFAR-10 [20] to compare the behavior of our approach with that of standard SVD and PI for different number of groups G... We report equivalent results in Table 3 on CIFAR-100 using either ResNet18 or ResNet50 as the backbone. (A minimal dataset-loading sketch follows the table.)
Dataset Splits | No | The paper reports training errors and test errors, but does not explicitly detail the split percentages or sizes for training, validation, and test sets. It mentions 'mini-batches' and 'batchsize 128' but gives no explicit validation-set details.
Hardware Specification | Yes | In practice, on one single Titan XP GPU server, for one minibatch with batchsize 128, using ResNet18 as backbone, 2 power iterations take 104.8 ms vs 82.7 ms for batch normalization. (A rough timing sketch follows the table.)
Software Dependencies | No | Note that we implemented our method in PyTorch [21], with the backward pass written in Python, which leaves room for improvement. While PyTorch is mentioned, a specific version number is not provided.
Experiment Setup | Yes | As discussed in Section 2.4, we use K = 19 power iterations when backpropagating the gradients unless otherwise specified. To accommodate this additional processing, we change the stride s and kernel sizes in the subsequent blocks to Conv1(3×3, s=1) -> Block1(s=1) -> Block2(s=2) -> Block3(s=2) -> Block4(s=2) -> AvgPool(4×4) -> FC. (A power-iteration sketch follows the table.)
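
The ZCA/PCA layer described under Research Type and Pseudocode can be pictured with the minimal PyTorch sketch below. It is an illustration only: the class name, the epsilon regularizer, and the call to torch.linalg.eigh are assumptions, and the paper's own layer follows its Algorithm 1 with a power-iteration-based gradient rather than the analytic eigendecomposition gradient.

```python
import torch
import torch.nn as nn

class ZCAWhitening(nn.Module):
    """Illustrative ZCA-whitening layer (hypothetical name and hyperparameters).

    Sits between the first convolution and the first pooling layer, treating every
    spatial position of every sample as one C-dimensional feature vector.
    """
    def __init__(self, eps: float = 1e-4):
        super().__init__()
        self.eps = eps

    def forward(self, x: torch.Tensor) -> torch.Tensor:
        n, c, h, w = x.shape
        feats = x.permute(1, 0, 2, 3).reshape(c, -1)            # C x (N*H*W)
        centered = feats - feats.mean(dim=1, keepdim=True)
        cov = centered @ centered.t() / centered.shape[1]       # C x C covariance
        cov = cov + self.eps * torch.eye(c, device=x.device, dtype=x.dtype)
        # Plain eigendecomposition; the paper instead backpropagates through
        # power iterations because this gradient explodes for close eigenvalues.
        eigvals, eigvecs = torch.linalg.eigh(cov)
        whitening = eigvecs @ torch.diag(eigvals.rsqrt()) @ eigvecs.t()
        whitened = whitening @ centered
        return whitened.reshape(c, n, h, w).permute(1, 0, 2, 3)
```

Replacing torch.diag(eigvals.rsqrt()) with a projection onto the leading eigenvectors would give a PCA-style variant of the same layer.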
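
For the CIFAR-10 and CIFAR-100 experiments quoted under Open Datasets, the standard torchvision splits are enough to reproduce the data side; the snippet below is an assumed loading setup, with illustrative transforms and paths rather than the authors' training pipeline.

```python
import torchvision
import torchvision.transforms as T

# Standard CIFAR train/test splits as shipped by torchvision (transforms are illustrative).
transform = T.Compose([T.ToTensor()])
cifar10_train  = torchvision.datasets.CIFAR10("./data", train=True,  download=True, transform=transform)
cifar10_test   = torchvision.datasets.CIFAR10("./data", train=False, download=True, transform=transform)
cifar100_train = torchvision.datasets.CIFAR100("./data", train=True,  download=True, transform=transform)
cifar100_test  = torchvision.datasets.CIFAR100("./data", train=False, download=True, transform=transform)
```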
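
The per-minibatch timings quoted under Hardware Specification can be measured in spirit with CUDA events; the helper below is an assumed measurement sketch, not the authors' benchmark script, and the input shape is only an example.

```python
import torch

def time_layer(layer: torch.nn.Module, x: torch.Tensor, iters: int = 100) -> float:
    """Return the average GPU forward time of `layer` on `x`, in milliseconds."""
    layer, x = layer.cuda(), x.cuda()
    for _ in range(10):          # warm-up so allocation and kernel-launch costs are excluded
        layer(x)
    start = torch.cuda.Event(enable_timing=True)
    end = torch.cuda.Event(enable_timing=True)
    start.record()
    for _ in range(iters):
        layer(x)
    end.record()
    torch.cuda.synchronize()
    return start.elapsed_time(end) / iters

# Example: time BatchNorm on a batch of 128 feature maps of size 64 x 32 x 32.
x = torch.randn(128, 64, 32, 32)
print(time_layer(torch.nn.BatchNorm2d(64), x))
```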
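
Finally, the K = 19 power iterations quoted under Experiment Setup concern the gradient path: the covariance matrix is still eigendecomposed in the forward pass, but gradients are propagated as if the eigenvectors had come from power iteration. The sketch below shows only the core idea, an unrolled, differentiable power iteration for the leading eigenpair; it is not the released implementation, which combines an SVD forward pass with a hand-written Python backward and handles all eigenvectors and channel groups.

```python
import torch

def power_iteration(cov: torch.Tensor, k: int = 19):
    """Leading eigenpair of a symmetric matrix via k unrolled power iterations.

    Autograd differentiates through the k matrix-vector products, which is the
    stable gradient path advocated by the paper (K = 19 in its experiments).
    """
    v = torch.ones(cov.shape[0], 1, device=cov.device, dtype=cov.dtype)
    for _ in range(k):
        v = cov @ v
        v = v / v.norm()
    eigval = (v.t() @ cov @ v).squeeze()
    return eigval, v

# Usage: dominant eigenpair of a feature covariance, with well-behaved gradients.
feats = torch.randn(64, 4096, requires_grad=True)
cov = feats @ feats.t() / feats.shape[1]
eigval, eigvec = power_iteration(cov)
eigval.backward()   # finite gradients even when eigenvalues are nearly equal
```

Because each iteration involves only a matrix-vector product and a normalization, the unrolled gradient stays bounded when eigenvalues are close, which is exactly the regime where the analytic SVD gradient blows up.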