Backpropagation-Friendly Eigendecomposition
Authors: Wei Wang, Zheng Dang, Yinlin Hu, Pascal Fua, Mathieu Salzmann
NeurIPS 2019
| Reproducibility Variable | Result | LLM Response |
|---|---|---|
| Research Type | Experimental | In all the experiments below, we use either ResNet18 or ResNet50 [19] as our backbone. We retain their original architectures but introduce an additional layer between the first convolutional layer and the first pooling layer. For both ZCA and PCA, the new layer computes the covariance matrix of the feature vectors, eigendecomposes it, and uses the eigenvalues and eigenvectors as described below. (A minimal sketch of such a whitening layer follows the table.) |
| Researcher Affiliation | Academia | Wei Wang¹, Zheng Dang², Yinlin Hu¹, Pascal Fua¹, and Mathieu Salzmann¹. ¹CVLab, EPFL, CH-1015 Lausanne, Switzerland ({first.last}@epfl.ch); ²Xi'an Jiaotong University, China (dangzheng713@stu.xjtu.edu.cn) |
| Pseudocode | Yes | Algorithm 1: Forward Pass of ZCA whitening in Practice. |
| Open Source Code | Yes | The code is available at https://github.com/WeiWangTrento/Power-Iteration-SVD. |
| Open Datasets | Yes | We first use CIFAR-10 [20] to compare the behavior of our approach with that of standard SVD and PI for different numbers of groups G... We report equivalent results in Table 3 on CIFAR-100 using either ResNet18 or ResNet50 as the backbone. |
| Dataset Splits | No | The paper reports training and test errors but does not explicitly detail the split percentages or sizes for the training, validation, and test sets. It mentions 'mini-batches' and a 'batchsize' of 128 but gives no explicit validation-set details. |
| Hardware Specification | Yes | In practice, on one single Titan XP GPU server, for one minibatch with batchsize 128, using ResNet18 as backbone, 2 power iterations take 104.8 ms vs 82.7 ms for batch normalization. |
| Software Dependencies | No | Note that we implemented our method in PyTorch [21], with the backward pass written in Python, which leaves room for improvement. While PyTorch is mentioned, a specific version number is not provided. |
| Experiment Setup | Yes | As discussed in Section 2.4, we use K = 19 power iterations when backpropagating the gradients unless otherwise specified. To accommodate this additional processing, we change the stride s and kernel sizes in the subsequent blocks to Conv1(3×3, s=1) -> Block1(s=1) -> Block2(s=2) -> Block3(s=2) -> Block4(s=2) -> AvgPool(4×4) -> FC. (A power-iteration sketch follows the table.) |
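The whitening layer quoted in the Research Type row computes a covariance matrix, eigendecomposes it, and uses the result to transform the features. Below is a minimal PyTorch sketch of such a ZCA-whitening forward pass, not the authors' released implementation (see their repository linked above); the function name `zca_whiten`, the `eps` guard, and the use of `torch.linalg.eigh` are illustrative assumptions rather than details taken from the paper's Algorithm 1.

```python
import torch

def zca_whiten(x: torch.Tensor, eps: float = 1e-5) -> torch.Tensor:
    """ZCA-whiten feature vectors x of shape (N, D).

    Centers the features, eigendecomposes their covariance, and
    applies V diag(1/sqrt(lambda)) V^T. `eps` clamps tiny eigenvalues
    for numerical stability (an assumption, not a value from the paper).
    """
    x = x - x.mean(dim=0, keepdim=True)        # center each channel
    cov = x.t() @ x / (x.shape[0] - 1)         # (D, D) covariance matrix
    eigvals, eigvecs = torch.linalg.eigh(cov)  # eigenvalues in ascending order
    inv_sqrt = eigvals.clamp_min(eps).rsqrt()  # 1 / sqrt(lambda_i)
    w = eigvecs @ torch.diag(inv_sqrt) @ eigvecs.t()
    return x @ w                               # whitened features
```

Backpropagating through an analytical eigendecomposition like the one above is exactly the step the paper identifies as numerically unstable when eigenvalues are close; its contribution is to replace those gradients with power-iteration-based ones.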
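The Experiment Setup row mentions K = 19 power iterations when backpropagating the gradients. The sketch below shows only the standard power-iteration recurrence v <- Av / ||Av|| on a symmetric matrix, the building block the paper differentiates through; it is not the paper's backward pass itself, and the function name and random initialization are illustrative assumptions.

```python
import torch

def power_iteration(a: torch.Tensor, k: int = 19):
    """Approximate the leading eigenpair of a symmetric matrix `a`
    using k power-iteration steps (k = 19 mirrors the paper's default K).
    """
    v = torch.randn(a.shape[0], dtype=a.dtype)  # random starting vector
    for _ in range(k):
        v = a @ v
        v = v / v.norm()                        # renormalize each step
    eigval = v @ (a @ v)                        # Rayleigh-quotient estimate
    return eigval, v
```

Because each step is plain differentiable tensor algebra, autograd can unroll the k iterations, which is the property that makes power-iteration gradients a stable stand-in for the analytical eigendecomposition gradients.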