Implicit Bias of Linear Equivariant Networks
Authors: Hannah Lawrence, Kristian Georgiev, Andrew Dienes, Bobak T. Kiani
ICML 2022
| Reproducibility Variable | Result | LLM Response |
|---|---|---|
| Research Type | Experimental | We first experimentally confirm our theory in a simple setting illustrating the effects of implicit regularization. Then, we relax the crucial assumption of linearity in our setup, to empirically show that the results may hold locally even in nonlinear settings (including the practical case of spherical CNNs (Cohen et al., 2018)). |
| Researcher Affiliation | Academia | 1Department of Electrical Engineering and Computer Science, Massachusetts Institute of Technology, Cambridge, MA 02139, USA. |
| Pseudocode | No | The paper does not contain any structured pseudocode or algorithm blocks. |
| Open Source Code | Yes | Our code is available here: https://github.com/kristian-georgiev/implicit-bias-of-linear-equivariant-networks |
| Open Datasets | Yes | Figure 4 shows the implicit bias for a G-CNN over the non-abelian group (C_28 × C_28) ⋊ D_8 which acts on images (the digits 1 and 5) from the MNIST dataset. |
| Dataset Splits | No | The paper mentions training data and epochs ('We evaluate the implicit bias of a nonlinear G-CNN (with linear final layer) on the dihedral group D60 with synthetic data. ... we only analyze loss on data in the training set.'), but does not provide specific details on train/validation/test dataset splits, percentages, or absolute sample counts for reproduction. |
| Hardware Specification | No | The paper states: 'The computational resources used are modest; commodity hardware should suffice to fully reproduce our results.' This is a general statement and does not provide specific hardware details such as GPU or CPU models, or memory specifications. |
| Software Dependencies | No | The paper mentions using the 'e2cnn package (Weiler and Cesa, 2019)' in Appendix E. However, it does not specify a version number for this or any other software dependency. |
| Experiment Setup | Yes | For all binary classification tasks, we use three-layer networks with inputs and convolution weights in R^{|G|}, and all plots begin at epoch 1. ... We choose an appropriate learning rate for each task depending on the dimension and magnitude of the values. Due to the choice of the exponential loss as our loss function, we sometimes periodically increased the learning rate since gradients decay over time to speed up convergence. ... The weights are initialized with the standard uniform initialization. ... The architecture is trained via stochastic gradient descent on the cross-entropy loss... |
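
For concreteness, a minimal data-loading sketch for the MNIST binary task quoted in the Open Datasets row (digits 1 and 5). This is not taken from the authors' repository; torchvision as the data source and the {-1, +1} label convention are assumptions.

```python
# Hypothetical sketch: extract MNIST digits 1 and 5 as a binary classification
# set with labels in {-1, +1}. Not the authors' code; torchvision is assumed.
import torch
from torchvision import datasets, transforms

mnist = datasets.MNIST(root="data", train=True, download=True,
                       transform=transforms.ToTensor())
mask = (mnist.targets == 1) | (mnist.targets == 5)
images = mnist.data[mask].float() / 255.0          # (N, 28, 28), values in [0, 1]
labels = torch.where(mnist.targets[mask] == 5,
                     torch.tensor(1.0), torch.tensor(-1.0))
```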
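A minimal sketch of the training setup described in the Experiment Setup row: a three-layer linear G-CNN over a cyclic group, standard uniform initialization, and gradient descent on the exponential loss. This is an illustrative reconstruction, not the authors' implementation; the group C_60, the synthetic data, the widths, and the learning rate are placeholders, and group convolution is realized here as circular convolution via the FFT.

```python
# Illustrative sketch (placeholders, not the authors' code): a three-layer
# linear G-CNN on the cyclic group C_60, trained by gradient descent on the
# exponential loss, with weights drawn from the standard uniform distribution.
import torch

torch.manual_seed(0)
n = 60                                     # |G| for the cyclic group C_60 (placeholder)

def group_conv(x, w):
    # For a cyclic group, group convolution reduces to circular convolution,
    # computed here via the FFT: (x * w)(g) = sum_h x(h) w(g - h).
    return torch.fft.ifft(torch.fft.fft(x, dim=-1) * torch.fft.fft(w)).real

# Three layers of convolution weights in R^{|G|}, standard uniform initialization.
w1, w2, w3 = (torch.rand(n, requires_grad=True) for _ in range(3))

def net(x):
    h = group_conv(group_conv(x, w1), w2)
    return group_conv(h, w3).sum(dim=-1)   # linear readout: sum over group elements

# Synthetic binary data with labels in {-1, +1} (placeholder).
X = torch.randn(64, n)
y = torch.sign(X[:, 0])
y[y == 0] = 1.0

opt = torch.optim.SGD([w1, w2, w3], lr=1e-2)   # learning rate is a placeholder
for epoch in range(1, 2001):
    loss = torch.exp(-y * net(X)).mean()       # exponential loss
    opt.zero_grad()
    loss.backward()
    opt.step()
```

As the Experiment Setup row notes, gradients under the exponential loss decay over time, so the paper reports periodically increasing the learning rate to speed convergence; this sketch keeps a fixed rate for simplicity.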