Selectivity considered harmful: evaluating the causal impact of class selectivity in DNNs

Authors: Matthew L. Leavitt, Ari S. Morcos

ICLR 2021

Reproducibility assessment. Each entry below gives a variable, its result, and the supporting LLM response.
Research Type: Experimental
LLM Response: "We investigated the causal impact of class selectivity on network function by directly regularizing for or against class selectivity. Using this regularizer to reduce class selectivity across units in convolutional neural networks increased test accuracy by over 2% in ResNet18 and 1% in ResNet50 trained on Tiny ImageNet."
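The regularizer is described but not reproduced on this page. As a rough illustration, here is a minimal PyTorch sketch of a class-selectivity penalty added to the cross-entropy loss. The selectivity-index form (mu_max - mu_rest) / (mu_max + mu_rest + eps), the batchwise estimation, and the sign convention for alpha are assumptions based on the paper's description, not the authors' actual code.

```python
import torch
import torch.nn.functional as F

def class_selectivity_index(acts, labels, num_classes, eps=1e-7):
    """Per-unit class selectivity index, estimated from one minibatch.

    acts: (batch, units) non-negative (e.g. post-ReLU) unit activations
    labels: (batch,) integer class labels
    """
    # Per-class mean activation for each unit.
    sums = torch.zeros(num_classes, acts.size(1), device=acts.device)
    sums.index_add_(0, labels, acts)
    counts = torch.bincount(labels, minlength=num_classes).clamp(min=1)
    class_means = sums / counts.unsqueeze(1).float()       # (classes, units)
    mu_max, _ = class_means.max(dim=0)                     # most-driving class
    mu_rest = (class_means.sum(dim=0) - mu_max) / (num_classes - 1)
    return (mu_max - mu_rest) / (mu_max + mu_rest + eps)   # roughly in [0, 1]

def selectivity_regularized_loss(logits, labels, layer_acts, alpha, num_classes):
    """Cross-entropy plus a mean selectivity term over layers and units.

    Sign convention assumed here: alpha < 0 discourages selectivity,
    alpha > 0 promotes it; verify against the paper's loss equation.
    """
    ce = F.cross_entropy(logits, labels)
    si = torch.stack([class_selectivity_index(a, labels, num_classes).mean()
                      for a in layer_acts]).mean()
    return ce - alpha * si
```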
Researcher Affiliation: Industry
LLM Response: "Matthew L. Leavitt, Ari S. Morcos. Facebook AI Research, Menlo Park, CA, USA. {ito,arimorcos}@fb.com"
Pseudocode: No
LLM Response: The paper contains no sections or figures explicitly labeled 'Pseudocode' or 'Algorithm', nor any structured algorithm blocks.
Open Source Code: No
LLM Response: The paper mentions using a third-party PWCCA implementation from https://github.com/google/svcca/, but does not state that the authors' own code is open-sourced or provide a link to it.
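For the representational-similarity analysis, the cited repository can be used directly. The sketch below is a hypothetical usage example: the compute_pwcca entry point in pwcca.py, its (neurons, examples) activation layout, and its return values are assumptions about that repository's API and should be checked against its README.

```python
import numpy as np
from pwcca import compute_pwcca  # pwcca.py from https://github.com/google/svcca/

# Placeholder activations from two networks (or two layers); svcca-style
# code expects matrices shaped (neurons, examples).
rng = np.random.default_rng(0)
acts1 = rng.standard_normal((64, 5000))
acts2 = rng.standard_normal((64, 5000))

# Assumed to return the projection-weighted mean correlation along with
# per-component weights and raw CCA coefficients.
pwcca_mean, weights, cca_coefs = compute_pwcca(acts1, acts2, epsilon=1e-10)
print(f"PWCCA similarity: {pwcca_mean:.3f}")
```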
Open Datasets: Yes
LLM Response: "Our experiments were performed on ResNet18 and ResNet50 (He et al., 2016) trained on Tiny ImageNet (Fei-Fei et al., 2015), and ResNet20 (He et al., 2016) and a VGG16-like network (Simonyan and Zisserman, 2015), both trained on CIFAR10 (Krizhevsky, 2009)."
Dataset Splits: Yes
LLM Response: "Tiny ImageNet (Fei-Fei et al., 2015) consists of 500 training images and 50 validation images for each of its 200 classes. We used the validation set for testing and created a new validation set by taking 50 images per class from the training set, selected randomly for each training run. We split the 50k CIFAR10 training samples into a 45k-sample training set and a 5k-sample validation set, similar to our approach with Tiny ImageNet."
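As an illustration of the split described above, here is a minimal PyTorch sketch that holds out a fixed number of images per class for validation. The 50-per-class figure comes from the quote; the dataset object and seeding scheme are placeholders.

```python
import random
from collections import defaultdict
from torch.utils.data import Subset

def per_class_split(dataset, per_class=50, seed=0):
    """Randomly hold out `per_class` images of each class for validation.

    Assumes `dataset[i]` yields an (image, label) pair; pass a fresh seed
    each run to mirror the paper's per-training-run random split.
    """
    by_class = defaultdict(list)
    for idx, (_, label) in enumerate(dataset):
        by_class[label].append(idx)

    rng = random.Random(seed)
    val_idx = []
    for indices in by_class.values():
        rng.shuffle(indices)
        val_idx.extend(indices[:per_class])

    held_out = set(val_idx)
    train_idx = [i for i in range(len(dataset)) if i not in held_out]
    return Subset(dataset, train_idx), Subset(dataset, val_idx)
```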
Hardware Specification: No
LLM Response: The paper does not specify the hardware used to run the experiments (e.g., CPU or GPU models, or cloud instances).
Software Dependencies: No
LLM Response: The paper mentions using PyTorch (Paszke et al., 2019), SciPy (Virtanen et al., 2019), and Seaborn (Waskom et al., 2017), but does not give version numbers for these dependencies.
Experiment Setup: Yes
LLM Response: "All models were trained using stochastic gradient descent (SGD) with momentum = 0.9 and weight decay = 0.0001. ... ResNet18 was trained for 90 epochs with a minibatch size of 4096 samples and a learning rate of 0.1, multiplied (annealed) by 0.1 at epochs 35, 50, 65, and 80. ResNet50 was trained identically, except with a batch size of 1400 samples. ... ResNet20 and VGG16 were trained for 200 epochs using a minibatch size of 256 samples. ResNet20 was trained with a learning rate of 0.1 and VGG16 with a learning rate of 0.01, both annealed by 10^-1 at epochs 100 and 150."
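The quoted hyperparameters map directly onto a standard PyTorch optimizer and multi-step schedule. The sketch below wires them up for the ResNet18 / Tiny ImageNet configuration; `model` is a stand-in, and the milestone epochs would change to [100, 150] for the 200-epoch CIFAR10 runs.

```python
import torch
from torch.optim.lr_scheduler import MultiStepLR

model = torch.nn.Linear(10, 10)  # stand-in so the snippet runs as-is

optimizer = torch.optim.SGD(
    model.parameters(),
    lr=0.1,              # 0.01 for the VGG16 configuration
    momentum=0.9,
    weight_decay=1e-4,
)
# Multiply the learning rate by 0.1 at the quoted milestone epochs.
scheduler = MultiStepLR(optimizer, milestones=[35, 50, 65, 80], gamma=0.1)

for epoch in range(90):
    # ... train one epoch with minibatches of 4096 samples ...
    scheduler.step()
```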