DISCOVER: Making Vision Networks Interpretable via Competition and Dissection

Authors: Konstantinos Panousis, Sotirios Chatzis

NeurIPS 2023 | Conference PDF | Archive PDF | Plain Text | LLM Run Details

Reproducibility Variable | Result | LLM Response
Research Type | Experimental | For evaluating and dissecting the proposed CVNs, we train two sets of models: (i) Transformer-based, and (ii) Convolutional architectures. We consider stochastic LWTA layers with different numbers of competitors, ranging from U = 2 to U = 24. In every architecture, we retain the total number of parameters of the conventional model by splitting a layer comprising K singular neurons into B blocks of U competing neurons, such that B · U = K. (A minimal illustrative sketch of this block-splitting scheme is given after the table.)
Researcher Affiliation | Academia | Konstantinos Panousis, Department of Electrical Eng., Computer Eng., and Informatics, Cyprus University of Technology, Limassol 3036, Cyprus, k.panousis@cut.ac.cy; Sotirios Chatzis, Department of Electrical Eng., Computer Eng., and Informatics, Cyprus University of Technology, Limassol 3036, Cyprus, sotirios.chatzis@cut.ac.cy
Pseudocode | No | The paper provides equations and a graphical illustration (Figure 1) but does not contain structured pseudocode or algorithm blocks labeled as 'Pseudocode' or 'Algorithm'.
Open Source Code | Yes | Our code implementation is available at: https://github.com/konpanousis/DISCOVER.
Open Datasets | Yes | For the Transformer architecture, we select the DeiT model, specifically DeiT-Tiny (DeiT-T, 5M parameters) and DeiT-S (22M parameters), which we train from scratch on ImageNet-1k. For the convolutional paradigm... ResNet-18 trained on Places365. The paper also refers to the 'Broden' dataset.
Dataset Splits | Yes | The paper refers to 'ImageNet Val' and 'CIFAR100 Train' in Tables 2 and 3, which implies the use of the standard training/validation splits for these well-known datasets.
Hardware Specification | Yes | All models were trained on a single NVIDIA A6000 GPU.
Software Dependencies | No | The paper mentions using the 'timm' library and implementations from the 'PyTorch repository', and refers to the 'Gumbel-Softmax' trick, but it does not provide specific version numbers for these software dependencies (e.g., PyTorch version, timm version).
Experiment Setup | Yes | We train both architectures from scratch using ImageNet-1k for 300 epochs with the default parameters found therein. Specifically, we use a 5-epoch warm-up period, starting with an initial learning rate of 10^-6, following a cosine annealing schedule up to 5×10^-4. We use the same AdamW optimizer and changed the weight decay from 0.05 to 0.02... For training the ResNet-18 model... We train the model for 90 epochs, using SGD with an initial learning rate of 0.1 that is reduced by a factor of 0.1 every 30 epochs, a weight decay of 10^-4 and 0.9 momentum. The batch size was set to 256. For the Gumbel-Softmax trick, we set the temperature to 0.67 and used the Straight-Through estimator.
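The block-splitting scheme quoted in the Research Type row (K = B · U units, with a single winner sampled among the U competitors of each block) can be pictured with a short PyTorch sketch. This is not the authors' implementation: the class name, layer shapes, and the use of torch.nn.functional.gumbel_softmax to sample the per-block winner are assumptions for illustration; only the competition structure and the Gumbel-Softmax / Straight-Through choices (temperature 0.67, hard samples) mirror what the paper reports.

    import torch
    import torch.nn as nn
    import torch.nn.functional as F

    class StochasticLWTALinear(nn.Module):
        """Sketch of a stochastic LWTA layer: K = B * U units are split into
        B blocks of U competing units; within each block one winner is sampled
        and only its activation is propagated (the rest are zeroed)."""

        def __init__(self, in_features, num_blocks, competitors, tau=0.67):
            super().__init__()
            self.B, self.U, self.tau = num_blocks, competitors, tau
            self.linear = nn.Linear(in_features, num_blocks * competitors)

        def forward(self, x):
            h = self.linear(x)                 # (batch, B * U)
            h = h.view(-1, self.B, self.U)     # (batch, B, U)
            # Sample a one-hot winner per block with the Gumbel-Softmax trick;
            # hard=True applies the Straight-Through estimator on the backward pass.
            winners = F.gumbel_softmax(h, tau=self.tau, hard=True, dim=-1)
            return (winners * h).view(-1, self.B * self.U)

    # Example: a layer of K = 768 units arranged as B = 384 blocks of U = 2 competitors,
    # keeping the same total parameter count as a conventional 768-unit layer.
    layer = StochasticLWTALinear(in_features=512, num_blocks=384, competitors=2)
    out = layer(torch.randn(8, 512))           # -> shape (8, 768)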
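For the convolutional recipe quoted in the Experiment Setup row, a minimal sketch of the ResNet-18 schedule (SGD with learning rate 0.1 decayed by a factor of 0.1 every 30 epochs, weight decay 10^-4, momentum 0.9, 90 epochs, batch size 256) could look as follows. Only the hyperparameters come from the paper; the model head size (365 Places365 classes) and the dummy data pipeline are placeholders so the sketch runs end to end.

    import torch
    import torchvision

    # Model and optimizer; the 365-class head is an assumption (Places365).
    model = torchvision.models.resnet18(num_classes=365)
    optimizer = torch.optim.SGD(model.parameters(), lr=0.1,
                                momentum=0.9, weight_decay=1e-4)
    # Step decay: multiply the learning rate by 0.1 every 30 epochs.
    scheduler = torch.optim.lr_scheduler.StepLR(optimizer, step_size=30, gamma=0.1)

    # Placeholder data pipeline; in the paper this would be Places365 with batch size 256.
    dummy = torch.utils.data.TensorDataset(torch.randn(512, 3, 224, 224),
                                           torch.randint(0, 365, (512,)))
    train_loader = torch.utils.data.DataLoader(dummy, batch_size=256)

    for epoch in range(90):
        for images, labels in train_loader:
            optimizer.zero_grad()
            loss = torch.nn.functional.cross_entropy(model(images), labels)
            loss.backward()
            optimizer.step()
        scheduler.step()  # learning rate drops at epochs 30 and 60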