Notice: The reproducibility variables underlying each score are classified using an automated LLM-based pipeline, validated against a manually labeled dataset. LLM-based classification introduces uncertainty and potential bias; scores should be interpreted as estimates. Full accuracy metrics and methodology are described in [1].
DISCOVER: Making Vision Networks Interpretable via Competition and Dissection
Authors: Konstantinos Panousis, Sotirios Chatzis
NeurIPS 2023 | Venue PDF | LLM Run Details
| Reproducibility Variable | Result | LLM Response |
|---|---|---|
| Research Type | Experimental | For evaluating and dissecting the proposed CVNs, we train two sets of models: (i) Transformer-based, and (ii) Convolutional architectures. We consider stochastic LWTA layers with different numbers of competitors, ranging from U = 2 to U = 24. In every architecture, we retain the total number of parameters of the conventional model by splitting a layer comprising K singular neurons into B blocks of U competing neurons, such that B × U = K. |
| Researcher Affiliation | Academia | Konstantinos Panousis, Department of Electrical Eng., Computer Eng., and Informatics, Cyprus University of Technology, Limassol 3036, Cyprus, EMAIL; Sotirios Chatzis, Department of Electrical Eng., Computer Eng., and Informatics, Cyprus University of Technology, Limassol 3036, Cyprus, EMAIL |
| Pseudocode | No | The paper provides equations and a graphical illustration (Figure 1) but does not contain structured pseudocode or algorithm blocks labeled as 'Pseudocode' or 'Algorithm'. |
| Open Source Code | Yes | Our code implementation is available at: https://github.com/konpanousis/DISCOVER. |
| Open Datasets | Yes | For the Transformer architecture, we select the DeiT model, specifically DeiT-Tiny (DeiT-T, 5M parameters) and DeiT-S (22M parameters), which we train from scratch on ImageNet-1k. For the convolutional paradigm... ResNet-18 trained on Places365. The paper also refers to the 'Broden' dataset. |
| Dataset Splits | Yes | The paper refers to 'ImageNet Val' and 'CIFAR100 Train' in Tables 2 and 3, which implies the use of standard validation/training splits for these well-known datasets. |
| Hardware Specification | Yes | All models were trained on a single NVIDIA A6000 GPU. |
| Software Dependencies | No | The paper mentions using the 'timm' library and implementations from the PyTorch repository, and refers to the Gumbel-Softmax trick, but it does not provide specific version numbers for these software dependencies (e.g., PyTorch version, timm version). |
| Experiment Setup | Yes | We train both architectures from scratch using ImageNet-1k for 300 epochs with the default parameters found therein. Specifically, we use a 5-epoch warm-up period, starting with an initial learning rate of 10^-6, following a cosine annealing schedule up to 5x10^-4. We use the same AdamW optimizer and changed the weight decay from 0.05 to 0.02... For training the ResNet-18 model... We train the model for 90 epochs, using SGD with an initial learning rate of 0.1 that is reduced by a factor of 0.1 every 30 epochs, a weight decay of 10^-4 and 0.9 momentum. The batch size was set to 256. For the Gumbel-Softmax trick, we set the temperature to 0.67 and used the Straight-Through estimator. |
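The table quotes two concrete mechanisms from the paper: a layer of K units is split into B blocks of U competitors (B × U = K), and the winner in each block is sampled with the Gumbel-Softmax trick at temperature 0.67 using the Straight-Through estimator. A minimal PyTorch sketch of such a stochastic LWTA layer, under those stated assumptions (the class name and layer shape are illustrative, not the authors' implementation):

```python
import torch
import torch.nn as nn
import torch.nn.functional as F


class StochasticLWTA(nn.Module):
    """Hypothetical sketch: K output units split into B blocks of U
    competitors (B * U = K); one winner per block is sampled via the
    Straight-Through Gumbel-Softmax at temperature tau = 0.67."""

    def __init__(self, in_features: int, K: int, U: int = 2, tau: float = 0.67):
        super().__init__()
        assert K % U == 0, "K must split evenly into blocks of U competitors"
        self.B, self.U, self.tau = K // U, U, tau
        self.linear = nn.Linear(in_features, K)

    def forward(self, x: torch.Tensor) -> torch.Tensor:
        h = self.linear(x)                      # (batch, K) pre-activations
        h = h.view(-1, self.B, self.U)          # group competitors per block
        # hard=True: one-hot winner in the forward pass, soft gradients
        # in the backward pass (Straight-Through estimator)
        mask = F.gumbel_softmax(h, tau=self.tau, hard=True, dim=-1)
        return (h * mask).view(-1, self.B * self.U)  # losers zeroed out
```

Varying U from 2 to 24 (as in the quoted setup) only changes the block grouping; the parameter count stays that of the conventional K-unit layer.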
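The quoted schedule (5-epoch warm-up from 10^-6, cosine annealing up to a peak of 5x10^-4 over 300 epochs) can be sketched as a plain per-epoch function; this is an assumed reading of the quote (linear warm-up, cosine decay from the peak back toward zero), not the authors' exact code:

```python
import math


def lr_at(epoch: int, warmup: int = 5, total: int = 300,
          lr_init: float = 1e-6, lr_peak: float = 5e-4) -> float:
    """Learning rate at a given epoch: linear warm-up for `warmup`
    epochs from lr_init to lr_peak, then cosine annealing to zero
    over the remaining (total - warmup) epochs."""
    if epoch < warmup:
        # linear ramp from lr_init up to lr_peak
        return lr_init + (lr_peak - lr_init) * epoch / warmup
    # cosine decay: t goes 0 -> 1 over the post-warm-up epochs
    t = (epoch - warmup) / (total - warmup)
    return 0.5 * lr_peak * (1.0 + math.cos(math.pi * t))
```

In practice this would be wrapped in a PyTorch `LambdaLR` or handled by timm's built-in cosine scheduler, which the paper's default configuration presumably relies on.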