Maximum Class Separation as Inductive Bias in One Matrix

Authors: Tejaswi Kasarla, Gertjan Burghouts, Max van Spengler, Elise van der Pol, Rita Cucchiara, Pascal Mettes

NeurIPS 2022

Reproducibility Variable | Result | LLM Response
Research Type | Experimental | We show that our proposal directly boosts classification, long-tailed recognition, out-of-distribution detection, and open-set recognition, from CIFAR to ImageNet. We find empirically that maximum separation works best as a fixed bias; making the matrix learnable adds nothing to the performance. The closed-form matrices and code to reproduce the experiments are available on GitHub. (Using the matrix as a fixed, non-learnable head is sketched after this table.)
Researcher Affiliation | Collaboration | Tejaswi Kasarla (University of Amsterdam); Gertjan J. Burghouts (TNO, Intelligent Imaging); Max van Spengler (University of Amsterdam); Elise van der Pol (Microsoft Research AI4Science); Rita Cucchiara (University of Modena and Reggio Emilia); Pascal Mettes (University of Amsterdam)
Pseudocode | No | The paper describes a recursive algorithm for constructing the matrix but does not present it in a formally labeled pseudocode or algorithm block. (The recursion is sketched after this table.)
Open Source Code | Yes | The closed-form matrices and code to reproduce the experiments are available on GitHub: https://github.com/tkasarla/max-separation-as-inductive-bias
Open Datasets | Yes | ...we evaluate the potential of maximum separation as inductive bias in classification and long-tailed recognition settings using the CIFAR-100 and CIFAR-10 datasets along with their long-tailed variants [11].
Dataset Splits | Yes | In the first set of experiments, we evaluate the potential of maximum separation as inductive bias in classification and long-tailed recognition settings using the CIFAR-100 and CIFAR-10 datasets along with their long-tailed variants [11]. Our maximum separation is expected to improve learning especially when dealing with under-represented classes, which will be separated by design with our proposal. We evaluate on a standard ConvNet and a ResNet-32 architecture with four imbalance factors: 0.2, 0.1, 0.02, and 0.01. We set ρ = 0.1 for ResNet-32 as this provides a minor improvement over ρ = 1. The results are shown in Table 1. We report the confidence intervals for these experiments in the supplementary materials. (The long-tail sampling profile implied by these imbalance factors is sketched after this table.)
Hardware Specification | No | The paper does not specify the hardware (e.g., GPU models, CPU types) used for the experiments.
Software Dependencies | Yes | All networks are optimized with SGD with a cosine annealing learning rate scheduler, initial learning rate of 0.1, momentum 0.9, and weight decay 5e-4. For AlexNet we use a step learning rate decay scheduler. All hyperparameters and experimental details are also available in the provided code. ... as provided by PyTorch [40].
Experiment Setup | Yes | Implementation details. Across our experiments, we train a range of network architectures including AlexNet [22], multiple ResNet architectures [17, 67], and VGG32 [47], as provided by PyTorch [40]. All networks are optimized with SGD with a cosine annealing learning rate scheduler, initial learning rate of 0.1, momentum 0.9, and weight decay 5e-4. For AlexNet we use a step learning rate decay scheduler. All hyperparameters and experimental details are also available in the provided code. (This training configuration is sketched after this table.)
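
The "Pseudocode" row notes that the matrix is built recursively but never shown as an algorithm block. Below is a minimal NumPy sketch of one standard simplex recursion consistent with that description: it returns unit-norm class prototypes with identical pairwise angles. The function name is ours; the official closed-form matrices are available in the linked repository.

```python
import numpy as np

def max_separation_prototypes(num_classes: int) -> np.ndarray:
    """Return a (num_classes - 1) x num_classes matrix whose columns are
    unit-norm class prototypes with pairwise cosine similarity
    -1 / (num_classes - 1): the vertices of a regular simplex."""
    k = num_classes - 1
    if k == 1:
        return np.array([[1.0, -1.0]])  # base case: two classes on a line
    # Recurse on one fewer class, then lift the result into k dimensions.
    sub = max_separation_prototypes(num_classes - 1)          # (k-1) x k
    top = np.concatenate(([1.0], np.full(k, -1.0 / k)))       # first coordinate
    bottom = np.hstack((np.zeros((k - 1, 1)),
                        np.sqrt(1.0 - 1.0 / k**2) * sub))     # remaining coordinates
    return np.vstack((top, bottom))

P = max_separation_prototypes(10)   # 9 x 10 matrix, e.g. for CIFAR-10
print(np.round(P.T @ P, 3))         # off-diagonal entries all equal -1/9
```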
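The abstract quoted above reports that maximum separation works best as a fixed bias rather than a learnable matrix. Below is a minimal PyTorch sketch of that idea, assuming the frozen prototype matrix maps a (C-1)-dimensional embedding to C logits by a plain matrix product; the class name `MaxSeparationHead` is ours, and the exact head placement and scaling (e.g. the ρ hyperparameter) follow the official code.

```python
import torch
import torch.nn as nn

class MaxSeparationHead(nn.Module):
    """Frozen classification head: logits are dot products between the
    (C-1)-dimensional embedding and C fixed, maximally separated prototypes."""
    def __init__(self, prototypes: torch.Tensor):  # prototypes: (C-1, C)
        super().__init__()
        # A buffer moves with the model and is saved in the state_dict,
        # but it is not an nn.Parameter, so it receives no gradient updates.
        self.register_buffer("P", prototypes)

    def forward(self, z: torch.Tensor) -> torch.Tensor:  # z: (batch, C-1)
        return z @ self.P                                 # logits: (batch, C)

# Usage with the prototype sketch above; trained with ordinary cross-entropy.
P = torch.as_tensor(max_separation_prototypes(10), dtype=torch.float32)
head = MaxSeparationHead(P)
logits = head(torch.randn(8, 9))
loss = nn.CrossEntropyLoss()(logits, torch.randint(0, 10, (8,)))
```

Registering the matrix as a buffer rather than a parameter is one straightforward way to realize the "fixed bias" setting: only the backbone producing the embedding z is learned.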
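The optimizer and scheduler quoted in the "Software Dependencies" and "Experiment Setup" rows map directly onto standard PyTorch components. The sketch below assumes 200 training epochs (the epoch count is not stated in the quotes) and uses torchvision's resnet18 as a stand-in backbone, since the CIFAR-style ResNet-32 is not part of torchvision.

```python
import torch
from torchvision.models import resnet18  # stand-in; the paper trains AlexNet, ResNets, VGG32

# Backbone whose output dimension matches the prototype dimension (C - 1).
model = resnet18(num_classes=9)

# Hyperparameters as quoted: SGD, lr 0.1, momentum 0.9, weight decay 5e-4,
# cosine annealing schedule over the full training run.
optimizer = torch.optim.SGD(model.parameters(), lr=0.1, momentum=0.9, weight_decay=5e-4)
scheduler = torch.optim.lr_scheduler.CosineAnnealingLR(optimizer, T_max=200)

for epoch in range(200):
    # ... one training epoch over CIFAR with the fixed-prototype head ...
    scheduler.step()
```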
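The imbalance factors 0.2, 0.1, 0.02, and 0.01 give the ratio between the rarest and the most frequent class in the long-tailed CIFAR variants [11]. Below is a sketch of the exponential per-class sampling profile commonly used to construct such variants; the helper name is ours, and the exact subsampling follows [11] and the authors' code.

```python
import numpy as np

def long_tailed_counts(n_max: int, num_classes: int, imbalance_factor: float) -> np.ndarray:
    """Per-class sample counts under an exponential long-tail profile:
    class i keeps n_max * factor**(i / (C - 1)) samples, so the rarest
    class keeps n_max * factor samples."""
    idx = np.arange(num_classes)
    return np.floor(n_max * imbalance_factor ** (idx / (num_classes - 1))).astype(int)

# CIFAR-10-LT with imbalance factor 0.01: head class keeps 5000 images, tail class 50.
print(long_tailed_counts(5000, 10, 0.01))
```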