Maximum Class Separation as Inductive Bias in One Matrix
Authors: Tejaswi Kasarla, Gertjan Burghouts, Max van Spengler, Elise van der Pol, Rita Cucchiara, Pascal Mettes
NeurIPS 2022
| Reproducibility Variable | Result | LLM Response |
|---|---|---|
| Research Type | Experimental | We show that our proposal directly boosts classification, long-tailed recognition, out-of-distribution detection, and open-set recognition, from CIFAR to ImageNet. We find empirically that maximum separation works best as a fixed bias; making the matrix learnable adds nothing to the performance. The closed-form matrices and code to reproduce the experiments are available on GitHub.¹ (A sketch of using the matrix as a fixed, non-learnable output transform follows the table.) |
| Researcher Affiliation | Collaboration | Tejaswi Kasarla (University of Amsterdam); Gertjan J. Burghouts (TNO, Intelligent Imaging); Max van Spengler (University of Amsterdam); Elise van der Pol (Microsoft Research AI4Science); Rita Cucchiara (University of Modena and Reggio Emilia); Pascal Mettes (University of Amsterdam) |
| Pseudocode | No | The paper describes a recursive algorithm for constructing the matrix but does not present it in a formally labeled pseudocode or algorithm block. (A sketch of the recursion follows the table.) |
| Open Source Code | Yes | The closed-form matrices and code to reproduce the experiments are available on GitHub: https://github.com/tkasarla/max-separation-as-inductive-bias |
| Open Datasets | Yes | In the first set of experiments, we evaluate the potential of maximum separation as inductive bias in classification and long-tailed recognition settings using the CIFAR-100 and CIFAR-10 datasets along with their long-tailed variants [11]. |
| Dataset Splits | Yes | In the first set of experiments, we evaluate the potential of maximum separation as inductive bias in classification and long-tailed recognition settings using the CIFAR-100 and CIFAR-10 datasets along with their long-tailed variants [11]. Our maximum separation is expected to improve learning especially when dealing with under-represented classes, which will be separated by design with our proposal. We evaluate on a standard ConvNet and a ResNet-32 architecture with four imbalance factors: 0.2, 0.1, 0.02, and 0.01. We set ρ = 0.1 for ResNet-32 as this provides a minor improvement over ρ = 1. The results are shown in Table 1. We report the confidence intervals for these experiments in the supplementary materials. |
| Hardware Specification | No | The paper does not specify the hardware (e.g., GPU models, CPU types) used for the experiments. |
| Software Dependencies | Yes | All networks are optimized with SGD with a cosine annealing learning rate scheduler, initial learning rate of 0.1, momentum 0.9, and weight decay 5e-4. For AlexNet we use a step learning rate decay scheduler. All hyperparameters and experimental details are also available in the provided code. ... as provided by PyTorch [40]. |
| Experiment Setup | Yes | Implementation details. Across our experiments, we train a range of network architectures including AlexNet [22], multiple ResNet architectures [17, 67], and VGG32 [47], as provided by PyTorch [40]. All networks are optimized with SGD with a cosine annealing learning rate scheduler, initial learning rate of 0.1, momentum 0.9, and weight decay 5e-4. For AlexNet we use a step learning rate decay scheduler. All hyperparameters and experimental details are also available in the provided code. |
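
The Pseudocode row notes that the matrix construction is described recursively rather than in an algorithm block. Below is a minimal NumPy sketch of that recursion as we read it from the paper: k+1 unit-norm prototypes in R^k whose pairwise cosine similarities all equal -1/k. The function name `max_separation_matrix` is ours, not the repository's.

```python
import numpy as np

def max_separation_matrix(k: int) -> np.ndarray:
    """Recursively build k+1 maximally separated class prototypes in R^k.

    Returns a (k, k+1) matrix whose columns are unit vectors with
    pairwise cosine similarity -1/k (the vertices of a regular simplex).
    """
    if k == 1:
        # Base case: two classes on a line, pointing in opposite directions.
        return np.array([[1.0, -1.0]])
    lower = max_separation_matrix(k - 1)                      # (k-1, k) block
    top = np.concatenate(([1.0], np.full(k, -1.0 / k)))       # first row, length k+1
    rest = np.hstack((np.zeros((k - 1, 1)),
                      np.sqrt(1.0 - 1.0 / k ** 2) * lower))   # (k-1, k+1) block
    return np.vstack((top, rest))

# Sanity check: 10 prototypes for a 10-class problem such as CIFAR-10.
P = max_separation_matrix(9)
gram = P.T @ P
assert np.allclose(np.diag(gram), 1.0)                        # unit norms
assert np.allclose(gram[~np.eye(10, dtype=bool)], -1.0 / 9)   # maximum separation
```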
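
The Research Type row quotes the finding that maximum separation works best as a fixed bias. Here is a minimal PyTorch sketch of that usage under our assumptions about the wiring; the class name `FixedSeparationClassifier` is hypothetical, and details such as embedding normalization may differ from the released code. The backbone produces a k-dimensional embedding, which is projected onto the fixed (k, k+1) prototype matrix to obtain class logits for a standard cross-entropy loss.

```python
import torch
from torch import nn

class FixedSeparationClassifier(nn.Module):
    """Project a backbone's k-dim embedding onto fixed class prototypes."""

    def __init__(self, backbone: nn.Module, prototypes) -> None:
        super().__init__()
        self.backbone = backbone
        # A buffer, not a Parameter: the matrix receives no gradients,
        # matching the finding that making it learnable adds nothing.
        self.register_buffer(
            "prototypes", torch.as_tensor(prototypes, dtype=torch.float32))

    def forward(self, x: torch.Tensor) -> torch.Tensor:
        z = self.backbone(x)        # (batch, k) embedding
        return z @ self.prototypes  # (batch, k+1) class logits

# Toy usage, reusing max_separation_matrix from the sketch above:
# a linear backbone maps 32 input features to a 9-dim embedding,
# giving logits for 10 classes.
model = FixedSeparationClassifier(nn.Linear(32, 9), max_separation_matrix(9))
logits = model(torch.randn(4, 32))  # shape (4, 10)
```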
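
The Experiment Setup row pins down the optimizer: SGD with a cosine annealing schedule, initial learning rate 0.1, momentum 0.9, and weight decay 5e-4. The following sketch wires up that configuration in PyTorch; the backbone and epoch count are placeholders, not values from the paper.

```python
import torch
from torch import nn

model = nn.Linear(32, 10)  # placeholder backbone
num_epochs = 200           # placeholder; see the released code for real schedules

optimizer = torch.optim.SGD(
    model.parameters(), lr=0.1, momentum=0.9, weight_decay=5e-4)
scheduler = torch.optim.lr_scheduler.CosineAnnealingLR(
    optimizer, T_max=num_epochs)

for _ in range(num_epochs):
    # ... one training epoch over the data goes here ...
    scheduler.step()  # anneal the learning rate once per epoch
```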