Notice: The reproducibility variables underlying each score are classified using an automated LLM-based pipeline, validated against a manually labeled dataset. LLM-based classification introduces uncertainty and potential bias; scores should be interpreted as estimates. Full accuracy metrics and methodology are described in Coakley et al. (K. L. Coakley, T. Snelleman, H. Hoos, and O. E. Gundersen, "The embrace of open science: An analysis of a decade of AI research and 56 800 conference papers," under review, 2026).
Towards Better Selective Classification
Authors: Leo Feng, Mohamed Osama Ahmed, Hossein Hajimirsadeghi, Amir H. Abdi
ICLR 2023
| Reproducibility Variable | Result | LLM Response |
|---|---|---|
| Research Type | Experimental | The results suggest that the superior performance of state-of-the-art methods is owed to training a more generalizable classifier rather than their proposed selection mechanisms. Our proposed selection mechanism with the proposed entropy-based regularizer achieves new state-of-the-art results. ... (Section 5, Experiments) For the following experiments, we evaluate the following state-of-the-art methods: (1) Selective Net (SN), (2) Self-Adaptive Training (SAT), and (3) Deep Gamblers. |
| Researcher Affiliation | Collaboration | Leo Feng (Mila, Université de Montréal & Borealis AI); Mohamed Osama Ahmed (Borealis AI); Hossein Hajimirsadeghi (Borealis AI); Amir Abdi (Borealis AI) |
| Pseudocode | No | The paper does not contain any sections explicitly labeled 'Pseudocode' or 'Algorithm', nor are there any structured code-like blocks. |
| Open Source Code | Yes | Our code is available at https://github.com/BorealisAI/towards-better-sel-cls. |
| Open Datasets | Yes | We introduce new datasets: Stanford Cars, Food101, Imagenet, Imagenet100 and Imagenet Subset, for the selective classification problem setting and benchmark the existing state-of-the-art methods. ... Imagenet: (Deng et al., 2009). ... Food101: The Food dataset (Bossard et al., 2014). ... Stanford Cars: The Cars dataset (Krause et al., 2013). ... CIFAR-10: The CIFAR-10 dataset (Krizhevsky, 2009). |
| Dataset Splits | Yes | For hyperparameter tuning, we split Imagenet100's training data into 80% training data and 20% validation data evenly across the different classes. |
| Hardware Specification | Yes | The experiments were primarily run on a GTX 1080 Ti. |
| Software Dependencies | No | The paper mentions 'Pytorch implementation' but does not specify a version number for PyTorch or any other software dependencies. |
| Experiment Setup | Yes | For hyperparameter tuning, we split Imagenet100's training data into 80% training data and 20% validation data evenly across the different classes. We tested the following values for the entropy minimization coefficient β ∈ {0.1, 0.01, 0.001, 0.0001}. ... Self-Adaptive Training models are trained using SGD with an initial learning rate of 0.1 and a momentum of 0.9. Food101/Imagenet100/Imagenet Subset: The models were trained for 500 epochs with a mini-batch size of 128. The learning rate was reduced by 0.5 every 25 epochs. The entropy-minimization term was β = 0.01. |
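
The Dataset Splits row quotes an 80% / 20% training/validation split of Imagenet100 made "evenly across the different classes". A minimal sketch of such a per-class (stratified) split is below, assuming "evenly" means each class contributes the same fraction of its own samples to validation. The function name `stratified_split`, the seed, and the toy labels are hypothetical; this is not the authors' code.

```python
import random
from collections import defaultdict


def stratified_split(labels, val_fraction=0.2, seed=0):
    """Split sample indices into train/validation sets class by class.

    Each class with n_c samples contributes round(val_fraction * n_c) of
    them to the validation set, so class proportions are preserved.
    """
    # Group sample indices by their class label.
    by_class = defaultdict(list)
    for idx, label in enumerate(labels):
        by_class[label].append(idx)

    rng = random.Random(seed)  # fixed seed for a reproducible split
    train_idx, val_idx = [], []
    for label in sorted(by_class):
        indices = list(by_class[label])
        rng.shuffle(indices)
        n_val = round(val_fraction * len(indices))
        val_idx.extend(indices[:n_val])
        train_idx.extend(indices[n_val:])
    return sorted(train_idx), sorted(val_idx)


# Toy example (hypothetical labels): 3 classes with 10 samples each.
labels = [c for c in range(3) for _ in range(10)]
train_idx, val_idx = stratified_split(labels)
print(len(train_idx), len(val_idx))  # 24 6
```

Splitting per class, rather than shuffling the whole dataset once, guarantees that every class appears in the validation set in the same 80/20 ratio, which matters when tuning a coefficient such as β on a 100-class subset.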
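
The Experiment Setup row quotes an initial learning rate of 0.1 that is "reduced by 0.5 every 25 epochs" over 500 epochs. A small sketch of that step-decay schedule follows, assuming "reduced by 0.5" means the rate is multiplied by 0.5 (halved) at each step, and assuming the 0.1 starting rate quoted for Self-Adaptive Training applies to this schedule. The helper `step_lr` is hypothetical. In PyTorch the same shape would come from `torch.optim.SGD(params, lr=0.1, momentum=0.9)` combined with `torch.optim.lr_scheduler.StepLR(optimizer, step_size=25, gamma=0.5)`, stepped once per epoch.

```python
def step_lr(epoch, base_lr=0.1, gamma=0.5, step_size=25):
    """Learning rate at a 0-indexed epoch under step decay.

    The rate is multiplied by `gamma` once every `step_size` epochs.
    """
    return base_lr * gamma ** (epoch // step_size)


# Learning rate for each of the 500 training epochs quoted in the table.
schedule = [step_lr(epoch) for epoch in range(500)]
print(schedule[0], schedule[25], schedule[50])  # 0.1 0.05 0.025
```

With 500 epochs and a step every 25, the rate is halved 19 times, so the final epochs train at roughly 0.1 / 2^19, i.e. about 1.9e-7.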
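
The table refers to an "entropy-based regularizer" with an entropy-minimization coefficient β (tuned over {0.1, 0.01, 0.001, 0.0001}, with β = 0.01 used for Food101/Imagenet100/Imagenet Subset). The sketch below shows a generic entropy-minimization term added to cross-entropy for a single example, assuming the loss has the form CE + β·H(p), where H is the Shannon entropy of the softmax output. The paper combines its regularizer with its chosen base objective, and that exact combination is not given in this table, so treat the function names `log_softmax`, `entropy_from_log_probs`, and `regularized_loss` as hypothetical illustrations.

```python
import math


def log_softmax(logits):
    """Numerically stable log-softmax over a list of logits."""
    m = max(logits)
    log_total = m + math.log(sum(math.exp(z - m) for z in logits))
    return [z - log_total for z in logits]


def entropy_from_log_probs(log_probs):
    """Shannon entropy (natural log) of the distribution exp(log_probs)."""
    return -sum(math.exp(lp) * lp for lp in log_probs)


def regularized_loss(logits, target, beta=0.01):
    """Cross-entropy plus beta times the entropy of the prediction.

    Minimizing the extra term pushes the model toward low-entropy
    (confident) predictions; beta=0.01 mirrors the quoted coefficient.
    """
    log_probs = log_softmax(logits)
    cross_entropy = -log_probs[target]
    return cross_entropy + beta * entropy_from_log_probs(log_probs)


print(regularized_loss([2.0, 0.5, -1.0], target=0))
```

Working in log-probabilities avoids taking `log(0)` when one class's softmax probability underflows, which a naive `-log(softmax(z)[target])` would hit for large logit gaps.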
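
The Research Type row contrasts the methods' "selection mechanisms" with the quality of the underlying classifier. In selective classification the model abstains on low-confidence inputs; coverage is the fraction of inputs it accepts, and selective risk is the error rate on those accepted inputs. The sketch below evaluates both at a target coverage, assuming a per-sample confidence score is already available (for example the maximum softmax probability, or negative predictive entropy). Which score the paper's proposed mechanism uses is not stated in this table, so the score is left as an input, and the name `selective_metrics` and the toy data are hypothetical.

```python
def selective_metrics(confidences, correct, coverage):
    """Accept the most confident `coverage` fraction of samples.

    confidences: per-sample scores, higher means more confident.
    correct: per-sample booleans, True if the prediction was right.
    Returns (achieved_coverage, selective_risk), where selective risk
    is the error rate on the accepted samples.
    """
    n = len(confidences)
    n_keep = max(1, round(coverage * n))  # accept at least one sample
    # Rank samples from most to least confident and keep the top n_keep.
    order = sorted(range(n), key=lambda i: confidences[i], reverse=True)
    kept = order[:n_keep]
    errors = sum(1 for i in kept if not correct[i])
    return n_keep / n, errors / n_keep


# Toy example (hypothetical scores and correctness labels).
conf = [0.9, 0.8, 0.6, 0.55, 0.4]
correct = [True, True, False, True, False]
print(selective_metrics(conf, correct, coverage=0.6))  # (0.6, 0.3333333333333333)
```

Sweeping `coverage` from low to high traces a risk-coverage curve; a better selection score lowers risk at a fixed coverage, while a better classifier lowers it at every coverage, which is the distinction the quoted result draws.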