Notice: The reproducibility variables underlying each score are classified using an automated LLM-based pipeline, validated against a manually labeled dataset. LLM-based classification introduces uncertainty and potential bias; scores should be interpreted as estimates. Full accuracy metrics and methodology are described in [1].
AdaFocal: Calibration-aware Adaptive Focal Loss
Authors: Arindam Ghosh, Thomas Schaaf, Matthew Gormley
NeurIPS 2022 | Venue PDF | LLM Run Details
| Reproducibility Variable | Result | LLM Response |
|---|---|---|
| Research Type | Experimental | We evaluate AdaFocal on various image recognition and one NLP task, covering a wide variety of network architectures, to confirm the improvement in calibration while achieving similar levels of accuracy. Additionally, we show that models trained with AdaFocal achieve a significant boost in out-of-distribution detection. |
| Researcher Affiliation | Collaboration | Arindam Ghosh, 3M Health Info. Systems, Pittsburgh, PA 15217, EMAIL; Thomas Schaaf, 3M Health Info. Systems, Pittsburgh, PA 15217, EMAIL; Matt Gormley, Carnegie Mellon University, Pittsburgh, PA 15213, EMAIL |
| Pseudocode | Yes | Algorithm 1: AdaFocal |
| Open Source Code | Yes | Did you include the code, data, and instructions needed to reproduce the main experimental results (either in the supplemental material or as a URL)? [Yes] As part of the supplementary material and details are mentioned in Appendix D. |
| Open Datasets | Yes | We evaluate the performance of our proposed method on image and text classification tasks. For image classification, we use CIFAR-10, CIFAR-100 [9], Tiny-ImageNet [2], and ImageNet [27]... For text classification, we use the 20 Newsgroup dataset [14]. |
| Dataset Splits | Yes | We further assume access to a validation set for hyper-parameter tuning and a test set for evaluation. We experimented with AdaFocal using 5, 10, 15, 20, 30, and 50 equal-mass bins during training to draw calibration statistics from the validation set... Therefore, we use 15 bins for all AdaFocal trainings. |
| Hardware Specification | No | The main paper does not provide specific hardware details (e.g., GPU models, CPU types, or memory) used for the experiments. It states in the checklist that this information is in Appendix D, but Appendix D content is not provided within the given text. |
| Software Dependencies | No | The paper does not provide specific version numbers for any software dependencies (e.g., Python, PyTorch, TensorFlow, specific libraries or compilers). It mentions general model types like CNN and BERT but not the software stack used to implement and run them. |
| Experiment Setup | Yes | If not stated explicitly, we use Sth = 0.2 for all AdaFocal experiments. λ is redundant and one may choose to ignore it as for all our experiments λ = 1 worked very well. For all our experiments, we use γmax = 20... γmin = 2 is selected... Therefore, we use 15 bins for all AdaFocal trainings. |
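For context on the hyperparameters quoted above (γmin = 2, γmax = 20, λ = 1): AdaFocal builds on the focal loss, whose focusing parameter γ is adapted using calibration statistics from the validation set. The sketch below is illustrative only, not the paper's Algorithm 1: `focal_loss` is the standard focal-loss term, and `update_gamma` is a simplified, hypothetical per-bin update that raises γ when a bin is over-confident and lowers it when under-confident, clamped to the reported [γmin, γmax] range.

```python
import math

def focal_loss(p_true, gamma):
    """Standard focal loss on the true-class probability p_true:
    FL(p) = -(1 - p)^gamma * log(p).  gamma = 0 recovers cross-entropy."""
    return -((1.0 - p_true) ** gamma) * math.log(p_true)

def update_gamma(gamma, calibration_gap, lam=1.0, gamma_min=2.0, gamma_max=20.0):
    """Illustrative adaptive update (a simplification, not the paper's exact rule):
    calibration_gap is confidence minus accuracy for a validation bin, so
    gamma grows when the model is over-confident and shrinks when
    under-confident, clamped to [gamma_min, gamma_max] as in the reported setup."""
    new_gamma = gamma * math.exp(lam * calibration_gap)
    return min(max(new_gamma, gamma_min), gamma_max)
```

With γ > 0 the loss down-weights well-classified examples (large `p_true`), which is the mechanism the calibration-aware γ schedule modulates per bin.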