Towards Anytime Classification in Early-Exit Architectures by Enforcing Conditional Monotonicity

Authors: Metod Jazbec, James Allingham, Dan Zhang, Eric Nalisnick

NeurIPS 2023 | Conference PDF | Archive PDF | Plain Text | LLM Run Details

Reproducibility Variable | Result | LLM Response
Research Type | Experimental | Our empirical results on standard image-classification tasks demonstrate that such behaviors can be achieved while preserving competitive accuracy on average. We conduct two sets of experiments. First, in Section 6.1, we verify that our method (PA) maintains strong average performance while significantly improving conditional monotonicity in state-of-the-art EENNs, making them more suitable for the anytime prediction task.
Researcher Affiliation | Collaboration | Metod Jazbec (UvA-Bosch Delta Lab, University of Amsterdam, m.jazbec@uva.nl); James Urquhart Allingham (University of Cambridge, jua23@cam.ac.uk); Dan Zhang (Bosch Center for AI & University of Tübingen, dan.zhang2@de.bosch.com); Eric Nalisnick (UvA-Bosch Delta Lab, University of Amsterdam, e.t.nalisnick@uva.nl)
Pseudocode | No | The paper does not contain any clearly labeled pseudocode or algorithm blocks.
Open Source Code | Yes | Our code is publicly available at https://github.com/metodj/AnytimeClassification.
Open Datasets | Yes | We consider CIFAR-10, CIFAR-100 [Krizhevsky et al., 2009], and ILSVRC 2012 (ImageNet; Deng et al. [2009]).
Dataset Splits | Yes | The CIFAR datasets each contain 50k training and 10k test images, while ImageNet is a larger dataset with 1.2M training and 50k test instances. Note that the baseline results in this figure differ slightly from those in Figures 3 and 4, as less data (90%) was used to fit the model, with a portion of the training dataset (10%) used to fit the thresholding model τ.
Hardware Specification | No | The paper does not specify any hardware details (e.g., GPU models, CPU types, or memory) used for running the experiments.
Software Dependencies | No | The paper mentions models like BERT and implies the use of common deep learning frameworks but does not specify any software dependencies with version numbers (e.g., PyTorch 1.9, Python 3.8).
Experiment Setup | Yes | The weights are most often set to be uniform across all exits. We train with the EENN objective for the first 2/3 of epochs and with the PA objective for the last 1/3 of epochs. We substitute the ReLU activation function with a Softplus function, and we found that using ensemble weights w_m = 1 is more effective for models finetuned with the PA objective, as opposed to our default choice of w_m = m/M presented in the main paper for post-hoc PA.
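
The "Dataset Splits" row above mentions holding out 10% of the training data to fit the thresholding model τ, with the remaining 90% used to fit the model. A minimal sketch of such a split, assuming a simple index-based partition; the dataset size, seed, and variable names are illustrative and not taken from the paper:

```python
import numpy as np

# Illustrative 90/10 partition of a CIFAR-sized training set:
# 90% to fit the early-exit network, 10% held out to fit the
# exit-thresholding model tau. Size and seed are assumptions.
n_train = 50_000
rng = np.random.default_rng(seed=0)
perm = rng.permutation(n_train)

n_tau = n_train // 10              # 10% of training indices for tau
tau_idx = perm[:n_tau]             # used to fit the thresholding model
fit_idx = perm[n_tau:]             # used to fit the EENN itself
```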
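The "Experiment Setup" row describes a two-phase schedule (EENN objective for the first 2/3 of epochs, PA objective for the last 1/3) and two exit-weighting choices (the default w_m = m/M versus the uniform w_m = 1 used for PA-finetuned models). A hedged sketch of that schedule and of computing the weights; the loss functions, epoch count, and number of exits are placeholders rather than the authors' implementation:

```python
# Placeholder objectives standing in for the two training phases.
def eenn_loss(batch):   # assumption: the standard per-exit EENN objective
    ...

def pa_loss(batch):     # assumption: the paper's PA finetuning objective
    ...

M = 5                           # number of exits (illustrative)
total_epochs = 90               # assumed value, not stated in the excerpt
switch_epoch = 2 * total_epochs // 3

# Exit weights: default post-hoc choice w_m = m / M, or uniform w_m = 1
# (reported as more effective for models finetuned with the PA objective).
weights_default = [m / M for m in range(1, M + 1)]
weights_uniform = [1.0 for _ in range(M)]

for epoch in range(total_epochs):
    # First 2/3 of epochs: EENN objective; last 1/3: PA objective.
    loss_fn = eenn_loss if epoch < switch_epoch else pa_loss
    # ... run one training epoch with loss_fn ...
```

The setup also replaces ReLU activations with Softplus; that architectural change is not shown in the sketch.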