Backward-Compatible Prediction Updates: A Probabilistic Approach

Authors: Frederik Träuble, Julius von Kügelgen, Matthäus Kleindessner, Francesco Locatello, Bernhard Schölkopf, Peter Gehler

NeurIPS 2021

Reproducibility Variable | Result | LLM Response
Research Type | Experimental | 'In extensive experiments on standard classification benchmark data sets, we show that our method outperforms alternative strategies along key metrics for backward-compatible prediction updates.' (Abstract) and 'We now evaluate our Bayesian approach to the Prediction Update Problem against different baselines using the task of image classification as a case study.' (start of Section 6, Experiments)
Researcher Affiliation | Collaboration | ¹Max Planck Institute for Intelligent Systems, Tübingen, Germany; ²Amazon Tübingen, Germany; ³Department of Engineering, University of Cambridge, United Kingdom
Pseudocode | No | No pseudocode or clearly labeled algorithm block found.
Open Source Code | No | No concrete access to source code for the methodology described in this paper is provided. Footnote 3 states: 'All software and assets we use are open source and under MIT, Apache or Creative Commons Licenses.', which refers to the third-party software used, not the authors' own implementation.
Open Datasets | Yes | 'We use the three widely accepted benchmark data sets ImageNet1K [7] (1K classes, 50k validation set), ObjectNet [2] (313 classes, 50k validation set) and CIFAR-10 [21] (10 classes, 10k validation set).'
Dataset Splits | Yes | 'We use the three widely accepted benchmark data sets ImageNet1K [7] (1K classes, 50k validation set) and CIFAR-10 [21] (10 classes, 10k validation set). ... For the former, we split the ImageNet validation set in half and use one half to estimate π_t and the other as D_targ.'
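
A minimal sketch of the half/half ImageNet validation split described above, assuming a seeded random permutation; the paper does not specify how the halves are drawn, so the seed, the variable names, and the use of NumPy are illustrative assumptions:

```python
import numpy as np

# Split the 50k ImageNet-1K validation indices in half: one half to estimate
# the label prior pi_t, the other to serve as the target set D_targ.
# Assumption: a seeded random permutation; the paper leaves the procedure open.
rng = np.random.default_rng(seed=0)
indices = rng.permutation(50_000)
prior_idx = indices[:25_000]   # half used to estimate pi_t
target_idx = indices[25_000:]  # half used as D_targ
```
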
Hardware Specification | No | No specific hardware details (GPU/CPU models, memory amounts, or other machine specifications) are provided for running the experiments. The paper only mentions the relative computational costs of models without specifying the underlying hardware.
Software Dependencies | No | The paper mentions using the 'torchvision model zoo [29]' and 'PyTorch [29]' but does not provide version numbers for PyTorch or any other software dependency. Reference [32] mentions 'pytorch cifar10, jan 2021', but that is a citation date rather than a dependency version.
Experiment Setup | Yes | 'We report the following metrics: (i) final accuracy of the stored predictions (Acc) and accuracy improvement over the initial accuracy of C_0 (ΔAcc); (ii) the cumulative number of negative flips from time t = 0 to T (ΣNF), the average negative flip rate experienced per iteration, i.e., ΣNF/(N·T) (NFR), and the ratio of accumulated positive to negative flips (PF/NF); (iii) the evaluation budget available to each strategy as a percentage of the data set size, i.e., a budget of 10 means that 10% of all samples can be re-evaluated at each time step: B_t = 0.1N ∀t; finally, we measure the backward compatibility connecting (i) and (ii) via Backward Trust Compatibility (BTC) and Backward Error Compatibility (BEC) [43]. We refer to Appendix A.2 for a formal definition of these scores.' and 'We use the prediction-update strategies MB, MBME and CR from Section 3.3 and consider cost ratios of |c_NF/c_PF| ∈ {2, 5, 10} for the latter (e.g., CR 2).'
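
Given the metric definitions quoted above, the following is a minimal NumPy sketch of the flip-based quantities for a single update step. The function name flip_metrics and the boolean correctness arrays are illustrative, and the BTC/BEC formulas follow our reading of [43] (the fraction of previously correct samples that stay correct, and of previous errors that remain errors); Appendix A.2 of the paper holds the authoritative definitions:

```python
import numpy as np

def flip_metrics(old_correct: np.ndarray, new_correct: np.ndarray) -> dict:
    """Flip-based update metrics for one update step.

    old_correct / new_correct: boolean arrays (one entry per sample) flagging
    whether the stored predictions before and after the update are correct.
    """
    n = old_correct.size
    nf = int(np.sum(old_correct & ~new_correct))  # negative flips: correct -> wrong
    pf = int(np.sum(~old_correct & new_correct))  # positive flips: wrong -> correct
    btc = np.sum(old_correct & new_correct) / max(np.sum(old_correct), 1)
    bec = np.sum(~old_correct & ~new_correct) / max(np.sum(~old_correct), 1)
    return {
        "Acc": new_correct.mean(),                 # final accuracy after the step
        "dAcc": new_correct.mean() - old_correct.mean(),
        "NF": nf,
        "PF": pf,
        "PF/NF": pf / max(nf, 1),
        "NFR": nf / n,  # per-step rate; the paper averages sum(NF) over N*T
        "BTC": float(btc),                         # backward trust compatibility
        "BEC": float(bec),                         # backward error compatibility
    }
```

Accumulating nf and pf over the T update steps and dividing the NF total by N·T recovers the ΣNF and NFR columns as defined in the quote.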