Backward-Compatible Prediction Updates: A Probabilistic Approach
Authors: Frederik Träuble, Julius von Kügelgen, Matthäus Kleindessner, Francesco Locatello, Bernhard Schölkopf, Peter Gehler
NeurIPS 2021
| Reproducibility Variable | Result | LLM Response |
|---|---|---|
| Research Type | Experimental | 'In extensive experiments on standard classification benchmark data sets, we show that our method outperforms alternative strategies along key metrics for backward-compatible prediction updates.' (Abstract) and 'We now evaluate our Bayesian approach to the Prediction Update Problem against different baselines using the task of image classification as a case study.' (Section 6, Experiments) |
| Researcher Affiliation | Collaboration | 1: Max Planck Institute for Intelligent Systems, Tübingen, Germany; 2: Amazon, Tübingen, Germany; 3: Department of Engineering, University of Cambridge, United Kingdom |
| Pseudocode | No | No pseudocode or clearly labeled algorithm block found. |
| Open Source Code | No | No source code for the method described in this paper is provided. Footnote 3 states: 'All software and assets we use are open source and under MIT, Apache or Creative Commons Licenses.', which refers to third-party software used, not the authors' own implementation. |
| Open Datasets | Yes | 'Data Sets: We use the three widely accepted benchmark data sets ImageNet1K [7] (1K classes, 50k validation set), ObjectNet [2] (313 classes, 50k validation set) and CIFAR-10 [21] (10 classes, 10k validation set).' |
| Dataset Splits | Yes | 'Data Sets: We use the three widely accepted benchmark data sets ImageNet1K [7] (1K classes, 50k validation set), ... and CIFAR-10 [21] (10 classes, 10k validation set). ... For the former, we split the ImageNet validation set in half and use one half to estimate π_t and the other as D_targ.' (see the split sketch below the table) |
| Hardware Specification | No | No hardware details (GPU/CPU models, memory amounts, or machine specifications) are provided for running the experiments. The paper only discusses the relative computational costs of models, without naming the underlying hardware. |
| Software Dependencies | No | The paper mentions using the 'torchvision model zoo [29]' and 'PyTorch [29]' but gives no version numbers for PyTorch or any other software dependency. Reference [32] mentions 'pytorch cifar10, jan 2021', but that is a citation date, not a dependency version. |
| Experiment Setup | Yes | 'We report the following metrics: (i) final accuracy of the stored predictions (Acc) and accuracy improvement over the initial accuracy of C0 (ΔAcc); (ii) the cumulative number of negative flips from time t = 0 to T (ΣNF), the average negative flip rate experienced per iteration, i.e., ΣNF / (N·T) (NFR), and the ratio of accumulated positive to negative flips (PF/NF); (iii) the evaluation budget available to each strategy as a percentage of the data set size, i.e., a budget of 10 means that 10% of all samples can be re-evaluated at each time step: B_t = 0.1N ∀t; finally, we measure the connection between (i) and (ii) via Backward Trust Compatibility (BTC) and Backward Error Compatibility (BEC) [43]. We refer to Appendix A.2 for a formal definition of these scores.' and 'We use the prediction-update strategies MB, MBME and CR from Section 3.3 and consider cost ratios of |c_NF / c_PF| ∈ {2, 5, 10} for the latter (e.g., CR 2).' (see the metrics sketch below the table) |
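
The ImageNet split quoted in the Dataset Splits row is a plain 50/50 partition of the 50k-image validation set: one half estimates the label marginal π_t, the other serves as the evaluation set D_targ. A hypothetical sketch of that partition (the paper states neither a random seed nor whether the split is stratified, so both are assumptions here):

```python
import numpy as np

rng = np.random.default_rng(0)   # fixed seed for illustration; the paper does not state one
n = 50_000                       # size of the ImageNet1K validation set
perm = rng.permutation(n)
pi_idx = perm[: n // 2]          # half used to estimate the label marginal pi_t
targ_idx = perm[n // 2:]         # held-out half serving as D_targ
```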
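
The flip and compatibility metrics quoted in the Experiment Setup row follow directly from comparing stored predictions before and after an update. A minimal NumPy sketch of those definitions (our own reconstruction from the quoted formulas and [43], not the authors' code; the function name and the max-guards against empty denominators are added assumptions):

```python
import numpy as np

def flip_metrics(y_true, y_old, y_new):
    """Flip and compatibility metrics between two stored-prediction snapshots.

    y_true: ground-truth labels; y_old / y_new: stored predictions before
    and after one update step. All arguments are 1-D integer arrays of
    equal length.
    """
    old_ok = y_old == y_true
    new_ok = y_new == y_true
    nf = int(np.sum(old_ok & ~new_ok))   # negative flips: correct -> wrong
    pf = int(np.sum(~old_ok & new_ok))   # positive flips: wrong -> correct
    n = len(y_true)
    return {
        "NF": nf,
        "PF": pf,
        # Per-step negative flip rate; averaging over t = 1..T recovers
        # the paper's NFR = sum(NF) / (N * T).
        "NFR": nf / n,
        # BTC: of the samples the old snapshot got right, the fraction
        # the new snapshot also gets right [43].
        "BTC": int(np.sum(old_ok & new_ok)) / max(int(np.sum(old_ok)), 1),
        # BEC: of the samples the old snapshot got wrong, the fraction
        # the new snapshot also gets wrong [43].
        "BEC": int(np.sum(~old_ok & ~new_ok)) / max(int(np.sum(~old_ok)), 1),
    }
```

Under these definitions, a fully backward-compatible update has NF = 0, BTC = 1, and BEC equal to the fraction of old errors the new snapshot fails to correct.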