Backward-Compatible Prediction Updates: A Probabilistic Approach
Authors: Frederik Träuble, Julius von Kügelgen, Matthäus Kleindessner, Francesco Locatello, Bernhard Schölkopf, Peter Gehler
NeurIPS 2021
| Reproducibility Variable | Result | LLM Response |
|---|---|---|
| Research Type | Experimental | 'In extensive experiments on standard classification benchmark data sets, we show that our method outperforms alternative strategies along key metrics for backward-compatible prediction updates.' (Abstract) and 'We now evaluate our Bayesian approach to the Prediction Update Problem against different baselines using the task of image classification as a case study.' (Section 6, Experiments) |
| Researcher Affiliation | Collaboration | 1: Max Planck Institute for Intelligent Systems, Tübingen, Germany; 2: Amazon, Tübingen, Germany; 3: Department of Engineering, University of Cambridge, United Kingdom |
| Pseudocode | No | No pseudocode or clearly labeled algorithm block found. |
| Open Source Code | No | No source code for the method described in this paper is provided. Footnote 3 states: 'All software and assets we use are open source and under MIT, Apache or Creative Commons Licenses.', which refers to third-party software used, not the authors' own implementation. |
| Open Datasets | Yes | 'Data Sets: We use the three widely accepted benchmark data sets ImageNet1K [7] (1K classes, 50k validation set), ObjectNet [2] (313 classes, 50k validation set) and CIFAR-10 [21] (10 classes, 10k validation set).' |
| Dataset Splits | Yes | 'Data Sets: We use the three widely accepted benchmark data sets ImageNet1K [7] (1K classes, 50k validation set), ... and CIFAR-10 [21] (10 classes, 10k validation set). ... For the former, we split the ImageNet validation set in half and use one half to estimate π_t and the other as D_targ.' (see the split sketch below the table) |
| Hardware Specification | No | No hardware details (GPU/CPU models, memory amounts, or machine specifications) are provided for running the experiments. The paper only discusses the relative computational costs of models, without naming the underlying hardware. |
| Software Dependencies | No | The paper mentions using the 'torchvision model zoo [29]' and 'PyTorch [29]' but gives no version numbers for PyTorch or any other software dependency. Reference [32] mentions 'pytorch cifar10, jan 2021', but that is a citation date, not a dependency version. |
| Experiment Setup | Yes | 'We report the following metrics: (i) final accuracy of the stored predictions (Acc) and accuracy improvement over the initial accuracy of C0 (ΔAcc); (ii) the cumulative number of negative flips from time t = 0 to T (ΣNF), the average negative flip rate experienced per iteration, i.e., ΣNF / (N·T) (NFR), and the ratio of accumulated positive to negative flips (PF/NF); (iii) the evaluation budget available to each strategy as a percentage of the data set size, i.e., a budget of 10 means that 10% of all samples can be re-evaluated at each time step: B_t = 0.1N ∀t; finally, we measure the connection between (i) and (ii) via Backward Trust Compatibility (BTC) and Backward Error Compatibility (BEC) [43]. We refer to Appendix A.2 for a formal definition of these scores.' and 'We use the prediction-update strategies MB, MBME and CR from Section 3.3 and consider cost ratios of |c_NF / c_PF| ∈ {2, 5, 10} for the latter (e.g., CR 2).' (see the metrics sketch below the table) |
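
The ImageNet split quoted in the Dataset Splits row is a plain 50/50 partition of the 50k-image validation set: one half estimates the label marginal π_t, the other serves as the evaluation set D_targ. A hypothetical sketch of that partition (the paper states neither a random seed nor whether the split is stratified, so both are assumptions here):

```python
import numpy as np

rng = np.random.default_rng(0)   # fixed seed for illustration; the paper does not state one
n = 50_000                       # size of the ImageNet1K validation set
perm = rng.permutation(n)
pi_idx = perm[: n // 2]          # half used to estimate the label marginal pi_t
targ_idx = perm[n // 2:]         # held-out half serving as D_targ
```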
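
The flip and compatibility metrics quoted in the Experiment Setup row follow directly from comparing stored predictions before and after an update. A minimal NumPy sketch of those definitions (our own reconstruction from the quoted formulas and [43], not the authors' code; the function name and the max-guards against empty denominators are added assumptions):

```python
import numpy as np

def flip_metrics(y_true, y_old, y_new):
    """Flip and compatibility metrics between two stored-prediction snapshots.

    y_true: ground-truth labels; y_old / y_new: stored predictions before
    and after one update step. All arguments are 1-D integer arrays of
    equal length.
    """
    old_ok = y_old == y_true
    new_ok = y_new == y_true
    nf = int(np.sum(old_ok & ~new_ok))   # negative flips: correct -> wrong
    pf = int(np.sum(~old_ok & new_ok))   # positive flips: wrong -> correct
    n = len(y_true)
    return {
        "NF": nf,
        "PF": pf,
        # Per-step negative flip rate; averaging over t = 1..T recovers
        # the paper's NFR = sum(NF) / (N * T).
        "NFR": nf / n,
        # BTC: of the samples the old snapshot got right, the fraction
        # the new snapshot also gets right [43].
        "BTC": int(np.sum(old_ok & new_ok)) / max(int(np.sum(old_ok)), 1),
        # BEC: of the samples the old snapshot got wrong, the fraction
        # the new snapshot also gets wrong [43].
        "BEC": int(np.sum(~old_ok & ~new_ok)) / max(int(np.sum(~old_ok)), 1),
    }
```

Under these definitions, a fully backward-compatible update has NF = 0, BTC = 1, and BEC equal to the fraction of old errors the new snapshot fails to correct.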