Source-Free Adaptation to Measurement Shift via Bottom-Up Feature Restoration

Authors: Cian Eastwood, Ian Mason, Chris Williams, Bernhard Schölkopf

ICLR 2022

Reproducibility assessment (variable, result, and supporting LLM response):
Research Type: Experimental
"On real and synthetic data, we demonstrate that BUFR outperforms existing SFDA methods in terms of accuracy, calibration, and data efficiency, while being less reliant on the performance of the source model in the target domain. In this section we evaluate our methods on multiple datasets (shown in Appendix F), compare to various baselines, and provide insights into why our method works through a detailed analysis. Table 1 reports classification accuracies and ECEs for EMNIST-DA."
Researcher Affiliation: Academia
Authors: Cian Eastwood, Ian Mason, Christopher K. I. Williams, Bernhard Schölkopf. Affiliations: School of Informatics, University of Edinburgh; Alan Turing Institute, London; MPI for Intelligent Systems, Tübingen.
Pseudocode: Yes
"Algorithm 1 gives the algorithm for FR at development time, where a source model is trained before saving approximations of the feature and logit distributions under the source data. Algorithm 2 gives the algorithm for FR at deployment time, where the feature-extractor is adapted such that the approximate feature and logit distributions under the target data realign with those saved on the source."
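As a concrete reading of the two algorithms, here is a minimal PyTorch sketch of the deployment-time FR loss. The helper names (soft_histogram, sym_kl, fr_loss), the soft-binning construction, and the symmetric KL divergence are illustrative assumptions on our part; the authors' exact implementation is in the repository linked below.

```python
# Sketch of Feature Restoration (FR), assuming soft-binned per-dimension
# histograms and a symmetric KL alignment loss. Helper names and the
# binning details are our assumptions, not the authors' exact code.
import torch
import torch.nn.functional as F

def soft_histogram(feats, centers, tau=0.01):
    """Differentiable per-dimension histogram via soft binning.

    feats:   (N, D) activations for one layer.
    centers: (D, B) bin centers, one row per feature dimension.
    Returns  (D, B) normalised bin masses.
    """
    # Squared distance of every activation to every bin center.
    d = (feats.T.unsqueeze(-1) - centers.unsqueeze(1)) ** 2  # (D, N, B)
    w = F.softmax(-d / tau, dim=-1)                          # soft assignment
    hist = w.sum(dim=1)                                      # (D, B)
    return hist / hist.sum(dim=-1, keepdim=True)

def sym_kl(p, q, eps=1e-8):
    """Symmetric KL divergence between two stacks of histograms."""
    p, q = p + eps, q + eps
    kl = lambda a, b: (a * (a / b).log()).sum(dim=-1)
    return (kl(p, q) + kl(q, p)).mean()

# Development time (Algorithm 1): save source histograms and bin centers.
# Deployment time (Algorithm 2): adapt the feature extractor so target
# histograms realign with the saved source ones.
def fr_loss(feature_extractor, x_target, src_hist, centers):
    z = feature_extractor(x_target)          # (N, D) target features
    tgt_hist = soft_histogram(z, centers)
    return sym_kl(src_hist, tgt_hist)
```

The same loss can be applied to the logits as well as the features, matching the paper's description of aligning both distributions.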
Open Source Code: Yes
"Code is available at https://github.com/cianeastwood/bufr."
Open Datasets: Yes
"Datasets and implementation. Early experiments on MNIST-M (Ganin et al., 2016) and MNIST-C (Mu & Gilmer, 2019)... Thus, we additionally create and release EMNIST-DA, a domain adaptation (DA) dataset based on the 47-class Extended MNIST (EMNIST) character-recognition dataset (Cohen et al., 2017). We also evaluate on object recognition with CIFAR-10-C and CIFAR-100-C (Hendrycks & Dietterich, 2019), and on real-world measurement shifts with CAMELYON17 (Bandi et al., 2018)."
Dataset Splits: Yes
"target-supervised is an upper-bound that uses labelled target data (we use an 80-10-10 training-validation-test split, reporting accuracy on the test set). In line with previous UDA & SFDA works (although often not made explicit), we use a test-domain validation set for model selection (Gulrajani & Lopez-Paz, 2021)."
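For concreteness, one way to realise the quoted 80-10-10 protocol is sketched below; the seeding and helper name are hypothetical, not taken from the paper.

```python
# Hypothetical sketch of the 80-10-10 target-domain split described above.
import torch
from torch.utils.data import random_split

def split_80_10_10(dataset, seed=0):
    n = len(dataset)
    n_train = int(0.8 * n)
    n_val = int(0.1 * n)
    n_test = n - n_train - n_val   # remainder keeps the sizes exact
    gen = torch.Generator().manual_seed(seed)
    return random_split(dataset, [n_train, n_val, n_test], generator=gen)

# train/val serve target-supervised fine-tuning and model selection;
# accuracy is reported on the held-out test split.
```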
Hardware Specification: No
No specific hardware details (e.g., GPU models, CPU types, memory specifications) are mentioned in the paper; it only refers to network architectures such as a 5-layer convolutional neural network (CNN) and a ResNet-18.
Software Dependencies: No
No specific version numbers for software dependencies are given. The paper mentions PyTorch but not its version.
Experiment Setup: Yes
"For all datasets and methods we train using SGD with momentum set to 0.9, use a batch size of 256, and report results over 5 random seeds. In particular, we select the best-performing learning rate from {0.0001, 0.001, 0.01, 0.1, 1}, and for BUFR, we train for 30 epochs per block and decay the learning rate as a function of the number of unfrozen blocks in order to further maintain structure. For all other methods, including FR, we train for 150 epochs with a constant learning rate. The temperature parameter τ (see Appendix A, Eq. 4) is set to 0.01 in all experiments."
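A minimal sketch of BUFR's bottom-up schedule under the hyperparameters quoted above follows. The 1/num_unfrozen decay is an assumed instance of "decay the learning rate as a function of the number of unfrozen blocks"; the paper does not pin down the exact rule in this excerpt.

```python
# Sketch of BUFR's bottom-up training schedule. The decay rule and the
# cumulative unfreezing are our reading of the quoted setup, not a
# verbatim reproduction of the authors' code.
import torch

def train_bufr(blocks, loss_fn, loader, base_lr=0.01,
               epochs_per_block=30, momentum=0.9):
    """blocks: list of nn.Modules, ordered from input (bottom) to output."""
    for p in (p for b in blocks for p in b.parameters()):
        p.requires_grad = False                 # start fully frozen

    unfrozen = []
    for block in blocks:                        # unfreeze bottom-up
        for p in block.parameters():
            p.requires_grad = True
        unfrozen.append(block)

        lr = base_lr / len(unfrozen)            # assumed decay rule
        params = [p for b in unfrozen for p in b.parameters()]
        opt = torch.optim.SGD(params, lr=lr, momentum=momentum)

        for _ in range(epochs_per_block):       # 30 epochs per block
            for x in loader:                    # unlabelled target batches
                opt.zero_grad()
                loss_fn(x).backward()           # e.g., the FR loss above
                opt.step()
```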