RLSbench: Domain Adaptation Under Relaxed Label Shift

Authors: Saurabh Garg, Nick Erickson, James Sharpnack, Alex Smola, Sivaraman Balakrishnan, Zachary Chase Lipton

ICML 2023

Reproducibility Variable | Result | LLM Response
Research Type | Experimental | In this paper, we introduce RLSBENCH, a large-scale benchmark for relaxed label shift, consisting of >500 distribution shift pairs spanning vision, tabular, and language modalities, with varying label proportions. ... First, we assess 13 popular domain adaptation methods, demonstrating more widespread failures under label proportion shifts than were previously known. Next, we develop an effective two-step meta-algorithm ... Based on our experiments on RLSBENCH, we make several findings.
Researcher Affiliation | Collaboration | ¹Carnegie Mellon University, ²Amazon Web Services.
Pseudocode | Yes | Algorithm 1: Meta-algorithm to handle label marginal shift
Open Source Code | Yes | Code is publicly available at https://github.com/acmi-lab/RLSbench.
Open Datasets | Yes | RLSBENCH builds on 14 multi-domain datasets for classification ... (i) CIFAR-10, which includes the original CIFAR-10 (Krizhevsky & Hinton, 2009) ... (iv) OfficeHome (Venkateswara et al., 2017) ... (vii) FMoW (Koh et al., 2021; Christie et al., 2018) from the WILDS benchmark ... (viii) Camelyon (Bandi et al., 2018) from the WILDS benchmark ...
Dataset Splits | Yes | We partition each source and target dataset into 80% and 20% i.i.d. splits. We use the 80% splits for training and the 20% splits for evaluation (or validation).
Hardware Specification | Yes | Our experiments were performed across a combination of Nvidia T4, A6000, P100, and V100 GPUs.
Software Dependencies | No | The paper mentions 'PyTorch (Paszke et al., 2017)' but does not provide a specific version number. Other software, such as 'transformers' and 'scikit-learn', is mentioned without version details.
Experiment Setup | Yes | We summarize the learning rate, batch size, number of epochs, and ℓ2 regularization parameter used in our study in Table 6. For each algorithm, we use the hyperparameters reported in the original papers. For domain-adversarial methods (DANN and CDANN), we use a learning-rate multiplier of 0.1 for the featurizer when initializing with a pre-trained network and 1.0 otherwise. We default to a penalty weight of 1.0 for all datasets with pre-trained initialization. FixMatch: we use λ = 1.0 and threshold τ = 0.9.
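The FixMatch hyperparameters in the last row (unlabeled-loss weight λ = 1.0, confidence threshold τ = 0.9) determine which unlabeled examples contribute to training: a pseudo-label is kept only when the model's softmax confidence on the weakly augmented view exceeds τ. A minimal Python sketch of that thresholding step, assuming per-example logits as a plain list (the function name and values are illustrative, not taken from the RLSbench codebase):

```python
import math

def fixmatch_pseudo_label(weak_logits, tau=0.9):
    """Return (pseudo_label, keep) for one unlabeled example.

    weak_logits: model logits on the weakly augmented view.
    keep is True only if the max softmax probability exceeds the
    confidence threshold tau (0.9 in the paper's reported setup).
    """
    # Numerically stable softmax over the logits.
    m = max(weak_logits)
    exps = [math.exp(z - m) for z in weak_logits]
    total = sum(exps)
    probs = [e / total for e in exps]

    confidence = max(probs)
    label = probs.index(confidence)
    return label, confidence >= tau

# The overall objective then combines the supervised loss with the
# unlabeled (consistency) loss on kept examples, weighted by lambda:
#   loss = sup_loss + 1.0 * unsup_loss_on_kept_examples
```

For example, logits of [5.0, 0.1, 0.2] give a confident class-0 pseudo-label that is kept, while near-uniform logits like [1.0, 0.9, 0.8] fall below τ = 0.9 and are masked out of the unlabeled loss.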