RLSbench: Domain Adaptation Under Relaxed Label Shift
Authors: Saurabh Garg, Nick Erickson, James Sharpnack, Alex Smola, Sivaraman Balakrishnan, Zachary Chase Lipton
ICML 2023
| Reproducibility Variable | Result | LLM Response |
|---|---|---|
| Research Type | Experimental | In this paper, we introduce RLSBENCH, a large-scale benchmark for relaxed label shift, consisting of >500 distribution shift pairs spanning vision, tabular, and language modalities, with varying label proportions. ...First, we assess 13 popular domain adaptation methods, demonstrating more widespread failures under label proportion shifts than were previously known. Next, we develop an effective two-step meta-algorithm... Based on our experiments on RLSBENCH, we make several findings. |
| Researcher Affiliation | Collaboration | 1Carnegie Mellon University 2Amazon Web Services. |
| Pseudocode | Yes | Algorithm 1 Meta algorithm to handle label marginal shift |
| Open Source Code | Yes | Code is publicly available at https://github.com/acmi-lab/RLSbench. |
| Open Datasets | Yes | RLSBENCH builds on 14 multi-domain datasets for classification... (i) CIFAR-10 which includes the original CIFAR10 (Krizhevsky & Hinton, 2009)... (iv) Office Home (Venkateswara et al., 2017)... (vii) FMoW (Koh et al., 2021; Christie et al., 2018) from WILDS benchmark... (viii) Camelyon (Bandi et al., 2018) from WILDS benchmark... |
| Dataset Splits | Yes | We partition each source and target dataset into 80% and 20% i.i.d. splits. We use 80% splits for training and 20% splits for evaluation (or validation). |
| Hardware Specification | Yes | Our experiments were performed across a combination of Nvidia T4, A6000, P100 and V100 GPUs. |
| Software Dependencies | No | The paper mentions 'PyTorch (Paszke et al., 2017)' but does not provide a specific version number. Other software like 'transformers' and 'scikit-learn' are mentioned without version details. |
| Experiment Setup | Yes | We summarize learning rate, batch size, number of epochs, and ℓ2 regularization parameter used in our study in Table 6. For each algorithm, we use the hyperparameters reported in the initial papers. For domain-adversarial methods (DANN and CDANN), we use a learning rate multiplier of 0.1 for the featurizer when initializing with a pre-trained network and 1.0 otherwise. We default to a penalty weight of 1.0 for all datasets with pre-trained initialization. For FixMatch, we use λ = 1.0 and a confidence threshold τ of 0.9. |
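The Dataset Splits row reports that each source and target dataset is partitioned into 80%/20% i.i.d. splits (80% for training, 20% for evaluation/validation). A minimal sketch of such a partition is below; the function name and seed handling are ours, not the authors' code.

```python
import numpy as np

def iid_split(n, frac_train=0.8, seed=0):
    """Partition n example indices into i.i.d. train/eval splits.

    Mirrors the paper's stated 80%/20% i.i.d. split of each source
    and target dataset; this is an illustrative sketch, not the
    RLSbench implementation.
    """
    rng = np.random.default_rng(seed)
    idx = rng.permutation(n)          # i.i.d. shuffle of all indices
    cut = int(frac_train * n)         # 80% boundary by default
    return idx[:cut], idx[cut:]       # (train indices, eval indices)
```

Fixing the seed makes the split reproducible across runs, which matters when the 20% split doubles as a validation set for model selection.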
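The Experiment Setup row fixes FixMatch's hyperparameters at λ = 1.0 and τ = 0.9. As a reading aid, here is a minimal numpy sketch of FixMatch's unlabeled objective: pseudo-labels come from predictions on weakly augmented inputs, are kept only when the top predicted probability reaches τ, and the resulting cross-entropy against the strongly augmented predictions is weighted by λ. Function names are ours; this is not the RLSbench code.

```python
import numpy as np

def softmax(logits):
    z = logits - logits.max(axis=1, keepdims=True)  # numerical stability
    e = np.exp(z)
    return e / e.sum(axis=1, keepdims=True)

def fixmatch_unlabeled_loss(weak_logits, strong_logits, tau=0.9, lam=1.0):
    """Sketch of the FixMatch consistency loss on unlabeled data.

    weak_logits / strong_logits: model outputs on weakly / strongly
    augmented views of the same unlabeled batch, shape (B, C).
    """
    probs_weak = softmax(weak_logits)
    conf = probs_weak.max(axis=1)         # confidence of pseudo-label
    pseudo = probs_weak.argmax(axis=1)    # hard pseudo-label
    mask = conf >= tau                    # keep only confident examples
    probs_strong = softmax(strong_logits)
    nll = -np.log(probs_strong[np.arange(len(pseudo)), pseudo] + 1e-12)
    # Average over the full batch; masked-out examples contribute zero.
    return lam * (nll * mask).mean()
```

With τ = 0.9, an entirely low-confidence batch contributes zero loss, so early in training the unlabeled term only activates as the model becomes confident.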