RLSbench: Domain Adaptation Under Relaxed Label Shift

Authors: Saurabh Garg, Nick Erickson, James Sharpnack, Alex Smola, Sivaraman Balakrishnan, Zachary Chase Lipton

ICML 2023

Reproducibility Variable | Result | LLM Response
Research Type | Experimental | In this paper, we introduce RLSBENCH, a large-scale benchmark for relaxed label shift, consisting of >500 distribution shift pairs spanning vision, tabular, and language modalities, with varying label proportions. ... First, we assess 13 popular domain adaptation methods, demonstrating more widespread failures under label proportion shifts than were previously known. Next, we develop an effective two-step meta-algorithm ... Based on our experiments on RLSBENCH, we make several findings.
Researcher Affiliation | Collaboration | ¹Carnegie Mellon University, ²Amazon Web Services.
Pseudocode | Yes | Algorithm 1: Meta-algorithm to handle label marginal shift
Open Source Code | Yes | Code is publicly available at https://github.com/acmi-lab/RLSbench.
Open Datasets | Yes | RLSBENCH builds on 14 multi-domain datasets for classification ... (i) CIFAR-10, which includes the original CIFAR-10 (Krizhevsky & Hinton, 2009) ... (iv) OfficeHome (Venkateswara et al., 2017) ... (vii) FMoW (Koh et al., 2021; Christie et al., 2018) from the WILDS benchmark ... (viii) Camelyon (Bandi et al., 2018) from the WILDS benchmark ...
Dataset Splits | Yes | We partition each source and target dataset into 80% and 20% i.i.d. splits. We use the 80% splits for training and the 20% splits for evaluation (or validation).
Hardware Specification | Yes | Our experiments were performed across a combination of Nvidia T4, A6000, P100, and V100 GPUs.
Software Dependencies | No | The paper mentions 'PyTorch (Paszke et al., 2017)' but does not provide a specific version number. Other software, such as 'transformers' and 'scikit-learn', is mentioned without version details.
Experiment Setup | Yes | We summarize the learning rate, batch size, number of epochs, and ℓ2 regularization parameter used in our study in Table 6. For each algorithm, we use the hyperparameters reported in the original papers. For domain-adversarial methods (DANN and CDANN), we use a learning-rate multiplier of 0.1 for the featurizer when initializing with a pre-trained network and 1.0 otherwise. We default to a penalty weight of 1.0 for all datasets with pre-trained initialization. FixMatch: we use λ = 1.0 and threshold τ = 0.9.
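The FixMatch hyperparameters in the last row (unlabeled-loss weight λ = 1.0, confidence threshold τ = 0.9) determine which unlabeled examples contribute to training: a pseudo-label is kept only when the model's softmax confidence on the weakly augmented view exceeds τ. A minimal Python sketch of that thresholding step, assuming per-example logits as a plain list (the function name and values are illustrative, not taken from the RLSbench codebase):

```python
import math

def fixmatch_pseudo_label(weak_logits, tau=0.9):
    """Return (pseudo_label, keep) for one unlabeled example.

    weak_logits: model logits on the weakly augmented view.
    keep is True only if the max softmax probability exceeds the
    confidence threshold tau (0.9 in the paper's reported setup).
    """
    # Numerically stable softmax over the logits.
    m = max(weak_logits)
    exps = [math.exp(z - m) for z in weak_logits]
    total = sum(exps)
    probs = [e / total for e in exps]

    confidence = max(probs)
    label = probs.index(confidence)
    return label, confidence >= tau

# The overall objective then combines the supervised loss with the
# unlabeled (consistency) loss on kept examples, weighted by lambda:
#   loss = sup_loss + 1.0 * unsup_loss_on_kept_examples
```

For example, logits of [5.0, 0.1, 0.2] give a confident class-0 pseudo-label that is kept, while near-uniform logits like [1.0, 0.9, 0.8] fall below τ = 0.9 and are masked out of the unlabeled loss.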