Optimal Representations for Covariate Shift
Authors: Yangjun Ruan, Yann Dubois, Chris J. Maddison
ICLR 2022
| Reproducibility Variable | Result | LLM Response |
|---|---|---|
| Research Type | Experimental | Section 6 (Experiments): In our experiments, we aimed to: (i) verify our theoretical results in practice; (ii) investigate our proposed representation learning objectives in practical DG; (iii) take advantage of pretrained SSL models (in particular, CLIP) to achieve powerful models for DG. Unless stated otherwise, we consider a two-stage training setup. First, the representation learner (the representor) trains an encoder p(Z\|X) using a specified objective and freezes it. Then, the person performing predictions (the learner) trains her predictor h from Z by minimizing the risk on source data. Finally, the representation Z and predictor h are evaluated on target data. (A minimal sketch of this two-stage protocol appears below the table.) |
| Researcher Affiliation | Academia | Yangjun Ruan (1,2), Yann Dubois (2), Chris J. Maddison (1,2); 1: University of Toronto, 2: Vector Institute. {yjruan,yanndubois,cmaddis}@cs.toronto.edu |
| Pseudocode | Yes | Algorithm 1 CAD objective |
| Open Source Code | Yes | Our implementation is released at https://github.com/ryoungj/optdom. |
| Open Datasets | Yes | We used the non-synthetic, non-MNIST datasets in DomainBed, including VLCS (Fang et al., 2013), PACS (Li et al., 2017), OfficeHome (Venkateswara et al., 2017), Terra Incognita (Beery et al., 2018), and DomainNet (Peng et al., 2019). and To do so we used LAION-400M (Schuhmann et al., 2021), a public dataset that contains 400M web-crawled image-text pairs. |
| Dataset Splits | Yes | We randomly split the PACS dataset into 80% training and 20% validation splits for each domain. The training splits were used to train both the encoder and the source predictor, and the validation splits were used for encoder and source predictor selection as well as evaluation on target domains. (A split sketch appears below the table.) |
| Hardware Specification | No | The paper mentions 'computational budget' and 'sufficient compute' but does not provide specific details on the hardware used for experiments (e.g., GPU models, CPU types, or cloud instance specifications). |
| Software Dependencies | No | The paper mentions software tools and optimizers like 'Adam optimizer (Kingma & Ba, 2014)' and 'SVM classifier', but does not specify version numbers for any software dependencies required to reproduce the experiments. |
| Experiment Setup | Yes | The ResNet-18 encoder was trained for 300 epochs without any regularization, using the Adam optimizer (Kingma & Ba, 2014) with a learning rate of 5e-5, a batch size of 192 (48 for each domain), and a cosine learning rate decay schedule. and Learning rate: discrete set {1e-4, 3e-4, 1e-3, 3e-3}; Batch size: discrete set {128, 256, 512} for DomainNet and OfficeHome, and {64, 128, 256} for other datasets; MLP dropout: discrete set {0., 0.1, 0.5} (Training-configuration and sweep sketches appear below the table.) |
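
The two-stage protocol quoted under "Research Type" (train and freeze an encoder, then fit a predictor on source risk only, then evaluate on target data) can be sketched roughly as below. This is a minimal illustration, not the authors' released code: the `encoder`/`head` modules, the `bottleneck_loss` callable, and the data loaders yielding `(x, y, domain)` batches are all assumed placeholders.

```python
# Minimal sketch of the two-stage setup: the representor trains and freezes
# an encoder p(Z|X), the learner fits a predictor h on source risk, and both
# are evaluated on target data. All names here are hypothetical placeholders.
import torch
import torch.nn as nn

def train_two_stage(encoder, head, bottleneck_loss, source_loader, target_loader,
                    epochs=300, lr=5e-5):
    # Stage 1: the representor trains the encoder with its chosen objective.
    opt = torch.optim.Adam(encoder.parameters(), lr=lr)
    for _ in range(epochs):
        for x, y, domain in source_loader:
            loss = bottleneck_loss(encoder(x), y, domain)  # e.g. a CAD-style objective
            opt.zero_grad(); loss.backward(); opt.step()
    for p in encoder.parameters():                         # freeze the representation
        p.requires_grad_(False)

    # Stage 2: the learner trains the predictor h by minimizing source risk only.
    opt_h = torch.optim.Adam(head.parameters(), lr=lr)
    ce = nn.CrossEntropyLoss()
    for _ in range(epochs):
        for x, y, _ in source_loader:
            with torch.no_grad():
                z = encoder(x)
            loss = ce(head(z), y)
            opt_h.zero_grad(); loss.backward(); opt_h.step()

    # Evaluation: the frozen representation Z and predictor h are scored on target data.
    correct = total = 0
    with torch.no_grad():
        for x, y, _ in target_loader:
            pred = head(encoder(x)).argmax(dim=-1)
            correct += (pred == y).sum().item(); total += y.numel()
    return correct / max(total, 1)
```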
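
The per-domain 80/20 split described under "Dataset Splits" could be reproduced along these lines. The `domain_datasets` mapping and the fixed seed are assumptions for illustration, not details taken from the paper.

```python
# Hedged sketch of the per-domain 80%/20% train/validation split.
import torch
from torch.utils.data import random_split

def split_per_domain(domain_datasets, train_frac=0.8, seed=0):
    """Split each domain's dataset into (train, val) subsets."""
    gen = torch.Generator().manual_seed(seed)
    splits = {}
    for name, ds in domain_datasets.items():
        n_train = int(train_frac * len(ds))
        train_ds, val_ds = random_split(ds, [n_train, len(ds) - n_train], generator=gen)
        splits[name] = (train_ds, val_ds)
    return splits
```

For PACS this would be called once with a dict keyed by the four domains (e.g. `art_painting`, `cartoon`, `photo`, `sketch`), using the training halves for both encoder and source-predictor training and the validation halves for model selection.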
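
The optimizer, schedule, and search space reported under "Experiment Setup" map onto a configuration like the sketch below. The fixed settings (Adam, lr 5e-5, 300 epochs, cosine decay) come from the quoted PACS encoder training; the grid comes from the quoted sweep ranges. Whether the sweep is random or exhaustive is not stated in this excerpt, so the exhaustive enumeration here is an assumption.

```python
# Sketch of the reported training configuration and hyperparameter grid.
import itertools
import torch

def make_optimizer_and_scheduler(encoder, epochs=300, lr=5e-5):
    # Adam with cosine learning-rate decay over the full training run.
    opt = torch.optim.Adam(encoder.parameters(), lr=lr)
    sched = torch.optim.lr_scheduler.CosineAnnealingLR(opt, T_max=epochs)
    return opt, sched

# Quoted sweep ranges; batch sizes of {128, 256, 512} apply to DomainNet and
# OfficeHome, {64, 128, 256} to the other datasets.
GRID = {
    "lr": [1e-4, 3e-4, 1e-3, 3e-3],
    "batch_size": [64, 128, 256],
    "mlp_dropout": [0.0, 0.1, 0.5],
}
configs = [dict(zip(GRID, vals)) for vals in itertools.product(*GRID.values())]
```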