Notice: The reproducibility variables underlying each score are classified using an automated LLM-based pipeline, validated against a manually labeled dataset. LLM-based classification introduces uncertainty and potential bias; scores should be interpreted as estimates. Full accuracy metrics and methodology are described in [1].
Probable Domain Generalization via Quantile Risk Minimization
Authors: Cian Eastwood, Alexander Robey, Shashank Singh, Julius von Kügelgen, Hamed Hassani, George J. Pappas, Bernhard Schölkopf
NeurIPS 2022 | Venue PDF | LLM Run Details
| Reproducibility Variable | Result | LLM Response |
|---|---|---|
| Research Type | Experimental | In our experiments, we introduce a more holistic quantile-focused evaluation protocol for DG, and demonstrate that EQRM outperforms state-of-the-art baselines on datasets from WILDS and DomainBed. ... We now evaluate our EQRM algorithm on synthetic datasets (§6.1), real-world datasets from WILDS (§6.2), and few-domain datasets from DomainBed (§6.3). |
| Researcher Affiliation | Academia | 1 Max Planck Institute for Intelligent Systems, Tübingen 2 University of Edinburgh 3 University of Pennsylvania 4 University of Cambridge |
| Pseudocode | Yes | To solve QRM in practice, we introduce the Empirical QRM (EQRM) algorithm (§4). Given a predictor's empirical risks on the training domains, EQRM forms an estimated risk distribution T̂_f. ... as detailed in Alg. 1 of Appendix E.1. |
| Open Source Code | Yes | Code available at: https://github.com/cianeastwood/qrm |
| Open Datasets | Yes | We now evaluate our EQRM algorithm on synthetic datasets (§6.1), real-world datasets from WILDS (§6.2), and few-domain datasets from DomainBed (§6.3). ... For example, in the iWildCam dataset [50]... OGB-MolPCBA [116, 117]... CMNIST [9]... |
| Dataset Splits | Yes | Table 4: DomainBed results. Model selection: training-domain validation set. ... For model selection, we adopt the standard DomainBed protocol of selecting the best model on a held-out validation set from the training domains. |
| Hardware Specification | No | The paper mentions running experiments on an 'internal cluster' (in the ethics statement) but does not provide specific details on the CPU, GPU models, memory, or cloud instances used. |
| Software Dependencies | No | The paper mentions software components such as the Adam optimizer, MLPs, and dropout in the experimental setup, but it does not name the underlying frameworks or their versions (e.g., PyTorch 1.x, TensorFlow 2.x, Python 3.x). |
| Experiment Setup | Yes | For all methods, we use a 2-hidden-layer MLP with 390 hidden units, the Adam optimizer, a learning rate of 0.0001, and dropout with p = 0.2. We sweep over five penalty weights for the baselines and five values of α for EQRM. ... All experiments were performed with a batch size of 200, a learning rate of 0.0001, a weight decay of 0.0001, and trained for 100 epochs. ... models were trained for 5000 steps with a batch size of 32, a learning rate of 5e-5, and weight decay of 0. |
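The objective quoted in the Pseudocode row can be illustrated with a minimal sketch. This is a hypothetical simplification, not the authors' implementation: the paper's EQRM fits a smoothed estimate of the risk distribution (T̂_f) and minimizes its α-quantile, whereas the function below just takes the plain empirical α-quantile of per-domain risks. The function name `quantile_risk` and the sample risk values are illustrative.

```python
import math

def quantile_risk(domain_risks, alpha):
    """Empirical alpha-quantile of per-domain risks.

    Minimizing this quantity (instead of the mean, as in ERM)
    favors predictors whose risk is low with probability >= alpha
    across domains; alpha -> 1 approaches the worst-case objective.
    """
    if not 0.0 <= alpha <= 1.0:
        raise ValueError("alpha must lie in [0, 1]")
    risks = sorted(domain_risks)
    # Smallest risk r such that the fraction of domains with risk <= r
    # is at least alpha.
    idx = max(0, math.ceil(alpha * len(risks)) - 1)
    return risks[idx]

# With alpha = 0.5 this behaves like a median-style objective;
# with alpha = 1.0 it reduces to the worst training-domain risk.
train_risks = [0.10, 0.30, 0.20, 0.90]
print(quantile_risk(train_risks, 0.5))
print(quantile_risk(train_risks, 1.0))
```

As the comments note, varying α interpolates between average-case and worst-case behavior, which is the trade-off the paper's quantile-focused evaluation protocol is built around.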