Notice: The reproducibility variables underlying each score are classified using an automated LLM-based pipeline, validated against a manually labeled dataset. LLM-based classification introduces uncertainty and potential bias; scores should be interpreted as estimates. Full accuracy metrics and methodology are described in [1].

Probable Domain Generalization via Quantile Risk Minimization

Authors: Cian Eastwood, Alexander Robey, Shashank Singh, Julius von Kügelgen, Hamed Hassani, George J. Pappas, Bernhard Schölkopf

NeurIPS 2022

Reproducibility Variable | Result | LLM Response
Research Type | Experimental | "In our experiments, we introduce a more holistic quantile-focused evaluation protocol for DG, and demonstrate that EQRM outperforms state-of-the-art baselines on datasets from WILDS and DomainBed. ... We now evaluate our EQRM algorithm on synthetic datasets (§6.1), real-world datasets from WILDS (§6.2), and few-domain datasets from DomainBed (§6.3)."
Researcher Affiliation | Academia | 1 Max Planck Institute for Intelligent Systems, Tübingen; 2 University of Edinburgh; 3 University of Pennsylvania; 4 University of Cambridge
Pseudocode | Yes | "To solve QRM in practice, we introduce the Empirical QRM (EQRM) algorithm (§4). Given a predictor's empirical risks on the training domains, EQRM forms an estimated risk distribution T̂_f. ... as detailed in Alg. 1 of Appendix E.1."
Open Source Code | Yes | "Code available at: https://github.com/cianeastwood/qrm"
Open Datasets | Yes | "We now evaluate our EQRM algorithm on synthetic datasets (§6.1), real-world datasets from WILDS (§6.2), and few-domain datasets from DomainBed (§6.3). ... For example, in the iWildCam dataset [50]... OGB-MolPCBA [116, 117]... CMNIST [9]..."
Dataset Splits | Yes | "Table 4: DomainBed results. Model selection: training-domain validation set. ... For model selection, we adopt the standard DomainBed protocol of selecting the best model on a held-out validation set from the training domains."
Hardware Specification | No | The paper mentions running experiments on an "internal cluster" (in the ethics statement) but does not provide specific details on the CPU or GPU models, memory, or cloud instances used.
Software Dependencies | No | The paper mentions using specific software components such as the Adam optimizer, an MLP, and dropout in the experimental setup, but it does not specify version numbers (e.g., PyTorch 1.x, TensorFlow 2.x, Python 3.x).
Experiment Setup | Yes | "For all methods, we use a 2-hidden-layer MLP with 390 hidden units, the Adam optimizer, a learning rate of 0.0001, and dropout with p=0.2. We sweep over five penalty weights for the baselines and five values of α for EQRM. ... All experiments were performed with a batch size of 200, a learning rate of 0.0001, a weight decay of 0.0001, and trained for 100 epochs. ... models were trained for 5000 steps with a batch size of 32, a learning rate of 5e-5, and weight decay of 0."
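The core of the EQRM objective quoted above is minimizing a quantile of a predictor's risk distribution across training domains. The following is a minimal illustrative sketch only: it uses a plain empirical quantile over per-domain risks in place of the paper's kernel-smoothed estimator T̂_f, and the function name and inputs are hypothetical, not from the authors' released code.

```python
def eqrm_objective(domain_risks, alpha):
    """Empirical alpha-quantile of a predictor's per-domain risks.

    EQRM minimizes this quantity over predictors: alpha near 1 targets
    worst-case domains (robustness), while alpha = 0.5 targets the
    median domain. This sketch substitutes a plain empirical quantile
    for the paper's kernel-smoothed risk distribution.
    """
    risks = sorted(domain_risks)
    # Index of the smallest risk whose empirical CDF value reaches alpha.
    k = min(len(risks) - 1, int(alpha * len(risks)))
    return risks[k]
```

In a training loop, this scalar would replace the usual average-risk objective: per-domain risks are computed for the current model, and the α-quantile (rather than the mean) is minimized.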