Distributionally Robust Optimization with Probabilistic Group

Authors: Soumya Suvra Ghosal, Yixuan Li

AAAI 2023

Reproducibility Variable | Result | LLM Response
Research Type | Experimental | We comprehensively evaluate PG-DRO on both image classification and natural language processing benchmarks, establishing superior performance. ... In this section, we comprehensively evaluate PG-DRO on both computer vision tasks (Section 4.1) and natural language processing (Section 4.2) containing spurious correlations.
Researcher Affiliation | Academia | Soumya Suvra Ghosal, Yixuan Li, Department of Computer Sciences, University of Wisconsin-Madison, {sghosal, sharonli}@cs.wisc.edu
Pseudocode | No | The paper does not contain any explicit 'Pseudocode' or 'Algorithm' blocks.
Open Source Code | Yes | Training code is available at https://github.com/deeplearning-wisc/PG-DRO.
Open Datasets | Yes | In this study, we consider two common image classification benchmarks: Waterbirds (Sagawa et al. 2020a) and CelebA (Liu et al. 2015). ... For this dataset, we use a similar setup as defined in (Nam et al. 2022). Each instance in this dataset corresponds to an online comment generated by users which is labeled as either toxic or not toxic, Y = {TOXIC, NON-TOXIC}. MultiNLI (Williams, Nangia, and Bowman 2018): The Multi-Genre Natural Language Inference (MultiNLI) dataset is a crowdsourced collection of sentence pairs with the premise and hypothesis. (A hypothetical loader sketch for these benchmarks follows the table.)
Dataset Splits | Yes | Models are selected by maximizing the worst-group accuracy on the validation set. On CelebA (Liu et al. 2015), using only 5% of validation data (988 samples), PG-DRO achieves worst-group accuracy of 89.4% and outperforms G-DRO (88.7%), which requires group annotation on the entire train and validation set (182,637 samples). (See the model-selection sketch after the table.)
Hardware Specification | No | The paper does not provide specific hardware details (e.g., GPU/CPU models, memory, or cloud instance types) used for running the experiments.
Software Dependencies | No | The paper mentions using the 'Hugging Face PyTorch Transformers' implementation of BERT, but does not provide specific version numbers for these software components. (An illustrative usage snippet follows the table.)
Experiment Setup | Yes | We provide detailed description regarding hyper-parameters in Appendix B and C.
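
The paper does not state which data-loading pipeline was used. As a hypothetical sketch, the benchmarks quoted under Open Datasets are commonly obtained through the WILDS package (Waterbirds, CelebA, CivilComments) and through Hugging Face datasets (MultiNLI); the package choices and dataset identifiers below are assumptions, not the authors' setup.

```python
# Hypothetical sketch: one common way to obtain the benchmarks quoted above.
# The paper does not specify its data-loading pipeline; the use of the
# wilds and datasets packages here is an assumption.
from wilds import get_dataset      # pip install wilds
from datasets import load_dataset  # pip install datasets

# Waterbirds, CelebA, and CivilComments are distributed through WILDS.
waterbirds = get_dataset(dataset="waterbirds", download=True)
celeba = get_dataset(dataset="celebA", download=True)
civilcomments = get_dataset(dataset="civilcomments", download=True)

# MultiNLI (Williams, Nangia, and Bowman 2018) via Hugging Face datasets.
multinli = load_dataset("multi_nli")
```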
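The selection criterion quoted under Dataset Splits, maximizing worst-group accuracy on the validation set, is straightforward to make concrete. The sketch below is illustrative rather than the authors' code; the `correct` and `group_ids` arrays and the `checkpoints` list are assumed inputs.

```python
import numpy as np

def worst_group_accuracy(correct: np.ndarray, group_ids: np.ndarray) -> float:
    """Return the minimum per-group accuracy.

    correct   -- boolean array, True where the model's prediction is right
    group_ids -- integer array of the same length assigning each example
                 to a (spurious attribute x label) group
    """
    return min(correct[group_ids == g].mean() for g in np.unique(group_ids))

def select_checkpoint(checkpoints):
    """Keep the checkpoint with the highest validation worst-group accuracy.

    checkpoints -- hypothetical list of (name, correct, group_ids) tuples,
    one per saved model, evaluated on the validation set.
    """
    return max(checkpoints, key=lambda c: worst_group_accuracy(c[1], c[2]))
```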
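For the Software Dependencies row, here is a minimal example of the Hugging Face PyTorch Transformers implementation of BERT that the paper mentions; the bert-base-uncased checkpoint and the three-way label count are assumptions (the paper pins neither versions nor checkpoints).

```python
# Illustrative only: the paper names Hugging Face's PyTorch BERT but gives
# no versions; the checkpoint and label count below are assumptions chosen
# to match an NLP benchmark like MultiNLI.
from transformers import BertTokenizer, BertForSequenceClassification

tokenizer = BertTokenizer.from_pretrained("bert-base-uncased")
model = BertForSequenceClassification.from_pretrained(
    "bert-base-uncased",
    num_labels=3,  # MultiNLI: entailment / neutral / contradiction
)

# Encode a premise-hypothesis pair and compute classification logits.
inputs = tokenizer("A premise sentence.", "A hypothesis sentence.",
                   return_tensors="pt")
logits = model(**inputs).logits
```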