Distributionally Robust Optimization with Probabilistic Group
Authors: Soumya Suvra Ghosal, Yixuan Li
AAAI 2023 | Conference PDF | Archive PDF | Plain Text | LLM Run Details
| Reproducibility Variable | Result | LLM Response |
|---|---|---|
| Research Type | Experimental | We comprehensively evaluate PG-DRO on both image classification and natural language processing benchmarks, establishing superior performance. ... In this section, we comprehensively evaluate PG-DRO on both computer vision tasks (Section 4.1) and natural language processing (Section 4.2) containing spurious correlations. |
| Researcher Affiliation | Academia | Soumya Suvra Ghosal, Yixuan Li, Department of Computer Sciences, University of Wisconsin-Madison, {sghosal, sharonli}@cs.wisc.edu |
| Pseudocode | No | The paper does not contain any explicit 'Pseudocode' or 'Algorithm' blocks. |
| Open Source Code | Yes | Training code is available at https://github.com/deeplearning-wisc/PG-DRO. |
| Open Datasets | Yes | In this study, we consider two common image classification benchmarks: Waterbirds (Sagawa et al. 2020a) and CelebA (Liu et al. 2015). ... For this dataset, we use a similar setup as defined in (Nam et al. 2022). Each instance in this dataset corresponds to an online comment generated by users which is labeled as either toxic or not toxic, Y = {TOXIC, NON-TOXIC}. MultiNLI (Williams, Nangia, and Bowman 2018): The Multi-Genre Natural Language Inference (MultiNLI) dataset is a crowdsourced collection of sentence pairs with the premise and hypothesis. |
| Dataset Splits | Yes | Models are selected by maximizing the worst-group accuracy on the validation set (see the sketch after the table). On CelebA (Liu et al. 2015), using only 5% of validation data (988 samples), PG-DRO achieves worst-group accuracy of 89.4% and outperforms G-DRO (88.7%), which requires group annotations on the entire train and validation set (182,637 samples). |
| Hardware Specification | No | The paper does not provide specific hardware details (e.g., GPU/CPU models, memory, or cloud instance types) used for running the experiments. |
| Software Dependencies | No | The paper mentions using the 'Hugging Face PyTorch Transformers' implementation of BERT, but does not provide specific version numbers for these software components. |
| Experiment Setup | Yes | We provide a detailed description regarding hyper-parameters in Appendix B and C. |
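
The model-selection criterion quoted in the Dataset Splits row, maximizing worst-group accuracy on the validation set, amounts to taking the minimum of the per-group accuracies. Below is a minimal sketch, assuming NumPy arrays of predictions, labels, and integer group ids; the names (`preds`, `labels`, `groups`, `checkpoints`) are hypothetical and not taken from the paper or its repository:

```python
import numpy as np

def worst_group_accuracy(preds, labels, groups):
    """Minimum per-group accuracy over all groups present in `groups`."""
    accs = []
    for g in np.unique(groups):
        mask = groups == g  # samples belonging to group g
        accs.append((preds[mask] == labels[mask]).mean())
    return min(accs)

# Model selection: keep the checkpoint whose validation
# worst-group accuracy is highest (hypothetical API).
# best_ckpt = max(checkpoints,
#                 key=lambda m: worst_group_accuracy(m.predict(val_x),
#                                                    val_y, val_groups))
```

On Waterbirds, for instance, the groups are conventionally the four (bird class, background) combinations, so this metric reports accuracy on the subgroup hardest hit by the spurious correlation.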