Distributionally Robust Optimization with Probabilistic Group
Authors: Soumya Suvra Ghosal, Yixuan Li
AAAI 2023 | Conference PDF | Archive PDF | Plain Text | LLM Run Details
| Reproducibility Variable | Result | LLM Response |
|---|---|---|
| Research Type | Experimental | We comprehensively evaluate PG-DRO on both image classification and natural language processing benchmarks, establishing superior performance. ... In this section, we comprehensively evaluate PG-DRO on both computer vision tasks (Section 4.1) and natural language processing (Section 4.2) containing spurious correlations. |
| Researcher Affiliation | Academia | Soumya Suvra Ghosal, Yixuan Li, Department of Computer Sciences, University of Wisconsin-Madison, {sghosal, sharonli}@cs.wisc.edu |
| Pseudocode | No | The paper does not contain any explicit 'Pseudocode' or 'Algorithm' blocks. |
| Open Source Code | Yes | Training code is available at https://github.com/deeplearning-wisc/PG-DRO. |
| Open Datasets | Yes | In this study, we consider two common image classification benchmarks: Waterbirds (Sagawa et al. 2020a) and CelebA (Liu et al. 2015). ... For this dataset, we use a similar setup as defined in (Nam et al. 2022). Each instance in this dataset corresponds to an online comment generated by users which is labeled as either toxic or not toxic, Y = {TOXIC, NON-TOXIC}. MultiNLI (Williams, Nangia, and Bowman 2018): The Multi-Genre Natural Language Inference (MultiNLI) dataset is a crowdsourced collection of sentence pairs with the premise and hypothesis. |
| Dataset Splits | Yes | Models are selected by maximizing the worst-group accuracy on the validation set (see the sketch after the table). On CelebA (Liu et al. 2015), using only 5% of validation data (988 samples), PG-DRO achieves worst-group accuracy of 89.4% and outperforms G-DRO (88.7%), which requires group annotations on the entire train and validation set (182,637 samples). |
| Hardware Specification | No | The paper does not provide specific hardware details (e.g., GPU/CPU models, memory, or cloud instance types) used for running the experiments. |
| Software Dependencies | No | The paper mentions using the 'Hugging Face PyTorch Transformers' implementation of BERT, but does not provide specific version numbers for these software components. |
| Experiment Setup | Yes | We provide a detailed description regarding hyper-parameters in Appendix B and C. |
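
The model-selection criterion quoted in the Dataset Splits row, maximizing worst-group accuracy on the validation set, amounts to taking the minimum of the per-group accuracies. Below is a minimal sketch, assuming NumPy arrays of predictions, labels, and integer group ids; the names (`preds`, `labels`, `groups`, `checkpoints`) are hypothetical and not taken from the paper or its repository:

```python
import numpy as np

def worst_group_accuracy(preds, labels, groups):
    """Minimum per-group accuracy over all groups present in `groups`."""
    accs = []
    for g in np.unique(groups):
        mask = groups == g  # samples belonging to group g
        accs.append((preds[mask] == labels[mask]).mean())
    return min(accs)

# Model selection: keep the checkpoint whose validation
# worst-group accuracy is highest (hypothetical API).
# best_ckpt = max(checkpoints,
#                 key=lambda m: worst_group_accuracy(m.predict(val_x),
#                                                    val_y, val_groups))
```

On Waterbirds, for instance, the groups are conventionally the four (bird class, background) combinations, so this metric reports accuracy on the subgroup hardest hit by the spurious correlation.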