Group Robust Classification Without Any Group Information

Authors: Christos Tsirigotis, Joao Monteiro, Pau Rodriguez, David Vazquez, Aaron C. Courville

NeurIPS 2023

Reproducibility Variable | Result | LLM Response
Research Type | Experimental | Our empirical analysis on synthetic and real-world tasks provides evidence that our approach overcomes the identified challenges and consistently enhances robust accuracy, attaining performance which is competitive with or outperforms that of state-of-the-art methods, which, conversely, rely on bias labels for validation.
Researcher Affiliation | Collaboration | Christos Tsirigotis (Université de Montréal, Mila, ServiceNow Research); Joao Monteiro (ServiceNow Research); Pau Rodriguez (Apple MLR); David Vazquez (ServiceNow Research); Aaron Courville (Université de Montréal, Mila, Canada CIFAR AI Chair)
Pseudocode | Yes | In Algorithm 1, we provide a high-level description of the uLA methodology for training and validation of group-robust models without any bias annotations. (An illustrative sketch of the underlying logit-adjustment idea follows the table.)
Open Source Code | Yes | In addition, a PyTorch [59] implementation is available at the following repository: https://github.com/tsirif/uLA.
Open Datasets | Yes | The tasks we consider are all specific instances of the setup above (see Fig. 2 and Section 2). This spans group robustness challenges like colored MNIST [57, CMNIST], corrupted CIFAR10 [37, CCIFAR10] and WATERBIRDS [62], fair classification benchmarks like CELEBA [51], and systematic generalization tasks such as the contributed SMPI3D.
Dataset Splits | Yes | We subsample the deployment dataset uniformly across i.i.d. combinations for 180k and 18k mutually exclusive data points for the training and validation sets respectively, for all C ∈ {2, 3, 4, 5}. Finally, the remaining data points are combined and 54k images are sampled uniformly from all combinations to create the unbiased test set. (A sketch of this split protocol follows the table.)
Hardware Specification | No | No specific hardware details (e.g., GPU/CPU models, memory) were mentioned for the experimental setup.
Software Dependencies | No | The paper mentions PyTorch and the solo-learn library but does not provide specific version numbers for these software components.
Experiment Setup | Yes | The learning rate, batch size, weight decay, and temperature for SSL pretraining are listed in Table 7. The search space over learning rate, weight decay, η, τ, and Tssl/Tstop for each dataset is described in Table 8. The best hyperparameters are selected by the proposed bias-unsupervised validation score, as reported in Table 9. (A generic model-selection sketch follows the table.)
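
Regarding the Pseudocode row: Algorithm 1 is not reproduced here, but uLA builds on logit adjustment with a frozen, bias-amplified auxiliary model. The PyTorch snippet below is a minimal, hypothetical sketch of that generic idea; the names `logit_adjusted_loss`, `train_step`, `bias_model`, and the scaling factor `eta` are illustrative assumptions, not the paper's API, and the exact adjustment and validation procedure used by uLA are specified in Algorithm 1.

```python
import torch
import torch.nn.functional as F

def logit_adjusted_loss(logits, biased_probs, targets, eta=1.0):
    """Cross-entropy on logits shifted by the (frozen) biased model's
    log-probabilities, discouraging the robust model from relying on the
    shortcut the biased model already captures. `eta` scales the
    adjustment strength (a tuned hyperparameter in this sketch)."""
    adjusted = logits + eta * torch.log(biased_probs + 1e-12)
    return F.cross_entropy(adjusted, targets)

def train_step(robust_model, bias_model, optimizer, x, y, eta):
    # The bias-amplified model stays frozen; only the robust model updates.
    with torch.no_grad():
        biased_probs = F.softmax(bias_model(x), dim=-1)
    loss = logit_adjusted_loss(robust_model(x), biased_probs, y, eta)
    optimizer.zero_grad()
    loss.backward()
    optimizer.step()
    return loss.item()
```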
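The split sizes quoted in the Dataset Splits row can be read as the following procedure. This NumPy sketch is a hypothetical reconstruction, assuming `combo_ids` assigns each deployment image to its attribute combination; `make_splits` and the per-combination allocation are illustrative names, not the released preprocessing code.

```python
import numpy as np
from collections import defaultdict

def make_splits(combo_ids, n_train=180_000, n_val=18_000, n_test=54_000, seed=0):
    """Sample train/val sets uniformly across attribute combinations
    (mutually exclusive), then draw an unbiased test set uniformly from
    the remaining points. Sizes follow the quoted SMPI3D protocol."""
    rng = np.random.default_rng(seed)
    by_combo = defaultdict(list)
    for i, c in enumerate(combo_ids):
        by_combo[c].append(i)
    n_combos = len(by_combo)
    train, val, rest = [], [], []
    for members in by_combo.values():
        members = rng.permutation(members)
        k_tr, k_va = n_train // n_combos, n_val // n_combos
        train.extend(members[:k_tr])
        val.extend(members[k_tr:k_tr + k_va])
        rest.extend(members[k_tr + k_va:])
    test = rng.choice(rest, size=n_test, replace=False)
    return np.array(train), np.array(val), np.array(test)
```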
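Finally, the Experiment Setup row describes model selection over the Table 8 search space using the bias-unsupervised validation score. The grid-search skeleton below only shows the shape of such a sweep; the placeholder values and the `train_and_score` callback are assumptions, and the actual grids and score computation are given in Tables 8 and 9 and Algorithm 1.

```python
from itertools import product

# Illustrative search-space layout; values are placeholders, not Table 8's.
search_space = {
    "lr": [1e-3, 1e-4],
    "weight_decay": [1e-2, 1e-4],
    "eta": [0.5, 1.0, 2.0],
    "tau": [0.1, 1.0],
}

def select_best(train_and_score, space):
    """Pick the configuration maximizing the validation score returned by
    `train_and_score(config)`, e.g. a bias-unsupervised score."""
    best_cfg, best_score = None, float("-inf")
    keys = list(space)
    for values in product(*(space[k] for k in keys)):
        cfg = dict(zip(keys, values))
        score = train_and_score(cfg)
        if score > best_score:
            best_cfg, best_score = cfg, score
    return best_cfg, best_score
```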