Responsible AI (RAI) Games and Ensembles

Authors: Yash Gupta, Runtian Zhai, Arun Suggala, Pradeep Ravikumar

NeurIPS 2023

Reproducibility Variable | Result | LLM Response
Research Type | Experimental | We empirically demonstrate the applicability and competitive performance of our techniques for solving several RAI problems, particularly around subpopulation shift.
Researcher Affiliation | Collaboration | Yash Gupta (Carnegie Mellon University, yashgup2@cs.cmu.edu); Runtian Zhai (Carnegie Mellon University, rzhai@cs.cmu.edu); Arun Suggala (Google Research, arunss@google.com); Pradeep Ravikumar (Carnegie Mellon University, pradeepr@cs.cmu.edu)
Pseudocode | Yes | Algorithm 1: Game play algorithm for solving Equation (1); Algorithm 2: Greedy algorithms for solving Equation (1)
Open Source Code | Yes | The relevant code for this work can be found at https://github.com/yashgupta-7/rai-games
Open Datasets | Yes | We use the following datasets: COMPAS [Angwin et al., 2016], CIFAR-10 (original, and with a class-imbalanced split [Jin et al., 2021, Qi et al., 2021]), and CIFAR-100.
Dataset Splits | Yes | We track the unregularized objective value from Equation 1 for the validation set, and whenever it increases we double the regularization factor η, which we find can improve generalization. ... For these datasets, we use the standard training and testing splits, reserving 10% of the training samples as validation data.
Hardware Specification | No | No specific hardware details such as GPU models, CPU types, or memory configurations are given for the experiments; the paper refers only generally to the "training & inference compute required", without specifics.
Software Dependencies | No | The paper states that "SGD with momentum = 0.9" was used for optimization but provides no version numbers for any software, libraries, or frameworks used in the implementation.
Experiment Setup | Yes | We use SGD with momentum = 0.9 for optimization. We first warm up the model with some predefined epochs of ERM (3 for COMPAS and 20 for CIFAR-10/100), followed by a maximum of T = 5 base models trained from the warm-up model with sample weights provided by our algorithms. Each base model is trained for 500 iterations on COMPAS and 2000 iterations on CIFAR-10/100. The mini-batch size is set to 128. ... Our implemented versions incorporate a few alterations: 1. We track the unregularized objective value from Equation 1 for the validation set. If it increases at any round t, we increase the regularization factor η by a fixed multiple (specifically, 2). 2. The same unregularized objective w.r.t. the normalized Qt is also used to perform a line search for the step size α.
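The two alterations listed under Experiment Setup (doubling η whenever the validation objective rises, and a line search over the step size α with respect to the normalized Qt) can be sketched as follows. This is a minimal illustration under stated assumptions, not the authors' code: the function names, the toy weighted-loss objective, and the choice to maximize it (the adversary's side of the game) are assumptions made for the example.

```python
import numpy as np

# Minimal sketch (not the authors' implementation) of the two training-loop
# alterations described above. `objective` stands in for the unregularized
# objective of Equation (1) evaluated on the validation set; here it is a
# toy weighted loss, which the adversary seeks to maximize (an assumption).

def maybe_double_eta(prev_val_obj, curr_val_obj, eta, factor=2.0):
    """Increase the regularization factor eta by a fixed multiple
    (specifically 2) whenever the validation objective increases."""
    return eta * factor if curr_val_obj > prev_val_obj else eta

def line_search_alpha(objective, q_prev, q_new, grid=None):
    """Choose the step size alpha by evaluating the objective at the
    normalized mixture (1 - alpha) * q_prev + alpha * q_new over a grid."""
    if grid is None:
        grid = np.linspace(0.0, 1.0, 21)
    best_alpha, best_obj = 0.0, -np.inf
    for alpha in grid:
        q = (1.0 - alpha) * q_prev + alpha * q_new
        q = q / q.sum()  # evaluate w.r.t. the normalized weights Q_t
        obj = objective(q)
        if obj > best_obj:
            best_alpha, best_obj = alpha, obj
    return best_alpha

# Toy usage: two samples, the second with the higher loss; the adversarial
# weight update should move all mass onto the high-loss sample.
losses = np.array([0.1, 0.9])
weighted_loss = lambda q: float(q @ losses)
alpha = line_search_alpha(weighted_loss,
                          np.array([0.5, 0.5]),   # previous weights
                          np.array([0.0, 1.0]))   # proposed update direction
# alpha -> 1.0
```

The grid-search form of the line search is an assumption made for simplicity; any one-dimensional optimizer over α would serve the same role.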