Responsible AI (RAI) Games and Ensembles

Authors: Yash Gupta, Runtian Zhai, Arun Suggala, Pradeep Ravikumar

NeurIPS 2023

Reproducibility Variable | Result | LLM Response
Research Type | Experimental | We empirically demonstrate the applicability and competitive performance of our techniques for solving several RAI problems, particularly around subpopulation shift.
Researcher Affiliation | Collaboration | Yash Gupta (Carnegie Mellon University, yashgup2@cs.cmu.edu); Runtian Zhai (Carnegie Mellon University, rzhai@cs.cmu.edu); Arun Suggala (Google Research, arunss@google.com); Pradeep Ravikumar (Carnegie Mellon University, pradeepr@cs.cmu.edu)
Pseudocode | Yes | Algorithm 1: Game play algorithm for solving Equation (1); Algorithm 2: Greedy algorithms for solving Equation (1)
Open Source Code | Yes | The relevant code for this work can be found at https://github.com/yashgupta-7/rai-games
Open Datasets | Yes | We use the following datasets: COMPAS [Angwin et al., 2016], CIFAR-10 (original, and with a class-imbalanced split [Jin et al., 2021, Qi et al., 2021]), and CIFAR-100.
Dataset Splits | Yes | We track the unregularized objective value from Equation 1 for the validation set, and whenever it increases we double the regularization factor η, which we find can improve generalization. ... For these datasets, we use the standard training and testing splits, reserving 10% of the training samples as validation data.
Hardware Specification | No | No specific hardware details such as GPU models, CPU types, or memory configurations are given for the experiments; the paper refers only generally to the "training & inference compute required", without specifics.
Software Dependencies | No | The paper states that "SGD with momentum = 0.9" was used for optimization but provides no version numbers for any software, libraries, or frameworks used in the implementation.
Experiment Setup | Yes | We use SGD with momentum = 0.9 for optimization. We first warm up the model with some predefined epochs of ERM (3 for COMPAS and 20 for CIFAR-10/100), followed by a maximum of T = 5 base models trained from the warm-up model with sample weights provided by our algorithms. Each base model is trained for 500 iterations on COMPAS and 2000 iterations on CIFAR-10/100. The mini-batch size is set to 128. ... Our implemented versions incorporate a few alterations: 1. We track the unregularized objective value from Equation 1 for the validation set. If it increases at any round t, we increase the regularization factor η by a fixed multiple (specifically, 2). 2. The same unregularized objective w.r.t. the normalized Qt is also used to perform a line search for the step size α.
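The two alterations listed under Experiment Setup (doubling η whenever the validation objective rises, and a line search over the step size α with respect to the normalized Qt) can be sketched as follows. This is a minimal illustration under stated assumptions, not the authors' code: the function names, the toy weighted-loss objective, and the choice to maximize it (the adversary's side of the game) are assumptions made for the example.

```python
import numpy as np

# Minimal sketch (not the authors' implementation) of the two training-loop
# alterations described above. `objective` stands in for the unregularized
# objective of Equation (1) evaluated on the validation set; here it is a
# toy weighted loss, which the adversary seeks to maximize (an assumption).

def maybe_double_eta(prev_val_obj, curr_val_obj, eta, factor=2.0):
    """Increase the regularization factor eta by a fixed multiple
    (specifically 2) whenever the validation objective increases."""
    return eta * factor if curr_val_obj > prev_val_obj else eta

def line_search_alpha(objective, q_prev, q_new, grid=None):
    """Choose the step size alpha by evaluating the objective at the
    normalized mixture (1 - alpha) * q_prev + alpha * q_new over a grid."""
    if grid is None:
        grid = np.linspace(0.0, 1.0, 21)
    best_alpha, best_obj = 0.0, -np.inf
    for alpha in grid:
        q = (1.0 - alpha) * q_prev + alpha * q_new
        q = q / q.sum()  # evaluate w.r.t. the normalized weights Q_t
        obj = objective(q)
        if obj > best_obj:
            best_alpha, best_obj = alpha, obj
    return best_alpha

# Toy usage: two samples, the second with the higher loss; the adversarial
# weight update should move all mass onto the high-loss sample.
losses = np.array([0.1, 0.9])
weighted_loss = lambda q: float(q @ losses)
alpha = line_search_alpha(weighted_loss,
                          np.array([0.5, 0.5]),   # previous weights
                          np.array([0.0, 1.0]))   # proposed update direction
# alpha -> 1.0
```

The grid-search form of the line search is an assumption made for simplicity; any one-dimensional optimizer over α would serve the same role.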