On-Demand Sampling: Learning Optimally from Multiple Distributions
Authors: Nika Haghtalab, Michael Jordan, Eric Zhao
NeurIPS 2022
| Reproducibility Variable | Result | LLM Response |
|---|---|---|
| Research Type | Experimental | This section describes experiments where we adapt our on-demand sampling-based multi-distribution learning algorithm for deep learning applications. In particular, we compare our algorithm against the de facto standard multi-distribution learning algorithm for deep learning, Group DRO (GDRO) [36]. As GDRO is designed for use with offline-collected datasets, to provide an accurate comparison, we modify our algorithm to work on offline datasets (i.e., with no on-demand sample access). Table 2: Worst-group accuracy (our primary performance metric), and the gap between worst-group accuracy and average accuracy, of empirical risk minimization (ERM), Group DRO (GDRO), and our R-MDL algorithm in three experiment settings (standard hyperparameters (Standard Reg.), inflated weight decay regularization (Strong Reg.), and early stopping (Early Stop)) and on three datasets (Waterbirds, CelebA, and MultiNLI). Figures are percentages evaluated on the test split of each dataset, with standard deviations in parentheses. |
| Researcher Affiliation | Academia | Nika Haghtalab, Michael I. Jordan, and Eric Zhao University of California, Berkeley |
| Pseudocode | Yes | Algorithm 1 Finding Equilibria in Finite Zero-Sum Games with Asymmetric Costs. Algorithm 2 On-Demand Agnostic Collaborative Learning. |
| Open Source Code | No | The provided text does not contain a specific URL or explicit statement about the source code being publicly available or provided in supplementary material. While the paper's self-assessment checklist (outside the main body) indicates code is provided, this is not present in the analysis text. |
| Open Datasets | Yes | We finetune ResNet-50 models (convolutional neural networks) [18] and BERT models (transformer-based networks) [11] on the image classification datasets Waterbirds [36, 41] and CelebA [23] and the natural language dataset MultiNLI [42], respectively. |
| Dataset Splits | No | The paper mentions training and evaluating on a test split, but it does not explicitly specify the details of a validation split (e.g., percentages, sample counts, or methodology for creation). |
| Hardware Specification | No | The paper does not explicitly mention any specific hardware details such as GPU models, CPU types, or memory specifications used for running the experiments. |
| Software Dependencies | No | The paper mentions models like Resnet-50 and BERT, and libraries like PyTorch and Torchvision (via citations), but it does not provide specific version numbers for any software dependencies. |
| Experiment Setup | Yes | We train these models in 3 settings: with standard hyperparameters, under strong weight decay (ℓ2) regularization, or under early stopping. |
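The GDRO baseline referenced above reweights training groups so that higher-loss groups receive more attention. The snippet below is a minimal, hypothetical sketch of the exponentiated-gradient (multiplicative-weights) group-reweighting step at the heart of Group DRO; the function name, step size, and loss values are illustrative assumptions, not the paper's implementation.

```python
import math

def gdro_weights(group_losses, weights, step_size=0.1):
    """One multiplicative-weights update: upweight higher-loss groups.

    A hedged sketch of the Group DRO reweighting step. Each group's
    weight is scaled by exp(step_size * loss) and the result is
    renormalized to a probability distribution.
    """
    scaled = [w * math.exp(step_size * loss)
              for w, loss in zip(weights, group_losses)]
    total = sum(scaled)
    return [w / total for w in scaled]

# Start uniform over 3 hypothetical groups; group 2 has the worst loss.
w = [1 / 3, 1 / 3, 1 / 3]
for _ in range(100):
    w = gdro_weights([0.2, 0.5, 1.0], w)

print(w)  # mass concentrates on the worst-loss group over iterations
```

In a full training loop these weights would multiply each group's per-batch loss before backpropagation, which is what drives the worst-group accuracy metric reported in Table 2.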