reproducibilityindex.ai

DORO: Distributional and Outlier Robust Optimization

Authors: Runtian Zhai, Chen Dan, Zico Kolter, Pradeep Ravikumar

ICML 2021

Reproducibility Variable	Result	LLM Response
Research Type	Experimental	In this section, we conduct large-scale experiments on modern datasets. Our results show that DORO improves the performance and stability of DRO.
Researcher Affiliation	Academia	School of Computer Science, Carnegie Mellon University, Pittsburgh, Pennsylvania, USA. Correspondence to: Runtian Zhai <rzhai@cmu.edu>.
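Pseudocode	Yes	Algorithm 1: DORO with $D_\beta$ Divergence. Input: batch size $n$, outlier fraction $\epsilon$, minimal group size $\alpha$. for each iteration do: Sample a batch $z_1, \ldots, z_n \sim P_{\text{train}}$; Compute losses: $\ell_i = \ell(\theta, z_i)$ for $i = 1, \ldots, n$; Sort the losses: $\ell_{i_1} \ge \cdots \ge \ell_{i_n}$; Find $\eta^* = \arg\min_\eta F(\theta, \eta)$ where $F(\theta, \eta) = c_\beta(\rho) \left[ \frac{1}{n - \lfloor \epsilon n \rfloor} \sum_{j = \lfloor \epsilon n \rfloor + 1}^{n} \left( \ell(\theta; z_{i_j}) - \eta \right)_+^{\beta} \right]^{1/\beta} + \eta$; Update $\theta$ by one step to minimize $\ell(\theta) = F(\theta, \eta^*)$ with some gradient method. end for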
Open Source Code	Yes	Codes are available at https://github.com/RuntianZ/doro.
Open Datasets	Yes	We conduct large-scale experiments on three datasets: the tabular dataset COMPAS, the vision dataset CelebA, and the language dataset CivilComments-Wilds. ... We summarize the datasets we use as follows: (i) COMPAS (Larson et al., 2016): recidivism prediction... (ii) CelebA (Liu et al., 2015): human face recognition... (iii) CivilComments-Wilds (Borkan et al., 2019; Koh et al., 2020): toxicity identification...
Dataset Splits	Yes	For COMPAS, we randomly sample 70% of the instances to be the training data (with a fixed random seed) and the rest is the validation/testing data. Both CelebA and CivilComments-Wilds have official train-validation-test splits, so we use them directly.
Hardware Specification	No	No specific hardware details (e.g., GPU models, CPU types, memory) are mentioned in the paper related to the experiments conducted.
Software Dependencies	No	No specific software versions (e.g., Python, PyTorch, TensorFlow versions) are mentioned in the paper, only high-level model architectures.
Experiment Setup	Yes	Each algorithm is run 300 epochs on COMPAS, 30 epochs on CelebA and 5 epochs on CivilComments-Wilds. ... For every DRO and DORO method, we do a grid search to pick the best α and ϵ that achieve the best worst-case accuracy
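
The per-batch computation in Algorithm 1 (quoted in the Pseudocode row) can be illustrated with a small sketch. This is not the authors' implementation. It evaluates only the scalar objective for a fixed list of losses and leaves out the gradient update of θ. The function names `doro_objective` and `doro_loss` are hypothetical. The constant c_β(ρ) is not defined in the quoted text, so it is passed in as a plain number `c` and assumed to be greater than 1. The exponent is taken as it appears in the quoted pseudocode. The ternary search over η is a stand-in for whatever minimizer the authors use, and assumes β ≥ 1 so that F is convex in η.

```python
import math


def doro_objective(losses, eta, eps, beta, c):
    """Evaluate F(theta, eta) for a fixed batch of per-sample losses."""
    n = len(losses)
    k = math.floor(eps * n)  # number of largest losses treated as outliers
    kept = sorted(losses, reverse=True)[k:]  # descending sort, drop the top k
    mean_pow = sum(max(l - eta, 0.0) ** beta for l in kept) / (n - k)
    return c * mean_pow ** (1.0 / beta) + eta


def doro_loss(losses, eps, beta, c, iters=200):
    """Minimize F over eta by ternary search; returns (F(eta*), eta*).

    Assumes c > 1, beta >= 1 and eps < 1 (illustrative restrictions).
    """
    if c <= 1.0:
        raise ValueError("this sketch assumes c > 1")
    k = math.floor(eps * len(losses))
    kept = sorted(losses, reverse=True)[k:]
    # F(eta) = eta once eta exceeds every kept loss, so eta* <= max kept loss.
    hi = kept[0]
    # Conservative lower end of the search interval, valid when c > 1.
    lo = kept[-1] - (kept[0] - kept[-1]) / (c - 1.0)
    for _ in range(iters):
        m1 = lo + (hi - lo) / 3.0
        m2 = hi - (hi - lo) / 3.0
        if doro_objective(losses, m1, eps, beta, c) < doro_objective(losses, m2, eps, beta, c):
            hi = m2
        else:
            lo = m1
    eta_star = 0.5 * (lo + hi)
    return doro_objective(losses, eta_star, eps, beta, c), eta_star


# Toy batch with one extreme loss (made-up numbers, not from the paper).
batch = [1.0, 2.0, 100.0, 3.0, 4.0]
robust, _ = doro_loss(batch, eps=0.2, beta=1, c=2.0)  # about 3.5
plain, _ = doro_loss(batch, eps=0.0, beta=1, c=2.0)   # about 42.2
print(robust, plain)
```

With `eps=0.2` the single loss of 100 is dropped before the objective is evaluated, so the result reflects only the four remaining samples. With `eps=0.0` the same loss dominates. In a real training loop the losses would be tensors and θ would be updated by differentiating F(θ, η*) with an autograd framework.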