DORO: Distributional and Outlier Robust Optimization

Authors: Runtian Zhai, Chen Dan, Zico Kolter, Pradeep Ravikumar

ICML 2021

Reproducibility Variable Result LLM Response
Research Type Experimental In this section, we conduct large-scale experiments on modern datasets. Our results show that DORO improves the performance and stability of DRO.
Researcher Affiliation Academia School of Computer Science, Carnegie Mellon University, Pittsburgh, Pennsylvania, USA. Correspondence to: Runtian Zhai <rzhai@cmu.edu>.
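Pseudocode Yes Algorithm 1: DORO with $D_\beta$ Divergence
  Input: batch size $n$, outlier fraction $\epsilon$, minimal group size $\alpha$
  for each iteration do
    Sample a batch $z_1, \dots, z_n \sim P_{\text{train}}$
    Compute losses: $\ell_i = \ell(\theta, z_i)$ for $i = 1, \dots, n$
    Sort the losses: $\ell_{i_1} \ge \dots \ge \ell_{i_n}$
    Find $\eta^* = \arg\min_\eta F(\theta, \eta)$, where $F(\theta, \eta) = c_\beta(\rho) \left[ \frac{1}{n - \lfloor \epsilon n \rfloor} \sum_{j = \lfloor \epsilon n \rfloor + 1}^{n} (\ell(\theta; z_{i_j}) - \eta)_+^{\beta} \right]^{1/\beta} + \eta$
    Update $\theta$ by one step to minimize $\ell(\theta) = F(\theta, \eta^*)$ with some gradient method
  end for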
Open Source Code Yes Codes are available at https://github.com/RuntianZ/doro.
Open Datasets Yes We conduct large-scale experiments on three datasets: the tabular dataset COMPAS, the vision dataset CelebA, and the language dataset CivilComments-Wilds. ... We summarize the datasets we use as follows: (i) COMPAS (Larson et al., 2016): recidivism prediction... (ii) CelebA (Liu et al., 2015): human face recognition... (iii) CivilComments-Wilds (Borkan et al., 2019; Koh et al., 2020): toxicity identification...
Dataset Splits Yes For COMPAS, we randomly sample 70% of the instances to be the training data (with a fixed random seed) and the rest is the validation/testing data. Both CelebA and CivilComments-Wilds have official train-validation-test splits, so we use them directly.
Hardware Specification No No specific hardware details (e.g., GPU models, CPU types, memory) are mentioned in the paper related to the experiments conducted.
Software Dependencies No No specific software versions (e.g., Python, PyTorch, TensorFlow versions) are mentioned in the paper, only high-level model architectures.
Experiment Setup Yes Each algorithm is run for 300 epochs on COMPAS, 30 epochs on CelebA and 5 epochs on CivilComments-Wilds. ... For every DRO and DORO method, we do a grid search to pick the best α and ϵ that achieve the best worst-case accuracy.
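
The per-batch computation in the Pseudocode row (sort the losses, discard the largest ⌊ϵn⌋ of them, then minimize F over η) can be illustrated with a small pure-Python sketch. This is not the authors' implementation: the function name `doro_batch_loss` is hypothetical, the constant c_β(ρ) is passed in as a plain number `c_beta` because its definition is not given in this summary, and the ternary search is just one simple way to carry out the "Find η*" step, relying on F being convex in η. The gradient update on θ is omitted, since it would need an autodiff framework.

```python
import math


def doro_batch_loss(losses, eps, c_beta, beta=2.0):
    """Sketch of the inner step of Algorithm 1 for one batch.

    losses : per-sample losses l(theta, z_i) for the batch
    eps    : assumed outlier fraction
    c_beta : the constant c_beta(rho), supplied by the caller (assumption)
    beta   : exponent used in F
    Returns (F(theta, eta_star), eta_star).
    """
    n = len(losses)
    n_drop = math.floor(eps * n)
    # Sort in descending order and drop the n_drop largest losses,
    # which are treated as suspected outliers.
    kept = sorted(losses, reverse=True)[n_drop:]

    def F(eta):
        mean_pow = sum(max(l - eta, 0.0) ** beta for l in kept) / len(kept)
        return c_beta * mean_pow ** (1.0 / beta) + eta

    # For eta >= max(kept), F(eta) = eta, so the minimizer is at most max(kept).
    hi = kept[0]
    width = (kept[0] - kept[-1]) + 1.0
    # Widen the bracket to the left until F stops decreasing (F is convex).
    for _ in range(60):
        if F(hi - 2 * width) >= F(hi - width):
            break
        width *= 2
    lo = hi - 2 * width

    # Ternary search for the minimizer of the convex function F on [lo, hi].
    for _ in range(200):
        m1 = lo + (hi - lo) / 3
        m2 = hi - (hi - lo) / 3
        if F(m1) <= F(m2):
            hi = m2
        else:
            lo = m1
    eta_star = (lo + hi) / 2
    return F(eta_star), eta_star


# Example: the loss of 100.0 is discarded when eps = 0.25 and n = 4.
value, eta_star = doro_batch_loss([1.0, 2.0, 3.0, 100.0], eps=0.25,
                                  c_beta=math.sqrt(1.5))
```

Because the discarded samples never enter F, replacing the largest loss by an arbitrarily larger value leaves the result unchanged, which is the outlier-robustness property the algorithm is designed for. In a training loop, the returned value would be differentiated with respect to θ while η* is held fixed.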