Notice: The reproducibility variables underlying each score are classified using an automated LLM-based pipeline, validated against a manually labeled dataset. LLM-based classification introduces uncertainty and potential bias; scores should be interpreted as estimates. Full accuracy metrics and methodology are described in Coakley et al. (K. L. Coakley, T. Snelleman, H. Hoos, and O. E. Gundersen, "The embrace of open science: An analysis of a decade of AI research and 56 800 conference papers," Under Review, 2026).
DORO: Distributional and Outlier Robust Optimization
Authors: Runtian Zhai, Chen Dan, Zico Kolter, Pradeep Ravikumar
ICML 2021 | Venue PDF | LLM Run Details
| Reproducibility Variable | Result | LLM Response |
|---|---|---|
| Research Type | Experimental | In this section, we conduct large-scale experiments on modern datasets. Our results show that DORO improves the performance and stability of DRO. |
| Researcher Affiliation | Academia | School of Computer Science, Carnegie Mellon University, Pittsburgh, Pennsylvania, USA. Correspondence to: Runtian Zhai <EMAIL>. |
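| Pseudocode | Yes | Algorithm 1: DORO with $D_\beta$ Divergence. Input: batch size $n$, outlier fraction $\epsilon$, minimal group size $\alpha$. for each iteration do: Sample a batch $z_1, \dots, z_n \sim P_{\text{train}}$. Compute losses: $\ell_i = \ell(\theta, z_i)$ for $i = 1, \dots, n$. Sort the losses: $\ell_{i_1} \ge \dots \ge \ell_{i_n}$. Find $\eta^* = \arg\min_\eta F(\theta, \eta)$, where $F(\theta, \eta) = c_\beta(\rho) \left[ \frac{1}{n - \lfloor \epsilon n \rfloor} \sum_{j = \lfloor \epsilon n \rfloor + 1}^{n} (\ell(\theta; z_{i_j}) - \eta)_+^{\beta_*} \right]^{1/\beta_*} + \eta$. Update $\theta$ by one step to minimize $\ell(\theta) = F(\theta, \eta^*)$ with some gradient method. end for |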
| Open Source Code | Yes | Codes are available at https://github.com/RuntianZ/doro. |
| Open Datasets | Yes | We conduct large-scale experiments on three datasets: the tabular dataset COMPAS, the vision dataset CelebA, and the language dataset CivilComments-Wilds. ... We summarize the datasets we use as follows: (i) COMPAS (Larson et al., 2016): recidivism prediction... (ii) CelebA (Liu et al., 2015): human face recognition... (iii) CivilComments-Wilds (Borkan et al., 2019; Koh et al., 2020): toxicity identification... |
| Dataset Splits | Yes | For COMPAS, we randomly sample 70% of the instances to be the training data (with a fixed random seed) and the rest is the validation/testing data. Both CelebA and CivilComments-Wilds have official train-validation-test splits, so we use them directly. |
| Hardware Specification | No | No specific hardware details (e.g., GPU models, CPU types, memory) are mentioned in the paper related to the experiments conducted. |
| Software Dependencies | No | No specific software versions (e.g., Python, PyTorch, TensorFlow versions) are mentioned in the paper, only high-level model architectures. |
| Experiment Setup | Yes | Each algorithm is run 300 epochs on COMPAS, 30 epochs on CelebA and 5 epochs on CivilComments-Wilds. ... For every DRO and DORO method, we do a grid search to pick the best α and ϵ that achieve the best worst-case accuracy |
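
The per-batch computation quoted in the Pseudocode row can be illustrated with a small sketch. This is not the authors' implementation: the function names, the ternary search over η, and the search bracket are assumptions made here for illustration. The constant c_β(ρ) and the exponent β_* are passed in as plain parameters (`c`, `beta_star`) rather than derived from α, and the sketch assumes `c > 1` and `beta_star >= 1`.

```python
import math


def doro_objective(losses, eps, c, beta_star, eta):
    """Evaluate F(theta, eta) on one batch of per-sample losses.

    The floor(eps * n) largest losses are treated as potential outliers
    and dropped before the D_beta-style risk is computed.
    """
    n = len(losses)
    k = math.floor(eps * n)                  # number of samples to discard
    kept = sorted(losses, reverse=True)[k:]  # drop the k largest losses
    mean_excess = sum(max(l - eta, 0.0) ** beta_star for l in kept) / (n - k)
    return c * mean_excess ** (1.0 / beta_star) + eta


def doro_batch_loss(losses, eps, c, beta_star, iters=100):
    """Return (F(theta, eta*), eta*) using ternary search over eta.

    F is convex in eta, so ternary search is sufficient here. The bracket
    below is an assumption of this sketch: F(eta) >= c * mean - (c - 1) * eta
    and F(max_loss) = max_loss, so the minimiser lies in [lo, max_loss].
    """
    n = len(losses)
    k = math.floor(eps * n)
    kept = sorted(losses, reverse=True)[k:]
    mean_kept = sum(kept) / len(kept)
    hi = max(kept)
    lo = (c * mean_kept - hi) / (c - 1.0)    # requires c > 1
    for _ in range(iters):
        m1 = lo + (hi - lo) / 3.0
        m2 = hi - (hi - lo) / 3.0
        if doro_objective(losses, eps, c, beta_star, m1) < doro_objective(
            losses, eps, c, beta_star, m2
        ):
            hi = m2
        else:
            lo = m1
    eta_star = 0.5 * (lo + hi)
    return doro_objective(losses, eps, c, beta_star, eta_star), eta_star


# Toy batch with one extreme loss (values are made up for illustration).
batch = [0.1, 0.2, 0.3, 10.0]
robust_value, _ = doro_batch_loss(batch, eps=0.25, c=2.0, beta_star=2.0)
plain_value, _ = doro_batch_loss(batch, eps=0.0, c=2.0, beta_star=2.0)
print(robust_value, plain_value)
```

With `eps=0.25` the single extreme loss is discarded before the risk is computed, so the resulting value is driven by the remaining three samples. In a training loop, η* would typically be found without tracking gradients, and θ would then be updated by differentiating F(θ, η*) with an autodiff framework.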