Achievable distributional robustness when the robust risk is only partially identified

Authors: Julia Kostin, Nicola Gnecco, Fanny Yang

NeurIPS 2024

Reproducibility variable | Result | LLM response (supporting evidence)
Research Type | Experimental | Section 4 ("Experimental results"): "In this section, we provide empirical evidence of our theoretical conclusions in Sections 3.1 and 3.2. In particular, we compare the prediction performance of multiple existing robustness methods to the (estimated) minimax robustness in identifiable and partially identifiable settings. We observe that both on synthetic and real-world data, in the partially identified setting, empirical risk minimization and invariance-based robustness methods not only have significantly sub-optimal test loss, but also perform more similarly, thereby aligning with our theoretical results in Section 3.2. This stands in contrast to the identifiable setting, where the anchor predictor is optimal up to finite-sample effects. Furthermore, we observe that even though the minimizer of the worst-case robust risk is optimal only for the linear causal setting in Section 2.1, it surprisingly outperforms existing methods in a real-world experiment."
Researcher Affiliation | Academia | Julia Kostin (Department of Computer Science, ETH Zurich; jkostin@ethz.ch); Nicola Gnecco (Gatsby Computational Neuroscience Unit, University College London; nicola.gnecco@gmail.com); Fanny Yang (Department of Computer Science, ETH Zurich; fan.yang@inf.ethz.ch)
Pseudocode | Yes | Algorithm 1, "Computation of the worst-case robust loss"
Open Source Code | No | Code is not released; however, the paper states that Appendices D and E contain all information needed to reproduce the experiments.
Open Datasets | Yes | "We consider the K562 dataset from [38] and perform the preprocessing as done in [13]. The resulting dataset consists of n = 162,751 single-cell observations over d = 622 genes collected from observational and several interventional environments."
Dataset Splits | No | No explicit train/validation/test split percentages or counts were found. The paper refers to training datasets D_{j1}, D_{j2}, D_{j3} and test datasets D_{π,s}.
Hardware Specification | Yes | "We use a 2020 13-inch MacBook Pro with a 1.4 GHz Quad-Core Intel Core i5 processor, 8 GB of RAM, and Intel Iris Plus Graphics 645 with 1536 MB of graphics memory."
Software Dependencies | No | The paper mentions methods such as the Lasso and anchor regression but does not specify any software libraries with version numbers (e.g., scikit-learn 1.0, PyTorch 1.9) used for the implementation.
Experiment Setup | Yes | "For Worst-case Rob.: γ = 50, C_ker = 1.0, and M = Id. For anchor regression and DRIG, we select γ = 50. For ICP, we set the significance level for the invariance tests to α = 0.05." (An illustrative sketch of the anchor-regression baseline with this γ follows the table.)
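Since the authors do not release code, the snippet below is only a minimal sketch of how the anchor-regression baseline with γ = 50 could be fit, using the standard transformed-least-squares formulation of anchor regression. The function and variable names (anchor_regression, X, Y, A) and the synthetic data at the end are hypothetical illustrations, not the paper's implementation or dataset.

```python
import numpy as np

def anchor_regression(X, Y, A, gamma=50.0):
    """Illustrative anchor-regression fit (not the authors' code).

    X: (n, d) covariates, Y: (n,) response, A: (n, q) anchor/environment
    variables, gamma: robustness parameter (gamma = 1 recovers OLS).
    """
    n = X.shape[0]
    # Projection onto the column space of the anchors.
    P_A = A @ np.linalg.pinv(A.T @ A) @ A.T
    # Transform the data with W = I - (1 - sqrt(gamma)) * P_A; ordinary
    # least squares on (W X, W Y) then minimizes
    #   ||(I - P_A)(Y - X b)||^2 + gamma * ||P_A (Y - X b)||^2.
    W = np.eye(n) - (1.0 - np.sqrt(gamma)) * P_A
    X_t, Y_t = W @ X, W @ Y
    beta, *_ = np.linalg.lstsq(X_t, Y_t, rcond=None)
    return beta

# Hypothetical usage on synthetic data (not the K562 dataset):
rng = np.random.default_rng(0)
n, d, q = 500, 10, 3
A = rng.standard_normal((n, q))
X = A @ rng.standard_normal((q, d)) + rng.standard_normal((n, d))
Y = X @ rng.standard_normal(d) + rng.standard_normal(n)
beta_hat = anchor_regression(X, Y, A, gamma=50.0)
```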