Stability Evaluation through Distributional Perturbation Analysis

Authors: Jose Blanchet, Peng Cui, Jiajin Li, Jiashuo Liu

ICML 2024

Reproducibility Variable | Result | LLM Response
Research Type | Experimental | "Empirically, we validate the practical utility of our stability evaluation criterion across a host of real-world applications. These empirical studies showcase the criterion's ability not only to compare the stability of different learning models and features but also to provide valuable guidelines and strategies to further improve models."
Researcher Affiliation | Academia | "(1) Department of Management Science and Engineering, Stanford University; (2) Department of Computer Science and Technology, Tsinghua University."
Pseudocode | Yes | "Algorithm 1: Stability evaluation with general nonlinear loss functions (ϕ(t) = t log t − t + 1). Algorithm 2: Stability evaluation with general nonlinear loss functions (ϕ(t) = (t − 1)²)." A hedged sketch of the ϕ(t) = t log t − t + 1 case follows the table.
Open Source Code | No | The paper does not contain an explicit statement about the release of open-source code for the described methodology, nor does it provide a link to a code repository.
Open Datasets | Yes | "ACS Income and ACS Public Coverage are based on the American Community Survey (ACS) Public Use Microdata Sample (PUMS) (Ding et al., 2021). COVID-19 dataset (Baqui et al., 2020) is based on SIVEP-Gripe data."
Dataset Splits | Yes | "In our experiments, we sample 2,000 data points from CA for model training, and another 2,000 for evaluation. ... select the best γ according to the validation accuracy." A loading-and-splitting sketch follows the table.
Hardware Specification | Yes | "All experiments are performed using a single NVIDIA GeForce RTX 3090."
Software Dependencies | No | The paper mentions the "PyTorch Library (Paszke et al., 2019)" but does not specify its version or other software dependencies with version numbers.
Experiment Setup | Yes | "The number of hidden units of MLP is set to 16. As for the models under evaluation in Section 4, (1) for AT (Sinha et al., 2018), we vary the penalty parameter γ ∈ {0.1, 0.2, . . . , 1.0} and select the best γ according to the validation accuracy, and the inner optimization step is set to 20; (2) for Tilted ERM (Li et al., 2023), we vary the temperature parameter t ∈ {0.1, 0.2, . . . , 1.0} and select the best t according to the validation accuracy. Throughout all experiments, the ADAM optimizer with a learning rate of 1e-3 is used." A training-setup sketch follows the table.
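
For reference, here is a minimal PyTorch sketch of what a stability evaluation step with ϕ(t) = t log t − t + 1 could look like. It is an assumption-laden illustration, not the paper's Algorithm 1: it assumes a squared-Euclidean transport cost on inputs, a fixed dual weight lam, a cross-entropy loss, and a 20-step inner gradient ascent like the one reported for the AT baseline; the function names (perturbed_losses, kl_tilted_risk) are hypothetical.

    import math
    import torch
    import torch.nn.functional as F

    def perturbed_losses(model, x, y, lam, steps=20, lr=0.1):
        # Inner maximization (in the style of Sinha et al., 2018): for each
        # sample z = (x, y), ascend loss(z') - lam * ||x' - x||^2 over x'.
        x_adv = x.clone().detach().requires_grad_(True)
        for _ in range(steps):
            obj = F.cross_entropy(model(x_adv), y, reduction="sum") \
                  - lam * ((x_adv - x) ** 2).sum()
            grad, = torch.autograd.grad(obj, x_adv)
            with torch.no_grad():
                x_adv += lr * grad
        with torch.no_grad():
            # Per-sample "cost-smoothed" loss: loss at the perturbed point
            # minus the weighted transport cost paid to reach it.
            smoothed = F.cross_entropy(model(x_adv), y, reduction="none") \
                       - lam * ((x_adv - x) ** 2).flatten(1).sum(dim=1)
        return smoothed

    def kl_tilted_risk(model, x, y, lam):
        # Outer step for the KL-type case: an exponential tilt (log-sum-exp)
        # of the smoothed losses, i.e. lam * log E[exp(smoothed / lam)].
        s = perturbed_losses(model, x, y, lam)
        return lam * (torch.logsumexp(s / lam, dim=0) - math.log(s.numel()))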
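The CA sampling described under Dataset Splits can be reproduced in spirit with the public folktables package, which wraps the ACS PUMS data used for ACS Income. The survey year and random seed below are assumptions; the paper's exact preprocessing may differ.

    import numpy as np
    from folktables import ACSDataSource, ACSIncome

    # Download the California ACS PUMS sample and build the ACS Income task.
    data_source = ACSDataSource(survey_year="2018", horizon="1-Year", survey="person")
    ca_data = data_source.get_data(states=["CA"], download=True)
    features, labels, _ = ACSIncome.df_to_numpy(ca_data)

    # Sample 2,000 points for training and another 2,000 for evaluation.
    rng = np.random.default_rng(0)  # seed is an assumption
    idx = rng.permutation(len(labels))
    train_idx, eval_idx = idx[:2000], idx[2000:4000]
    X_train, y_train = features[train_idx], labels[train_idx]
    X_eval, y_eval = features[eval_idx], labels[eval_idx]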
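Finally, a minimal sketch of the reported training setup: a one-hidden-layer MLP with 16 hidden units, Adam at learning rate 1e-3, and the Tilted ERM objective (1/t) · log E[exp(t · ℓ)] from Li et al. (2023). The feature dimension, batch size, epoch count, and placeholder data are assumptions, not values from the paper.

    import math
    import torch
    import torch.nn as nn
    import torch.nn.functional as F
    from torch.utils.data import DataLoader, TensorDataset

    def tilted_loss(logits, y, t):
        # Tilted ERM (Li et al., 2023): (1/t) * log(mean(exp(t * loss_i))),
        # computed stably via log-sum-exp.
        per_sample = F.cross_entropy(logits, y, reduction="none")
        return (torch.logsumexp(t * per_sample, dim=0) - math.log(per_sample.numel())) / t

    in_dim = 10                                # assumed feature dimension
    X = torch.randn(2000, in_dim)              # placeholder data standing in for ACS features
    y = torch.randint(0, 2, (2000,))
    loader = DataLoader(TensorDataset(X, y), batch_size=128, shuffle=True)

    # One hidden layer with 16 units, as reported; binary classification head.
    model = nn.Sequential(nn.Linear(in_dim, 16), nn.ReLU(), nn.Linear(16, 2))
    opt = torch.optim.Adam(model.parameters(), lr=1e-3)

    for xb, yb in loader:
        opt.zero_grad()
        loss = tilted_loss(model(xb), yb, t=0.5)  # t is tuned on validation in the paper
        loss.backward()
        opt.step()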