Stability Evaluation through Distributional Perturbation Analysis
Authors: Jose Blanchet, Peng Cui, Jiajin Li, Jiashuo Liu
ICML 2024
| Reproducibility Variable | Result | LLM Response |
|---|---|---|
| Research Type | Experimental | Empirically, we validate the practical utility of our stability evaluation criterion across a host of real-world applications. These empirical studies showcase the criterion's ability not only to compare the stability of different learning models and features but also to provide valuable guidelines and strategies to further improve models. |
| Researcher Affiliation | Academia | ¹Department of Management Science and Engineering, Stanford University; ²Department of Computer Science and Technology, Tsinghua University. |
| Pseudocode | Yes | Algorithm 1: Stability evaluation with general nonlinear loss functions (ϕ(t) = t log t − t + 1). Algorithm 2: Stability evaluation with general nonlinear loss functions (ϕ(t) = (t − 1)²). A hedged sketch of these two divergence generators follows the table. |
| Open Source Code | No | The paper does not contain an explicit statement about the release of open-source code for the described methodology, nor does it provide a link to a code repository. |
| Open Datasets | Yes | ACS Income and ACS Public Coverage are based on the American Community Survey (ACS) Public Use Microdata Sample (PUMS) (Ding et al., 2021). The COVID-19 dataset (Baqui et al., 2020) is based on SIVEP-Gripe data. |
| Dataset Splits | Yes | In our experiments, we sample 2,000 data points from CA for model training, and another 2,000 for evaluation. ... select the best γ according to the validation accuracy. |
| Hardware Specification | Yes | All experiments are performed using a single NVIDIA GeForce RTX 3090. |
| Software Dependencies | No | The paper mentions the 'PyTorch library (Paszke et al., 2019)' but does not specify its version or other software dependencies with version numbers. |
| Experiment Setup | Yes | The number of hidden units of the MLP is set to 16. As for the models under evaluation in Section 4, (1) for AT (Sinha et al., 2018), we vary the penalty parameter γ ∈ {0.1, 0.2, ..., 1.0} and select the best γ according to the validation accuracy, and the inner optimization step is set to 20; (2) for Tilted ERM (Li et al., 2023), we vary the temperature parameter t ∈ {0.1, 0.2, ..., 1.0} and select the best t according to the validation accuracy. Throughout all experiments, the Adam optimizer with a learning rate of 1e-3 is used. Hedged sketches of the training setup and the tilted loss also follow the table. |
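The two generators quoted in the pseudocode row are the standard ϕ-divergence generators for the KL and χ² divergences. Below is a minimal Python sketch; the function names are our own, chosen only for illustration.

```python
import math

# Divergence generators quoted in Algorithms 1 and 2 of the paper.
# phi_kl generates the KL divergence and phi_chi2 the chi-squared
# divergence; the function names are ours, for illustration only.

def phi_kl(t: float) -> float:
    """phi(t) = t*log(t) - t + 1, with the convention phi(0) = 1."""
    if t == 0.0:
        return 1.0
    return t * math.log(t) - t + 1.0

def phi_chi2(t: float) -> float:
    """phi(t) = (t - 1)^2."""
    return (t - 1.0) ** 2

# Both generators are convex, nonnegative, and vanish at t = 1, so
# they measure how far a likelihood ratio deviates from 1.
assert phi_kl(1.0) == 0.0 and phi_chi2(1.0) == 0.0
```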
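The experiment-setup row pins down only the hidden width (16) and the optimizer (Adam with learning rate 1e-3). Here is a minimal PyTorch sketch under those settings; the input dimension, number of classes, depth, and loss function are assumptions, not taken from the paper.

```python
import torch
import torch.nn as nn

# Sketch of the evaluated model per the setup row: an MLP with 16
# hidden units, trained with Adam at lr = 1e-3. Input dimension,
# number of classes, depth, and loss are assumed for illustration.

class MLP(nn.Module):
    def __init__(self, in_dim: int, n_classes: int, hidden: int = 16):
        super().__init__()
        self.net = nn.Sequential(
            nn.Linear(in_dim, hidden),
            nn.ReLU(),
            nn.Linear(hidden, n_classes),
        )

    def forward(self, x: torch.Tensor) -> torch.Tensor:
        return self.net(x)

model = MLP(in_dim=10, n_classes=2)                      # dims assumed
optimizer = torch.optim.Adam(model.parameters(), lr=1e-3)
criterion = nn.CrossEntropyLoss()                        # loss assumed

x, y = torch.randn(32, 10), torch.randint(0, 2, (32,))  # dummy batch
optimizer.zero_grad()
criterion(model(x), y).backward()
optimizer.step()
```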
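For the Tilted ERM baseline, the setup row gives only the temperature grid. The sketch below implements the standard tilted objective from Li et al. (2023), (1/t) · log((1/n) Σᵢ exp(t · ℓᵢ)), which recovers the plain mean as t → 0; the paper's exact variant is not quoted, so treat this as an assumption.

```python
import math
import torch

# Standard tilted-ERM objective (Li et al., 2023): a log-sum-exp
# aggregation of per-sample losses at temperature t. The paper's
# exact variant is not quoted, so this is an illustrative assumption.

def tilted_loss(per_sample_losses: torch.Tensor, t: float) -> torch.Tensor:
    """(1/t) * log(mean(exp(t * loss_i))); tends to the mean as t -> 0+."""
    n = per_sample_losses.numel()
    return (torch.logsumexp(t * per_sample_losses, dim=0) - math.log(n)) / t

losses = torch.tensor([0.2, 0.5, 1.3])
print(tilted_loss(losses, t=0.5))  # >= the plain mean: tilting with
                                   # t > 0 up-weights the larger losses
```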