Stability Evaluation through Distributional Perturbation Analysis

Authors: Jose Blanchet, Peng Cui, Jiajin Li, Jiashuo Liu

ICML 2024

Reproducibility Variable | Result | LLM Response
Research Type | Experimental | "Empirically, we validate the practical utility of our stability evaluation criterion across a host of real-world applications. These empirical studies showcase the criterion's ability not only to compare the stability of different learning models and features but also to provide valuable guidelines and strategies to further improve models."
Researcher Affiliation | Academia | "(1) Department of Management Science and Engineering, Stanford University; (2) Department of Computer Science and Technology, Tsinghua University."
Pseudocode | Yes | "Algorithm 1: Stability evaluation with general nonlinear loss functions (ϕ(t) = t log t − t + 1). Algorithm 2: Stability evaluation with general nonlinear loss functions (ϕ(t) = (t − 1)²)." A hedged sketch of the ϕ(t) = t log t − t + 1 case follows the table.
Open Source Code | No | The paper does not contain an explicit statement about the release of open-source code for the described methodology, nor does it provide a link to a code repository.
Open Datasets | Yes | "ACS Income and ACS Public Coverage are based on the American Community Survey (ACS) Public Use Microdata Sample (PUMS) (Ding et al., 2021). COVID-19 dataset (Baqui et al., 2020) is based on SIVEP-Gripe data."
Dataset Splits | Yes | "In our experiments, we sample 2,000 data points from CA for model training, and another 2,000 for evaluation. ... select the best γ according to the validation accuracy." A loading-and-splitting sketch follows the table.
Hardware Specification | Yes | "All experiments are performed using a single NVIDIA GeForce RTX 3090."
Software Dependencies | No | The paper mentions the "PyTorch Library (Paszke et al., 2019)" but does not specify its version or other software dependencies with version numbers.
Experiment Setup | Yes | "The number of hidden units of MLP is set to 16. As for the models under evaluation in Section 4, (1) for AT (Sinha et al., 2018), we vary the penalty parameter γ ∈ {0.1, 0.2, . . . , 1.0} and select the best γ according to the validation accuracy, and the inner optimization step is set to 20; (2) for Tilted ERM (Li et al., 2023), we vary the temperature parameter t ∈ {0.1, 0.2, . . . , 1.0} and select the best t according to the validation accuracy. Throughout all experiments, the ADAM optimizer with a learning rate of 1e-3 is used." A training-setup sketch follows the table.
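
For reference, here is a minimal PyTorch sketch of what a stability evaluation step with ϕ(t) = t log t − t + 1 could look like. It is an assumption-laden illustration, not the paper's Algorithm 1: it assumes a squared-Euclidean transport cost on inputs, a fixed dual weight lam, a cross-entropy loss, and a 20-step inner gradient ascent like the one reported for the AT baseline; the function names (perturbed_losses, kl_tilted_risk) are hypothetical.

    import math
    import torch
    import torch.nn.functional as F

    def perturbed_losses(model, x, y, lam, steps=20, lr=0.1):
        # Inner maximization (in the style of Sinha et al., 2018): for each
        # sample z = (x, y), ascend loss(z') - lam * ||x' - x||^2 over x'.
        x_adv = x.clone().detach().requires_grad_(True)
        for _ in range(steps):
            obj = F.cross_entropy(model(x_adv), y, reduction="sum") \
                  - lam * ((x_adv - x) ** 2).sum()
            grad, = torch.autograd.grad(obj, x_adv)
            with torch.no_grad():
                x_adv += lr * grad
        with torch.no_grad():
            # Per-sample "cost-smoothed" loss: loss at the perturbed point
            # minus the weighted transport cost paid to reach it.
            smoothed = F.cross_entropy(model(x_adv), y, reduction="none") \
                       - lam * ((x_adv - x) ** 2).flatten(1).sum(dim=1)
        return smoothed

    def kl_tilted_risk(model, x, y, lam):
        # Outer step for the KL-type case: an exponential tilt (log-sum-exp)
        # of the smoothed losses, i.e. lam * log E[exp(smoothed / lam)].
        s = perturbed_losses(model, x, y, lam)
        return lam * (torch.logsumexp(s / lam, dim=0) - math.log(s.numel()))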
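The CA sampling described under Dataset Splits can be reproduced in spirit with the public folktables package, which wraps the ACS PUMS data used for ACS Income. The survey year and random seed below are assumptions; the paper's exact preprocessing may differ.

    import numpy as np
    from folktables import ACSDataSource, ACSIncome

    # Download the California ACS PUMS sample and build the ACS Income task.
    data_source = ACSDataSource(survey_year="2018", horizon="1-Year", survey="person")
    ca_data = data_source.get_data(states=["CA"], download=True)
    features, labels, _ = ACSIncome.df_to_numpy(ca_data)

    # Sample 2,000 points for training and another 2,000 for evaluation.
    rng = np.random.default_rng(0)  # seed is an assumption
    idx = rng.permutation(len(labels))
    train_idx, eval_idx = idx[:2000], idx[2000:4000]
    X_train, y_train = features[train_idx], labels[train_idx]
    X_eval, y_eval = features[eval_idx], labels[eval_idx]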
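Finally, a minimal sketch of the reported training setup: a one-hidden-layer MLP with 16 hidden units, Adam at learning rate 1e-3, and the Tilted ERM objective (1/t) · log E[exp(t · ℓ)] from Li et al. (2023). The feature dimension, batch size, epoch count, and placeholder data are assumptions, not values from the paper.

    import math
    import torch
    import torch.nn as nn
    import torch.nn.functional as F
    from torch.utils.data import DataLoader, TensorDataset

    def tilted_loss(logits, y, t):
        # Tilted ERM (Li et al., 2023): (1/t) * log(mean(exp(t * loss_i))),
        # computed stably via log-sum-exp.
        per_sample = F.cross_entropy(logits, y, reduction="none")
        return (torch.logsumexp(t * per_sample, dim=0) - math.log(per_sample.numel())) / t

    in_dim = 10                                # assumed feature dimension
    X = torch.randn(2000, in_dim)              # placeholder data standing in for ACS features
    y = torch.randint(0, 2, (2000,))
    loader = DataLoader(TensorDataset(X, y), batch_size=128, shuffle=True)

    # One hidden layer with 16 units, as reported; binary classification head.
    model = nn.Sequential(nn.Linear(in_dim, 16), nn.ReLU(), nn.Linear(16, 2))
    opt = torch.optim.Adam(model.parameters(), lr=1e-3)

    for xb, yb in loader:
        opt.zero_grad()
        loss = tilted_loss(model(xb), yb, t=0.5)  # t is tuned on validation in the paper
        loss.backward()
        opt.step()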