Notice: The reproducibility variables underlying each score are classified using an automated LLM-based pipeline, validated against a manually labeled dataset. LLM-based classification introduces uncertainty and potential bias; scores should be interpreted as estimates. Full accuracy metrics and methodology are described in [1].
Certified Causal Defense with Generalizable Robustness
Authors: Yiran Qiao, Yu Yin, Chen Chen, Jing Ma
AAAI 2025 | Venue PDF | LLM Run Details
| Reproducibility Variable | Result | LLM Response |
|---|---|---|
| Research Type | Experimental | Extensive experiments on benchmark datasets validate the superiority of our framework in certified robustness generalization in different data domains. We conduct extensive experiments to evaluate our framework on both synthetic and real-world datasets. The results show that our framework significantly outperforms the prevalent baseline methods. We also included ablation studies and parameter studies in our experiments. |
| Researcher Affiliation | Academia | 1Case Western Reserve University 2University of Virginia EMAIL, EMAIL, EMAIL, EMAIL |
| Pseudocode | No | The paper describes the framework and theories narratively and through mathematical equations and figures, but it does not contain any explicit 'Pseudocode' or 'Algorithm' blocks. |
| Open Source Code | No | The paper provides a link to an extended version of the paper on arXiv (https://arxiv.org/pdf/2408.15451), but it does not include any explicit statement about code availability, nor does it provide a link to a code repository or indicate that code is provided in supplementary materials. |
| Open Datasets | Yes | We introduce the three datasets used in the experiments: CMNIST (Arjovsky et al. 2019), CelebA (Liu et al. 2015) and DomainNet (Peng et al. 2019). |
| Dataset Splits | No | Detailed information on the domain construction and division of all these three datasets can be found in the Appendix. |
| Hardware Specification | No | The paper does not provide specific hardware details such as GPU or CPU models, or memory specifications used for running the experiments. |
| Software Dependencies | No | The paper does not provide specific software dependencies with version numbers (e.g., Python 3.8, PyTorch 1.9) required to replicate the experiments. |
| Experiment Setup | Yes | During inference, we apply RS with the noise level σ = 0.12. The result of other σ is shown in the Appendix. We set the parameter of the regularization term λ = 10000 for all datasets. We use a three-layer MLP for CMNIST and a four-layer CNN for CelebA and DomainNet. We use the same settings in (Cohen, Rosenfeld, and Kolter 2019) with n = 100000, n0 = 100, α = 0.001 to apply CERTIFY. |
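The CERTIFY parameters quoted above follow the randomized-smoothing procedure of Cohen, Rosenfeld, and Kolter (2019): a small sample of n0 noisy copies guesses the top class, a larger sample of n copies yields a Clopper-Pearson lower confidence bound on its probability, and the certified L2 radius is σ·Φ⁻¹(p_lower). The sketch below illustrates that procedure with a toy base classifier (the classifier, input, and seed are illustrative assumptions, not taken from the paper):

```python
import numpy as np
from scipy.stats import norm, beta

def sample_counts(x, n, sigma, rng):
    """Count predictions of a toy base classifier under Gaussian noise.

    Toy rule (assumption for illustration): class 1 iff the first
    coordinate of the noisy input is positive, else class 0.
    """
    noisy = x + rng.normal(0.0, sigma, size=(n,) + x.shape)
    preds = (noisy[:, 0] > 0).astype(int)
    return np.bincount(preds, minlength=2)

def certify(x, sigma=0.12, n0=100, n=100_000, alpha=0.001, seed=0):
    """CERTIFY sketch (Cohen et al. 2019) with the paper's parameters."""
    rng = np.random.default_rng(seed)
    # Step 1: guess the top class from a small sample of n0 noisy copies.
    c_hat = int(np.argmax(sample_counts(x, n0, sigma, rng)))
    # Step 2: 1 - alpha lower confidence bound on p_A via Clopper-Pearson.
    k = sample_counts(x, n, sigma, rng)[c_hat]
    p_lower = beta.ppf(alpha, k, n - k + 1)
    if p_lower > 0.5:
        return c_hat, sigma * norm.ppf(p_lower)  # certified L2 radius
    return -1, 0.0  # abstain

pred, radius = certify(np.array([1.0, 0.0]))
```

For an input far from the toy decision boundary, nearly all noisy samples agree, so p_lower approaches 1 and the certified radius approaches its ceiling of σ·Φ⁻¹(p_lower); near the boundary the procedure abstains.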