Notice: The reproducibility variables underlying each score are classified using an automated LLM-based pipeline, validated against a manually labeled dataset. LLM-based classification introduces uncertainty and potential bias; scores should be interpreted as estimates. Full accuracy metrics and methodology are described in [1].
Certified Causal Defense with Generalizable Robustness
Authors: Yiran Qiao, Yu Yin, Chen Chen, Jing Ma
AAAI 2025 | Venue PDF | LLM Run Details
| Reproducibility Variable | Result | LLM Response |
|---|---|---|
| Research Type | Experimental | Extensive experiments on benchmark datasets validate the superiority of our framework in certified robustness generalization in different data domains. We conduct extensive experiments to evaluate our framework on both synthetic and real-world datasets. The results show that our framework significantly outperforms the prevalent baseline methods. We also included ablation studies and parameter studies in our experiments. |
| Researcher Affiliation | Academia | 1Case Western Reserve University 2University of Virginia EMAIL, EMAIL, EMAIL, EMAIL |
| Pseudocode | No | The paper describes the framework and theories narratively and through mathematical equations and figures, but it does not contain any explicit 'Pseudocode' or 'Algorithm' blocks. |
| Open Source Code | No | The paper provides a link to an extended version of the paper on arXiv (https://arxiv.org/pdf/2408.15451), but it does not include any explicit statement about code availability, nor does it provide a link to a code repository or indicate that code is provided in supplementary materials. |
| Open Datasets | Yes | We introduce the three datasets used in the experiments: CMNIST (Arjovsky et al. 2019), CelebA (Liu et al. 2015) and DomainNet (Peng et al. 2019). |
| Dataset Splits | No | Detailed information on the domain construction and division of all these three datasets can be found in the Appendix. |
| Hardware Specification | No | The paper does not provide specific hardware details such as GPU or CPU models, or memory specifications used for running the experiments. |
| Software Dependencies | No | The paper does not provide specific software dependencies with version numbers (e.g., Python 3.8, PyTorch 1.9) required to replicate the experiments. |
| Experiment Setup | Yes | During inference, we apply RS with the noise level σ = 0.12. The result of other σ is shown in the Appendix. We set the parameter of the regularization term λ = 10000 for all datasets. We use a three-layer MLP for CMNIST and a four-layer CNN for CelebA and DomainNet. We use the same settings in (Cohen, Rosenfeld, and Kolter 2019) with n = 100000, n0 = 100, α = 0.001 to apply CERTIFY. |
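The CERTIFY parameters quoted above follow the randomized-smoothing procedure of Cohen, Rosenfeld, and Kolter (2019): a small sample of n0 noisy copies guesses the top class, a larger sample of n copies yields a Clopper-Pearson lower confidence bound on its probability, and the certified L2 radius is σ·Φ⁻¹(p_lower). The sketch below illustrates that procedure with a toy base classifier (the classifier, input, and seed are illustrative assumptions, not taken from the paper):

```python
import numpy as np
from scipy.stats import norm, beta

def sample_counts(x, n, sigma, rng):
    """Count predictions of a toy base classifier under Gaussian noise.

    Toy rule (assumption for illustration): class 1 iff the first
    coordinate of the noisy input is positive, else class 0.
    """
    noisy = x + rng.normal(0.0, sigma, size=(n,) + x.shape)
    preds = (noisy[:, 0] > 0).astype(int)
    return np.bincount(preds, minlength=2)

def certify(x, sigma=0.12, n0=100, n=100_000, alpha=0.001, seed=0):
    """CERTIFY sketch (Cohen et al. 2019) with the paper's parameters."""
    rng = np.random.default_rng(seed)
    # Step 1: guess the top class from a small sample of n0 noisy copies.
    c_hat = int(np.argmax(sample_counts(x, n0, sigma, rng)))
    # Step 2: 1 - alpha lower confidence bound on p_A via Clopper-Pearson.
    k = sample_counts(x, n, sigma, rng)[c_hat]
    p_lower = beta.ppf(alpha, k, n - k + 1)
    if p_lower > 0.5:
        return c_hat, sigma * norm.ppf(p_lower)  # certified L2 radius
    return -1, 0.0  # abstain

pred, radius = certify(np.array([1.0, 0.0]))
```

For an input far from the toy decision boundary, nearly all noisy samples agree, so p_lower approaches 1 and the certified radius approaches its ceiling of σ·Φ⁻¹(p_lower); near the boundary the procedure abstains.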