Notice: The reproducibility variables underlying each score are classified using an automated LLM-based pipeline, validated against a manually labeled dataset. LLM-based classification introduces uncertainty and potential bias; scores should be interpreted as estimates. Full accuracy metrics and methodology are described in [1].

Identifying Causal Mechanism Shifts Under Additive Models with Arbitrary Noise

Authors: Yewei Xia, Xueliang Cui, Hao Zhang, Yixin Ren, Feng Xie, Jihong Guan, Ruxin Wang, Shuigeng Zhou

IJCAI 2025 | Venue PDF | LLM Run Details

Reproducibility Variable | Result | LLM Response
Research Type | Experimental | Evaluated on various synthetic datasets, CMSI consistently outperforms existing baselines in terms of F1 score. Additionally, we demonstrate CMSI's applicability on gene expression datasets of ovarian cancer patients at different disease stages. We evaluate the performance and applicability of our method by extensive experiments on both synthetic and ovarian cancer datasets.
Researcher Affiliation | Academia | (1) Shanghai Key Lab of Intelligent Information Processing, and School of Computer Science, Fudan University, Shanghai, China; (2) SIAT, Chinese Academy of Sciences, Shenzhen, China; (3) Southern University of Science and Technology, Shenzhen, China; (4) Department of Applied Statistics, Beijing Technology and Business University, Beijing, China; (5) Department of Computer Science and Technology, Tongji University, Shanghai, China
Pseudocode | Yes | Algorithm 1: Regression the Score on Residual (ReSR). Input: Dataset X. Output: Estimator ĝ(R) of the score s(X). Algorithm 2: Causal Mechanism Shifts Identification (CMSI). Input: Dataset X1, ..., XH. Output: Estimated shifted variables set Î.
Open Source Code | No | The paper does not provide concrete access to source code for the methodology described. It mentions using NVIDIA's cuML library and the Python library kneed, but not that the authors' own implementation code is released.
Open Datasets | Yes | We evaluated CMSI on an ovarian cancer dataset [Tothill et al., 2008] that was previously analyzed by iSCAN [Chen et al., 2024b] and DCI [Wang et al., 2018].
Dataset Splits | Yes | The default sample size in each environment equals 500. ... For example, consider a scenario with H = 3 environments, each containing d = 20 nodes. ... We evaluated CMSI on an ovarian cancer dataset [Tothill et al., 2008] ... divided into two subsets based on survival duration.
Hardware Specification | Yes | Experiments were conducted on a system equipped with an Intel Xeon(R) Platinum 8255C CPU and two NVIDIA GeForce RTX 2080 Ti GPUs.
Software Dependencies | No | The paper mentions using "NVIDIA's cuML library" and "the Python library kneed" but does not specify version numbers for these software components.
Experiment Setup | Yes | The regularization coefficient in the kernel ridge regression is set to α = 0.1, and we use the radial basis function (RBF) kernel with a width parameter of γ = 0.1. ... Based on the generated causal graph and the additive noise model (noise {Gaussian(0, 1), Laplace(0, 1), Gumbel(0, 1), Exponential(1), Beta(1, 1), Gamma(0.5, 0.5)})...
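
The experiment setup quoted above can be illustrated with a minimal NumPy sketch. It is not the authors' implementation (no code is released, and the paper mentions cuML rather than plain NumPy). The sketch assumes the RBF kernel has the form exp(-γ‖a − b‖²), reads Gamma(0.5, 0.5) as shape 0.5 and scale 0.5, and uses a hypothetical toy mechanism y = sin(x) + noise purely for illustration. The helper names `sample_noise`, `rbf_kernel`, `fit_krr`, and `predict_krr` are invented here.

```python
import numpy as np

ALPHA = 0.1  # ridge regularization coefficient quoted in the setup
GAMMA = 0.1  # RBF width parameter quoted in the setup


def sample_noise(name, size, rng):
    """Draw noise from one of the six families listed in the setup."""
    samplers = {
        "gaussian": lambda: rng.normal(0.0, 1.0, size),
        "laplace": lambda: rng.laplace(0.0, 1.0, size),
        "gumbel": lambda: rng.gumbel(0.0, 1.0, size),
        "exponential": lambda: rng.exponential(1.0, size),
        "beta": lambda: rng.beta(1.0, 1.0, size),
        # Assumption: Gamma(0.5, 0.5) is read as shape=0.5, scale=0.5.
        "gamma": lambda: rng.gamma(0.5, 0.5, size),
    }
    return samplers[name]()


def rbf_kernel(A, B, gamma=GAMMA):
    """k(a, b) = exp(-gamma * ||a - b||^2); this kernel form is an assumption."""
    sq = (A**2).sum(1)[:, None] + (B**2).sum(1)[None, :] - 2.0 * A @ B.T
    return np.exp(-gamma * np.maximum(sq, 0.0))


def fit_krr(X, y, alpha=ALPHA, gamma=GAMMA):
    """Solve (K + alpha * I) c = y for the dual coefficients c."""
    K = rbf_kernel(X, X, gamma)
    return np.linalg.solve(K + alpha * np.eye(len(X)), y)


def predict_krr(X_train, coef, X_new, gamma=GAMMA):
    """Evaluate the fitted regressor at new inputs."""
    return rbf_kernel(X_new, X_train, gamma) @ coef


rng = np.random.default_rng(0)
n = 500  # default sample size per environment quoted above
x = rng.normal(0.0, 1.0, (n, 1))
# Hypothetical toy additive noise model, not taken from the paper.
y = np.sin(x[:, 0]) + sample_noise("laplace", n, rng)
coef = fit_krr(x, y)
residual = y - predict_krr(x, coef, x)
```

With scikit-learn available, `KernelRidge(alpha=0.1, kernel="rbf", gamma=0.1)` from `sklearn.kernel_ridge` should fit the same model under the same kernel convention. Whether that matches the authors' actual configuration cannot be confirmed from the text above.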