FedCSL: A Scalable and Accurate Approach to Federated Causal Structure Learning

Authors: Xianjie Guo, Kui Yu, Lin Liu, Jiuyong Li

AAAI 2024 | Conference PDF | Archive PDF | Plain Text | LLM Run Details

Reproducibility Variable | Result | LLM Response
Research Type | Experimental | Extensive experiments on benchmark datasets, high-dimensional synthetic datasets and a real-world dataset verify the efficacy of the proposed FedCSL method.
Researcher Affiliation | Academia | School of Computer Science and Information Engineering, Hefei University of Technology, Hefei, China; UniSA STEM, University of South Australia, Adelaide, Australia
Pseudocode | No | The paper describes the steps of the FedCSL method in prose and through mathematical equations but does not include a distinct, labeled 'Pseudocode' or 'Algorithm' block.
Open Source Code | Yes | The source code is available at https://github.com/Xianjie-Guo/FedCSL.
Open Datasets | Yes | Benchmark BN datasets: We use five benchmark BN datasets: Child with 20 variables, Insurance with 27 variables, Alarm with 37 variables, Pigs with 441 variables and Gene with 801 variables; each dataset contains 5,000 samples (Tsamardinos, Brown, and Aliferis 2006). Real-world datasets: We also compare the proposed method with the baselines on the Sachs (Sachs et al. 2005) dataset. Sachs is a benchmark graphical model representing protein signaling networks in human cells. It consists of 11 nodes (cell types) and 17 edges. Our experiments use 7,466 commonly used observational samples.
Dataset Splits | No | The paper mentions total sample sizes for datasets but does not explicitly provide details about training, validation, and test dataset splits, such as percentages, absolute counts, or specific methods for creating these splits. The experiments focus on learning causal structures from available data rather than a traditional train/validation/test split for model evaluation.
Hardware Specification | No | The paper reports running times in Tables 1 and 2 but does not provide any specific details about the hardware used for the experiments, such as CPU or GPU models, memory, or specific computing environments.
Software Dependencies | No | The paper mentions various algorithms and software components used (e.g., HITON-PC, the G² test, BDeu, hill-climbing, an open-source software package for data generation) but does not provide specific version numbers for any of these or for programming languages/libraries like Python, PyTorch, etc.
Experiment Setup | Yes | In our experiments, the local datasets at different clients have different sizes. Let n = Σ_{k=1}^{m} n_{c_k} be the total number of samples owned by the m clients; the sample size of each local dataset is then set as n_{c_1} = n / (2m) and n_{c_k} = n_{c_1} + [2(n − m·n_{c_1}) / (m(m−1))]·(k − 1) for k ∈ {2, 3, ..., m} (Eq. 11), so that consecutive clients differ by the constant increment n_{c_k} − n_{c_{k−1}} = 2(n − m·n_{c_1}) / (m(m−1)). We use the G² test (Spirtes et al. 2000), an alternative to the χ² test, to conduct conditional independence (CI) tests between variables. Assume that ρ is the p-value returned by the G² test and α is a given significance level. We first construct a causal DAG with 5,000 variables, where the maximum number of parents for each variable is 3 and the average degree of each variable is 2.
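The allocation in Eq. (11) can be sketched in a few lines of Python. This is a minimal illustration, assuming n_{c_1} = n / (2m) as the smallest local dataset; the function name is ours, not from the paper's code.

```python
# Sketch of the client sample-size allocation in Eq. (11):
# sizes form an arithmetic sequence starting at n_c1 = n / (2m),
# with a constant step chosen so that the m sizes sum back to n.

def client_sample_sizes(n, m):
    """Return the m local dataset sizes n_{c_1}, ..., n_{c_m}."""
    n_c1 = n / (2 * m)                         # smallest local dataset
    step = 2 * (n - m * n_c1) / (m * (m - 1))  # constant increment between clients
    return [n_c1 + step * (k - 1) for k in range(1, m + 1)]

sizes = client_sample_sizes(n=5000, m=10)
print(sizes[0], sizes[-1])   # 250.0 750.0
print(sum(sizes))            # 5000.0 — the sizes sum back to n
```

The step value follows from requiring Σ_k n_{c_k} = n: the arithmetic-series sum m·n_{c_1} + step·m(m−1)/2 equals n exactly when step = 2(n − m·n_{c_1}) / (m(m−1)).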