Identifiability Guarantees for Causal Disentanglement from Purely Observational Data
Authors: Ryan Welch, Jiaqi Zhang, Caroline Uhler
NeurIPS 2024
| Reproducibility Variable | Result | LLM Response |
|---|---|---|
| Research Type | Experimental | We provide simulation results to support our theoretical guarantees and demonstrate that our algorithm can derive meaningful causal representations from purely observational data. |
| Researcher Affiliation | Academia | Ryan Welch (Massachusetts Institute of Technology; Broad Institute of MIT and Harvard); Jiaqi Zhang (LIDS, Massachusetts Institute of Technology; Broad Institute of MIT and Harvard); Caroline Uhler (LIDS, Massachusetts Institute of Technology; Broad Institute of MIT and Harvard) |
| Pseudocode | Yes | Algorithm 1: Recovering Z up to upstream layers. Algorithm 2: Recovering E up to layers. |
| Open Source Code | Yes | Code is publicly available at: https://github.com/uhlerlab/observational-crl |
| Open Datasets | No | The paper uses synthetic data generated through simulations. While it describes the sampling procedure, it does not provide concrete access information (link, citation, etc.) for the generated datasets themselves; they can only be reproduced by regenerating them via the provided code. 'We consider the following causal graphs with 4 nodes: (1) a line graph represented as Z1 → Z2 → Z3 → Z4, and (2) a Y-structure represented as Z1 → Z2 ← Z3, Z2 → Z4. For each case, we generate 2000 observational samples and compute the corresponding scores using the ground-truth link functions.' |
| Dataset Splits | No | The paper mentions generating '2000 observational samples' and varying sample sizes (10^3, 5*10^3, 10^4, 5*10^4) for experiments, but does not explicitly specify how these samples were split into training, validation, and test sets. It primarily discusses overall 'samples' and 'empirical results' without detailing specific data partitioning strategies for these splits. |
| Hardware Specification | Yes | With perfect score estimation, we solve each QCQP to global optimality efficiently using Gurobi optimization solvers [13] on an Apple M2 CPU with 8 cores. |
| Software Dependencies | No | The paper mentions 'Gurobi optimization solvers [13]', 'second-order Stein estimator [5, 44]', and 'sliced score matching with variance reduction (SSM-VR) estimator [42]'. However, it does not provide specific version numbers for these software dependencies. It also implies the use of Python for implementation but does not specify a version. |
| Experiment Setup | Yes | We consider the following causal graphs with 4 nodes: (1) a line graph represented as Z1 → Z2 → Z3 → Z4, and (2) a Y-structure represented as Z1 → Z2 ← Z3, Z2 → Z4. For each case, we generate 2000 observational samples and compute the corresponding scores using the ground-truth link functions. We implemented the second-order Stein estimator introduced in [5] to generate the point-wise estimates of the score's Jacobian matrices. We use RBF kernels with bandwidth value selected as the median of pairwise distances between points in X. We implemented the sliced score matching with variance reduction (SSM-VR) model developed by [42] to generate functional score estimators. We adjust the tolerance of our QCQP solver to account for noisy estimates (see Appendix C). With perfect score estimation, we solve each QCQP to global optimality efficiently using Gurobi optimization solvers [13] on an Apple M2 CPU with 8 cores. We continuously solve the QCQP indicated by Equation (2) with feasibility tolerance set to 0.001, until no feasible solutions remain. We continue by iteratively appending linearly independent column vectors of unit magnitude until H is of the desired dimension. When using data-driven score estimation methods, we are not guaranteed to find any column vector that solves the specific QCQP indicated by Equation (2) perfectly. To combat this challenge, we first prune the top 25% of de-meaned Jacobian estimates, J_X(x), by Frobenius norm to remove outliers. We then solve for the minimum value t such that \|v^T J_X(x^(m)) v\| ≤ t, optimizing over all v ∈ R^d. Then, we use the same procedure as in the perfect score estimation case with feasibility tolerance set to t + 0.001 to solve for the estimated matrix H. (Hedged sketches of the sampling, bandwidth selection, QCQP, and pruning steps follow the table.) |
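The steps quoted above can be illustrated with short, hedged Python sketches. First, observational sampling for the line graph Z1 → Z2 → Z3 → Z4. The excerpt does not state the link functions or noise distributions, so the `tanh` links, Gaussian noise, and the function name `sample_line_graph` below are illustrative assumptions, not the authors' exact setup.

```python
import numpy as np

rng = np.random.default_rng(0)

def sample_line_graph(n=2000):
    """Draw observational samples from the line graph Z1 -> Z2 -> Z3 -> Z4.

    The tanh link functions and unit-variance Gaussian noise are assumed
    for illustration; the paper only says ground-truth link functions
    are used to compute the scores.
    """
    z1 = rng.normal(size=n)
    z2 = np.tanh(z1) + rng.normal(size=n)
    z3 = np.tanh(z2) + rng.normal(size=n)
    z4 = np.tanh(z3) + rng.normal(size=n)
    return np.stack([z1, z2, z3, z4], axis=1)  # shape (n, 4)

Z = sample_line_graph()
```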
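The RBF bandwidth selection is the standard median heuristic and is unambiguous enough to sketch directly; `X` is assumed to be the (n, d) matrix of observed samples.

```python
import numpy as np
from scipy.spatial.distance import pdist

def median_bandwidth(X):
    """RBF bandwidth = median of pairwise Euclidean distances in X."""
    return np.median(pdist(X))
```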
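The QCQP solve is described only at a high level here. Assuming Equation (2) asks for a unit vector v whose quadratic form v^T J_X(x^(m)) v stays within the feasibility tolerance for every sample Jacobian, a minimal gurobipy formulation might look as follows; `solve_direction` is a hypothetical name and the constraint reading is an assumption.

```python
import numpy as np
import gurobipy as gp
from gurobipy import GRB

def solve_direction(jacobians, tol=1e-3):
    """Find a unit vector v with |v^T J v| <= tol for every sample
    Jacobian J; an assumed reading of the QCQP in Equation (2)."""
    d = jacobians[0].shape[0]
    model = gp.Model("qcqp")
    model.Params.NonConvex = 2           # the unit-norm equality is nonconvex
    v = model.addMVar(d, lb=-1.0, ub=1.0, name="v")
    model.addConstr(v @ v == 1.0)        # unit-magnitude column vector
    for J in jacobians:
        Q = 0.5 * (J + J.T)              # symmetrize: v^T J v = v^T Q v
        model.addConstr(v @ Q @ v <= tol)
        model.addConstr(v @ Q @ v >= -tol)
    model.optimize()                     # pure feasibility: zero objective
    return v.X if model.Status == GRB.OPTIMAL else None
```

Repeating this solve until no feasible solution remains, then appending linearly independent unit columns, yields the estimated matrix H as the quoted procedure describes.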
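For noisy score estimates, the pruning step can be sketched as below. The excerpt does not say whether the de-meaned or the original Jacobians are kept afterwards, so this version (with the hypothetical name `prune_jacobians`) returns the de-meaned, pruned stack.

```python
import numpy as np

def prune_jacobians(jacobians, frac=0.25):
    """De-mean the estimated Jacobians and drop the top `frac` fraction
    by Frobenius norm; an assumed reading of the outlier-pruning step."""
    J = np.stack(jacobians)                 # (m, d, d)
    J = J - J.mean(axis=0)                  # de-meaned estimates
    norms = np.linalg.norm(J, axis=(1, 2))  # Frobenius norm per matrix
    keep = norms <= np.quantile(norms, 1.0 - frac)
    return J[keep]
```

The minimum tolerance t can then be found with the same Gurobi model as above by promoting the fixed `tol` to a nonnegative decision variable and minimizing it; per the quoted procedure, recovery is then rerun with feasibility tolerance t + 0.001.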