Causal Inference with Noisy and Missing Covariates via Matrix Factorization
Authors: Nathan Kallus, Xiaojie Mao, Madeleine Udell
NeurIPS 2018 | Conference PDF | Archive PDF
| Reproducibility Variable | Result | LLM Response |
|---|---|---|
| Research Type | Experimental | We demonstrate the effectiveness of the proposed procedure in numerical experiments with both synthetic data and real clinical data. We evaluate the effectiveness of our proposed procedure on both synthetic datasets and a clinical dataset involving the mortality of twins born in the USA introduced by Louizos et al. [9]. We empirically demonstrate that matrix factorization can accurately estimate causal effects by inferring the latent confounders from a large number of noisy covariates. |
| Researcher Affiliation | Academia | Nathan Kallus, Xiaojie Mao, Madeleine Udell (Cornell University); {kallus, xm77, udell}@cornell.edu |
| Pseudocode | No | The paper describes the mathematical formulations and practical implementations (e.g., equation (5)), but it does not contain any structured pseudocode or clearly labeled algorithm blocks. |
| Open Source Code | No | In our experiments, we use the R package softImpute [33] for continuous covariates and quadratic loss, the R package logisticPCA [34] for binary covariates and logistic loss, and the Julia package LowRankModels [18] for categorical variables and multinomial loss. (The paper relies on existing open-source packages but does not provide access to its own source code implementing the described methodology. A hedged sketch of this kind of factorization preprocessing follows the table.) |
| Open Datasets | Yes | We further examine the effectiveness of matrix factorization preprocessing using the twins dataset introduced by Louizos et al. [9]. This dataset includes information for N = 11984 pairs of same-sex twins who were born in the USA between 1989-1991 and weighed less than 2kg. ... More details about the dataset can be found in Louizos et al. [9]. |
| Dataset Splits | Yes | All tuning parameters are chosen via 5-fold cross-validation. |
| Hardware Specification | No | The paper does not provide any specific details about the hardware used for running the experiments, such as CPU or GPU models, or cloud computing specifications. |
| Software Dependencies | No | In our experiments, we use the R package softImpute [33] for continuous covariates and quadratic loss, the R package logisticPCA [34] for binary covariates and logistic loss, and the Julia package LowRankModels [18] for categorical variables and multinomial loss. (The paper names the software packages used but does not provide version numbers for them.) |
| Experiment Setup | Yes | We set the dimension of the latent confounders r = 5, use α = [2, 3, 2, 3, 2] and β = [1, 2, 2, 2, 2], and choose τ = 2 in our example. All tuning parameters are chosen via 5-fold cross-validation. We replicate the GESTAT10 covariate p times and independently perturb the entries of these p copies with probability 0.5; each perturbed entry is assigned a new value sampled uniformly at random from 0 to 9. We also consider the presence of missing values: each entry is set to missing independently with probability 0.3. (A minimal sketch of this perturbation scheme follows the table.) |
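
The matrix factorization preprocessing in the paper is carried out with off-the-shelf packages (softImpute, logisticPCA, LowRankModels) rather than released code. As a rough illustration of the quadratic-loss case only, below is a minimal NumPy sketch of a soft-impute-style low-rank recovery of a noisy, partially missing covariate matrix; the function name, default parameters, and the choice of NumPy are our assumptions, not the authors' implementation.

```python
# Minimal sketch, not the paper's code: the experiments use the R packages softImpute
# and logisticPCA and the Julia package LowRankModels. This NumPy version only
# illustrates the quadratic-loss (continuous covariates) case.
import numpy as np

def soft_impute(X, lam=1.0, rank=5, n_iters=200, tol=1e-4):
    """Recover a low-rank matrix from X (np.nan marks missing entries) by iterative
    SVD soft-thresholding, and return the leading factors (N x rank) as the
    reconstructed latent confounders."""
    mask = ~np.isnan(X)
    Z = np.zeros_like(X)                      # current low-rank estimate
    for _ in range(n_iters):
        filled = np.where(mask, X, Z)         # observed values; current estimate elsewhere
        U, s, Vt = np.linalg.svd(filled, full_matrices=False)
        s_thr = np.maximum(s - lam, 0.0)      # soft-threshold the singular values
        Z_new = (U * s_thr) @ Vt
        if np.linalg.norm(Z_new - Z) <= tol * (np.linalg.norm(Z) + 1e-12):
            Z = Z_new
            break
        Z = Z_new
    U, s, Vt = np.linalg.svd(Z, full_matrices=False)
    return U[:, :rank] * s[:rank]             # use these in place of the noisy covariates
```

In the paper, the tuning parameters are chosen by 5-fold cross-validation; a grid search over `lam` and `rank` wrapped around a routine like this would play that role here.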
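
The setup row above describes how the noisy Twins proxies are built: GESTAT10 is replicated p times, each entry of the copies is perturbed with probability 0.5 to a uniform draw from 0 to 9, and each entry is set to missing with probability 0.3. The following is a minimal sketch of that scheme, assuming GESTAT10 is available as an integer vector coded 0-9; the function and argument names are ours.

```python
# Illustrative sketch of the noisy-proxy construction described in the setup row;
# function/argument names and the default p are assumptions, not the authors' code.
import numpy as np

def make_noisy_proxies(gestat10, p=3, flip_prob=0.5, missing_prob=0.3, seed=0):
    """Replicate a 0-9 coded covariate p times, perturb entries at random,
    and mark some entries as missing (np.nan)."""
    rng = np.random.default_rng(seed)
    proxies = np.tile(np.asarray(gestat10, dtype=float).reshape(-1, 1), (1, p))
    n = proxies.shape[0]
    # perturb each entry independently with probability flip_prob
    perturb = rng.random((n, p)) < flip_prob
    proxies[perturb] = rng.integers(0, 10, size=perturb.sum())
    # set each entry to missing independently with probability missing_prob
    missing = rng.random((n, p)) < missing_prob
    proxies[missing] = np.nan
    return proxies

# Example: 5 noisy copies of a toy GESTAT10 vector
noisy = make_noisy_proxies(np.array([0, 3, 7, 9, 5]), p=5)
```

The resulting noisy, partially missing proxy matrix is the kind of input the factorization step sketched above would then be applied to before estimating treatment effects.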