DeepMatch: Balancing Deep Covariate Representations for Causal Inference Using Adversarial Training
Authors: Nathan Kallus
ICML 2020 | Conference PDF | Archive PDF | Plain Text | LLM Run Details
| Reproducibility Variable | Result | LLM Response |
|---|---|---|
| Research Type | Experimental | We next proceed to evaluate DeepMatch empirically in three examples: one fully synthetic, one with synthetic outcomes and treatment, and one with real covariate and outcome data. |
| Researcher Affiliation | Academia | 1Cornell Tech, Cornell University, New York, NY, USA. Correspondence to: Nathan Kallus <kallus@cornell.edu>. |
| Pseudocode | Yes | Algorithm 1 (Conditional Gradient for Eq. (4.4)) and Algorithm 2 (DeepMatch). |
| Open Source Code | No | The paper does not provide any specific links to source code or explicit statements about code availability. |
| Open Datasets | Yes | We next consider an example with confounding image data using the MNIST dataset (LeCun, 1998), and we use the Twins dataset of 71,345 twin births in the US between 1989 and 1991, as used by (Louizos et al., 2017). |
| Dataset Splits | No | The paper mentions using mini-batches for training and specifies the total number of units (n = 1000 in some experiments) and the number of replications, but it does not provide explicit training/validation/test splits (e.g., percentages or counts), nor does it reference standard predefined splits for the datasets used. |
| Hardware Specification | No | The paper does not provide any specific hardware details such as GPU or CPU models, memory, or cloud instance types used for running the experiments. |
| Software Dependencies | No | The paper mentions using Adam for optimization but does not provide specific software dependencies or library versions (e.g., 'Python 3.x', 'PyTorch 1.x') that would allow for reproducible setup. |
| Experiment Setup | Yes | We use K1 = 20 epochs with mini-batches of 100 to train all networks and for the first stage of DeepMatch, and K2 = 10 epochs for the second stage. We use M = 5 and a grid of 50 φ values based on η = 0.01. For CATT, we let C be linear functions on a univariate X_H and ask how well we estimate its coefficient. All NN-based methods use the same architectures, with possibly a different final activation, as detailed in each of the following sections. In our experiments, we use Adam (Kingma & Ba, 2014) with a global learning rate of 10^-4. |
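
For orientation, the following is a minimal sketch of the reported training configuration. Only the hyperparameters (K1 = 20, K2 = 10, mini-batches of 100, Adam with learning rate 10^-4) come from the quote above; the network, losses, and data below are placeholders, not the paper's DeepMatch objective or architectures.

```python
# Hypothetical sketch of the reported two-stage training configuration.
# Hyperparameters (K1, K2, batch size, Adam lr) are as reported in the paper;
# the network, losses, and data are placeholders for illustration only.
import torch
from torch import nn
from torch.utils.data import DataLoader, TensorDataset

K1, K2 = 20, 10      # epochs for the first / second stage (as reported)
BATCH_SIZE = 100     # mini-batch size (as reported)
LR = 1e-4            # global Adam learning rate (as reported)

# Placeholder data: covariates X and a binary treatment indicator T.
X = torch.randn(1000, 10)
T = torch.randint(0, 2, (1000,)).float()
loader = DataLoader(TensorDataset(X, T), batch_size=BATCH_SIZE, shuffle=True)

# Placeholder network; the paper specifies its own architecture per experiment.
net = nn.Sequential(nn.Linear(10, 64), nn.ReLU(), nn.Linear(64, 1))
opt = torch.optim.Adam(net.parameters(), lr=LR)

def run_stage(epochs, loss_fn):
    """Run one training stage with the given per-batch loss."""
    for _ in range(epochs):
        for x, t in loader:
            opt.zero_grad()
            loss = loss_fn(net(x).squeeze(-1), t)
            loss.backward()
            opt.step()

# Both stages use a placeholder loss here; DeepMatch's actual adversarial
# balancing objective is defined in the paper's Algorithms 1 and 2.
run_stage(K1, nn.BCEWithLogitsLoss())
run_stage(K2, nn.BCEWithLogitsLoss())
```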