Estimating Heterogeneous Treatment Effects by Combining Weak Instruments and Observational Data
Authors: Miruna Oprescu, Nathan Kallus
NeurIPS 2024
| Reproducibility Variable | Result | LLM Response |
|---|---|---|
| Research Type | Experimental | We apply our method to both simulated and real-world data. First, we use the confounded synthetic data example from [29], along with a similar data generating process (DGP) to simulate an IV study, maintaining the same confounding structure and treatment effects. Using this DGP, we evaluate Algorithm 1 and Algorithm 2 in estimating the unbiased CATE by integrating these datasets. Next, we demonstrate its utility with real data by analyzing the heterogeneous effects of 401(k) plan participation on wealth. |
| Researcher Affiliation | Academia | Miruna Oprescu Cornell University amo78@cornell.edu Nathan Kallus Cornell University kallus@cornell.edu |
| Pseudocode | Yes | Algorithm 1 CATE Estimation with Parametric Extrapolation and Algorithm 2 CATE Estimation with Representation Learning |
| Open Source Code | Yes | The replication code is available at https://github.com/CausalML/Weak-Instruments-Obs-Data-CATE. |
| Open Datasets | Yes | We demonstrate our method's effectiveness with a real-world case study on the impact of 401(k) participation on financial wealth, using data from the 1991 Survey of Income and Program Participation [11]. The real-world 401(k) dataset is available through the doubleml [7] Python package. |
| Dataset Splits | Yes | For the neural networks, we implemented early stopping using a validation dataset that constituted 20% of the total generated datasets. |
| Hardware Specification | Yes | The results for the parametric extension from Section 5.1 were generated on a consumer laptop equipped with a 13th Gen Intel Core i7 CPU. The execution took approximately 1.5 minutes using 20 concurrent workers. In contrast, the representation learning outcomes were derived using an NVIDIA Tesla T4 GPU on Google Colab [19]. |
| Software Dependencies | No | The Random Forest (RF) models used in Algorithm 1 employ the Random Forest Regressor and Random Forest Classifier algorithms from the scikit-learn [42] Python library. For the feedforward neural networks within the representation learning component, we utilize the nn module from the PyTorch package [41]. The real-world 401(k) dataset is available through the doubleml [7] Python package. |
| Experiment Setup | Yes | Details regarding the hyperparameters for these models are provided in Table 1. The Random Forest (RF) models used in Algorithm 1 employ the Random Forest Regressor and Random Forest Classifier algorithms from the scikit-learn [42] Python library. For the feedforward neural networks within the representation learning component, we utilize the nn module from the PyTorch package [41]. In Table 1, specific values such as 'max_depth 3', 'min_samples_leaf 50', 'weight_decay 0.02', 'learning rate 0.01', 'batch size 2000', and 'epochs 1000' are provided. |
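The Dataset Splits and Experiment Setup rows above can be summarized in code. The sketch below is illustrative only, not the authors' implementation: it collects the hyperparameter values quoted from Table 1 into plain dictionaries and shows a generic 80/20 train/validation split of the kind used for early stopping (the `train_val_split` helper and the seed are our own assumptions; the paper's actual split logic is in the linked repository).

```python
import random

# Hyperparameter values quoted from Table 1 of the paper.
RF_PARAMS = {"max_depth": 3, "min_samples_leaf": 50}      # scikit-learn Random Forests
NN_PARAMS = {"weight_decay": 0.02, "learning_rate": 0.01,
             "batch_size": 2000, "epochs": 1000}          # PyTorch feedforward nets

def train_val_split(indices, val_fraction=0.2, seed=0):
    """Hold out `val_fraction` of the samples as a validation set,
    matching the 20% validation split used for early stopping."""
    rng = random.Random(seed)
    shuffled = list(indices)
    rng.shuffle(shuffled)
    n_val = int(len(shuffled) * val_fraction)
    return shuffled[n_val:], shuffled[:n_val]  # (train, validation)

train_idx, val_idx = train_val_split(range(10_000))
print(len(train_idx), len(val_idx))  # 8000 2000
```

In practice the Random Forest dictionary would be unpacked directly into scikit-learn's estimators (e.g. `RandomForestRegressor(**RF_PARAMS)`), and the neural-network values passed to the PyTorch optimizer and training loop.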