Estimating Heterogeneous Treatment Effects by Combining Weak Instruments and Observational Data

Authors: Miruna Oprescu, Nathan Kallus

NeurIPS 2024

Each entry below lists the reproducibility variable, the assessed result, and the supporting LLM response:
Research Type (Experimental): We apply our method to both simulated and real-world data. First, we use the confounded synthetic data example from [29], along with a similar data generating process (DGP) to simulate an IV study, maintaining the same confounding structure and treatment effects. Using this DGP, we evaluate Algorithm 1 and Algorithm 2 in estimating the unbiased CATE by integrating these datasets. Next, we demonstrate its utility with real data by analyzing the heterogeneous effects of 401(k) plan participation on wealth.
Researcher Affiliation (Academia): Miruna Oprescu, Cornell University, amo78@cornell.edu; Nathan Kallus, Cornell University, kallus@cornell.edu
Pseudocode (Yes): Algorithm 1, CATE Estimation with Parametric Extrapolation; Algorithm 2, CATE Estimation with Representation Learning
Open Source Code (Yes): The replication code is available at https://github.com/CausalML/Weak-Instruments-Obs-Data-CATE.
Open Datasets (Yes): We demonstrate our method's effectiveness with a real-world case study on the impact of 401(k) participation on financial wealth, using data from the 1991 Survey of Income and Program Participation [11]. The real-world 401(k) dataset is available through the doubleml [7] Python package.
Dataset Splits (Yes): For the neural networks, we implemented early stopping using a validation dataset that constituted 20% of the total generated datasets.
Hardware Specification (Yes): The results for the parametric extension from Section 5.1 were generated on a consumer laptop equipped with a 13th Gen Intel Core i7 CPU. The execution took approximately 1.5 minutes using 20 concurrent workers. In contrast, the representation learning outcomes were derived using an NVIDIA Tesla T4 GPU on Google Colab [19].
Software Dependencies (No): The Random Forest (RF) models used in Algorithm 1 employ the RandomForestRegressor and RandomForestClassifier algorithms from the scikit-learn [42] Python library. For the feedforward neural networks within the representation learning component, we utilize the nn module from the PyTorch package [41]. The real-world 401(k) dataset is available through the doubleml [7] Python package.
Experiment Setup (Yes): Details regarding the hyperparameters for these models are provided in Table 1. The Random Forest (RF) models used in Algorithm 1 employ the RandomForestRegressor and RandomForestClassifier algorithms from the scikit-learn [42] Python library. For the feedforward neural networks within the representation learning component, we utilize the nn module from the PyTorch package [41]. In Table 1, specific values such as max_depth = 3, min_samples_leaf = 50, weight_decay = 0.02, learning rate = 0.01, batch size = 2000, and epochs = 1000 are provided.
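As an illustration of the reported setup, the following is a minimal sketch of instantiating the scikit-learn Random Forest models with the hyperparameter values quoted from Table 1 (max_depth = 3, min_samples_leaf = 50); the variable names are ours and do not come from the paper's code.

```python
from sklearn.ensemble import RandomForestClassifier, RandomForestRegressor

# Hyperparameters quoted from Table 1 of the paper; these RF models
# play the role of the nuisance estimators used by Algorithm 1.
rf_regressor = RandomForestRegressor(max_depth=3, min_samples_leaf=50)
rf_classifier = RandomForestClassifier(max_depth=3, min_samples_leaf=50)

print(rf_regressor.get_params()["max_depth"])  # 3
print(rf_classifier.get_params()["min_samples_leaf"])  # 50
```

Shallow trees with large leaves (depth 3, at least 50 samples per leaf) trade variance for bias, which is a common choice for nuisance models in causal estimation pipelines.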
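The 20% validation split used for early stopping (Dataset Splits entry above) can be sketched with scikit-learn's train_test_split; the synthetic arrays here are placeholders, not the paper's DGP.

```python
import numpy as np
from sklearn.model_selection import train_test_split

# Placeholder data standing in for a generated dataset (not the paper's DGP).
rng = np.random.default_rng(0)
X = rng.normal(size=(1000, 5))
y = rng.normal(size=1000)

# Hold out 20% as a validation set, as described for the neural networks.
X_train, X_val, y_train, y_val = train_test_split(
    X, y, test_size=0.2, random_state=0
)
print(X_train.shape[0], X_val.shape[0])  # 800 200
```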