Notice: The reproducibility variables underlying each score are classified using an automated LLM-based pipeline, validated against a manually labeled dataset. LLM-based classification introduces uncertainty and potential bias; scores should be interpreted as estimates. Full accuracy metrics and methodology are described in [1].
RCT Rejection Sampling for Causal Estimation Evaluation
Authors: Katherine A. Keith, Sergey Feldman, David Jurgens, Jonathan Bragg, Rohit Bhattacharya
TMLR 2023 | Venue PDF | LLM Run Details
| Reproducibility Variable | Result | LLM Response |
|---|---|---|
| Research Type | Experimental | Using synthetic data, we show our algorithm indeed results in low bias when oracle estimators are evaluated on the confounded samples, which is not always the case for a previously proposed algorithm. In addition to this identification result, we highlight several finite data considerations for evaluation designers who plan to use RCT rejection sampling on their own datasets. As a proof of concept, we implement an example evaluation pipeline and walk through these finite data considerations with a novel, real-world RCT which we release publicly consisting of approximately 70k observations and text data as high-dimensional covariates. |
| Researcher Affiliation | Collaboration | Katherine A. Keith (Williams College); Sergey Feldman (Allen Institute for Artificial Intelligence); David Jurgens (University of Michigan); Jonathan Bragg (Allen Institute for Artificial Intelligence); Rohit Bhattacharya (Williams College) |
| Pseudocode | Yes | Algorithm 1 RCT rejection sampling |
| Open Source Code | Yes | We also release our code. (Footnote 1: Code and data at https://github.com/kakeith/rct_rejection_sampling.) |
| Open Datasets | Yes | As a proof of concept, we implement an example evaluation pipeline and walk through these finite data considerations with a novel, real-world RCT which we release publicly consisting of approximately 70k observations and text data as high-dimensional covariates. We release this novel, real-world RCT dataset of approximately 70k observations that has text as covariates (Section 4.1.1). We also release our code; code and data at https://github.com/kakeith/rct_rejection_sampling. |
| Dataset Splits | Yes | We fit our models using cross-fitting (Hansen, 2000; Newey & Robins, 2018) and cross-validation; see Appendix F for more details. Cross-fitting with cross-validation: we fit our models using cross-fitting (Newey & Robins, 2018), which is also called sample-splitting (Hansen, 2000). Here, we divide the data into K folds. For each inference fold j, the other K−1 folds (shorthand −j) are used as the training set to fit the base learners, e.g., Q̂^(−j)_T0 or ĝ^(−j), where the superscript indicates the data the model is fit on. The single hyperparameter for logistic regression is selected via cross-validation, where the training set is again split into folds. |
| Hardware Specification | No | The paper does not provide specific hardware details such as GPU/CPU models or memory amounts used for running experiments. |
| Software Dependencies | No | The paper mentions several software packages, including scikit-learn, CatBoost, and EconML, but does not provide specific version numbers for any of them. For example, it states: "Using scikit-learn (Pedregosa et al., 2011)" and "Using CatBoost (Dorogush et al., 2018) with default parameters". |
| Experiment Setup | Yes | As a proof of concept, we apply baseline causal estimation models to the resulting D_OBS datasets after RCT rejection sampling (with many random seeds), as we mention above. We implement commonly-used causal estimation methods via two steps: (1) fitting base learners and (2) using causal estimators that combine the base learners via plug-in principles or second-stage regression. Using CatBoost (Dorogush et al., 2018) with default parameters and without cross-validation. Using scikit-learn (Pedregosa et al., 2011) and an elastic-net penalty, L1 ratio 0.1, balanced class weights, and the SAGA solver. We tune the regularization parameter C via cross-validation over the set {1e-4, 1e-3, 1e-2, 1e-1, 1e0, 1e1}. |
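The core idea behind the Algorithm 1 evidence above is to subsample an RCT so that treatment becomes correlated with covariates, yielding a confounded observational dataset with a known ground-truth effect. A minimal numerical sketch follows; the function name, the scalar covariate summary, and the exact acceptance rule are illustrative assumptions, not the paper's precise algorithm.

```python
import numpy as np

rng = np.random.default_rng(0)

def rct_rejection_sample(T, c, p_T1, confound_fn):
    """Subsample an RCT to induce confounding (sketch only).

    T           : binary treatment array from the RCT
    c           : a 1-d summary of the covariates for each unit
    p_T1        : P(T=1) under the RCT's randomization
    confound_fn : desired P*(T=1 | c) as a function of c
    Returns a boolean mask of accepted units.
    """
    p_star = np.where(T == 1, confound_fn(c), 1 - confound_fn(c))
    p_rct = np.where(T == 1, p_T1, 1 - p_T1)
    ratio = p_star / p_rct
    M = ratio.max()  # bound so every acceptance probability is <= 1
    return rng.uniform(size=len(T)) < ratio / M

# toy example: 10k RCT units with one scalar covariate
n = 10_000
c = rng.normal(size=n)
t = rng.binomial(1, 0.5, size=n)  # treatment is randomized in the RCT
keep = rct_rejection_sample(t, c, 0.5, lambda x: 1 / (1 + np.exp(-x)))
# in the kept subsample, treatment is now correlated with the covariate
```

Before subsampling, treatment and covariate are (near-)independent; after, P(T=1 | c) in the kept subsample tracks the chosen confounding function.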
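The cross-fitting with cross-validation procedure quoted in the Dataset Splits row can be sketched as follows. The simulated data and fold counts are illustrative assumptions; the model settings mirror the Experiment Setup quote (elastic-net logistic regression, L1 ratio 0.1, balanced class weights, SAGA solver, C tuned over {1e-4, ..., 1e1}) but are not claimed to be the paper's exact configuration.

```python
import numpy as np
from sklearn.linear_model import LogisticRegressionCV
from sklearn.model_selection import KFold

rng = np.random.default_rng(1)

# simulated stand-in for (covariates X, treatment T); not the paper's data
n = 2000
X = rng.normal(size=(n, 5))
T = rng.binomial(1, 1.0 / (1.0 + np.exp(-X[:, 0])))

# Cross-fitting: propensity scores for inference fold j come from a model
# fit only on the other K-1 folds; the inner cv=5 handles the
# hyperparameter search over the regularization strength C.
K = 5
g_hat = np.empty(n)
for train_idx, infer_idx in KFold(n_splits=K, shuffle=True, random_state=0).split(X):
    model = LogisticRegressionCV(
        Cs=[1e-4, 1e-3, 1e-2, 1e-1, 1e0, 1e1],
        penalty="elasticnet", solver="saga", l1_ratios=[0.1],
        class_weight="balanced", cv=5, max_iter=5000,
    )
    model.fit(X[train_idx], T[train_idx])
    g_hat[infer_idx] = model.predict_proba(X[infer_idx])[:, 1]
```

The outer loop gives every unit a propensity estimate from a model that never saw it, which is what cross-fitting buys for downstream causal estimators; the inner cross-validation selects the single logistic-regression hyperparameter, as described in the quote.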