Task-specific experimental design for treatment effect estimation
Authors: Bethany Connolly, Kim Moore, Tobias Schwedes, Alexander Adam, Gary Willis, Ilya Feige, Christopher Frye
ICML 2023
| Reproducibility Variable | Result | LLM Response |
|---|---|---|
| Research Type | Experimental | Across a range of important tasks, real-world datasets, and sample sizes, our method outperforms other benchmarks, e.g. requiring an order-of-magnitude less data to match RCT performance on targeted marketing tasks. |
| Researcher Affiliation | Industry | Faculty, 160 Old Street, London, UK. Correspondence to: Christopher Frye <chris.f@faculty.ai>. |
| Pseudocode | Yes | Our full method, including discretisation (see Sec. 3.3), is detailed in Algorithm 1 and summarised in Fig. 2. |
| Open Source Code | No | The paper provides a tinyurl for processed data (https://tinyurl.com/RetailHero) and references open-source code for benchmark methods (https://github.com/raddanki/SampleConstrained-Treatment-Effect-Estimation), but does not explicitly state that its *own* methodology's source code is publicly available. |
| Open Datasets | Yes | The datasets we use for our experiments are described at length in App. B.1. In brief, we test our method on: STROKE: clinical trial evaluating aspirin's effect on stroke patients; our sub-selection procedure results in a dataset of size 9k (Sandercock et al., 2011). CRITEOVISIT & CRITEOCONVERSION: marketing trial evaluating effectiveness of email campaign on two different outcomes; we sub-select 7M rows of data (Diemert et al., 2018). RETAILHERO: marketing trial in which we engineered features from purchase history data for 200k individuals (see App. B.1 for references). Processed data: https://tinyurl.com/RetailHero |
| Dataset Splits | Yes | Additionally, during model training, the sampled data was partitioned 80/20 into training/validation sets for early stopping (with early-stopping-rounds: 50). ... For all datasets except STROKE, we performed 384 trials per experiment, and we bootstrap-resampled the test set for each trial. Because of its smaller size, experiments on STROKE each consisted of 1000 trials, and we performed a fresh train-test split for each trial. |
| Hardware Specification | Yes | All experiments were performed in parallel on 96-core, 393 GB machines. |
| Software Dependencies | No | The paper mentions software components like Adam (Kingma & Ba, 2015) and XGBoost, but does not specify their version numbers or other crucial software dependencies required for replication. |
| Experiment Setup | Yes | The VAE architecture we used in our experiments comprises a 2-layer fully-connected encoder with 100-dimensional hidden layers, a 2-dimensional latent space... We trained the VAE using Adam (Kingma & Ba, 2015) with learning rate 10^-4 and early stopping on the validation-set ELBO. ... We discretised the continuous latent representation... slicing each edge... into a number of cells (with a default of 20 unless stated otherwise). ... the core learners of the ITE estimator were XGBoost models initialised with the following hyperparameters: n-estimators: 400; objective: binary:logistic; eval-metric: rmse; max-depth: 1 (T-learner), 2 (S-learner). Additionally, during model training, the sampled data was partitioned 80/20 into training/validation sets for early stopping (with early-stopping-rounds: 50). |
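The discretisation step quoted above (slicing each edge of the 2-D latent space into a default of 20 cells) can be sketched as follows. This is a minimal illustration, not the paper's code: the function name and the equal-width binning scheme are assumptions.

```python
import numpy as np

def discretise_latents(z, n_cells=20):
    """Map continuous latent coordinates to integer grid-cell indices.

    Each latent dimension's observed range is sliced into `n_cells`
    equal-width bins (the quoted setup uses 20 cells per edge by default).
    """
    z = np.asarray(z, dtype=float)
    lo, hi = z.min(axis=0), z.max(axis=0)
    # Rescale each dimension to [0, 1], guarding against zero-width ranges.
    scaled = (z - lo) / np.where(hi > lo, hi - lo, 1.0)
    # Bin into n_cells slices; clip so the maximum lands in the last cell.
    return np.minimum((scaled * n_cells).astype(int), n_cells - 1)

rng = np.random.default_rng(0)
z = rng.normal(size=(1000, 2))  # stand-in for 2-D VAE latent codes
cells = discretise_latents(z)   # per-row (row, column) cell indices in a 20x20 grid
```

In a 2-D latent space this yields a 20×20 grid of cells, each of which can then be treated as a stratum when allocating experimental samples.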
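The T-learner configuration quoted above can likewise be sketched. This is a hedged stand-in, not the paper's implementation: scikit-learn's `GradientBoostingClassifier` replaces XGBoost, and the synthetic data and variable names are assumptions; only the 80/20 split, 400 estimators, depth-1 trees, and 50-round early stopping mirror the quoted hyperparameters.

```python
import numpy as np
from sklearn.ensemble import GradientBoostingClassifier
from sklearn.model_selection import train_test_split

# Synthetic binary-outcome data with a positive treatment effect (illustrative only).
rng = np.random.default_rng(1)
n = 2000
X = rng.normal(size=(n, 5))
t = rng.integers(0, 2, size=n)                 # treatment assignment
p = 1.0 / (1.0 + np.exp(-(X[:, 0] + 0.5 * t)))  # outcome probability shifts with t
y = (rng.random(n) < p).astype(int)

# 80/20 train/held-out split, mirroring the quoted partitioning.
X_tr, X_ho, t_tr, t_ho, y_tr, y_ho = train_test_split(
    X, t, y, test_size=0.2, random_state=0)

# T-learner: one shallow boosted model per treatment arm (max_depth=1 as quoted);
# n_iter_no_change=50 plays the role of early-stopping-rounds: 50.
models = {}
for arm in (0, 1):
    m = GradientBoostingClassifier(
        n_estimators=400, max_depth=1,
        validation_fraction=0.2, n_iter_no_change=50, random_state=0)
    m.fit(X_tr[t_tr == arm], y_tr[t_tr == arm])
    models[arm] = m

# Estimated ITE = P(y=1 | x, t=1) - P(y=1 | x, t=0) on held-out rows.
ite = models[1].predict_proba(X_ho)[:, 1] - models[0].predict_proba(X_ho)[:, 1]
```

An S-learner variant would instead fit a single depth-2 model on `(X, t)` jointly and difference its predictions at `t=1` and `t=0`.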