Synthetic Design: An Optimization Approach to Experimental Design with Synthetic Controls
Authors: Nick Doudchenko, Khashayar Khosravi, Jean Pouget-Abadie, Sébastien Lahaie, Miles Lubin, Vahab Mirrokni, Jann Spiess, Guido Imbens
NeurIPS 2021
| Reproducibility Variable | Result | LLM Response |
|---|---|---|
| Research Type | Experimental | We use simulations based on publicly available data from the US Bureau of Labor Statistics that show improvements in terms of mean squared error and statistical power when compared to simple and commonly used alternatives such as randomized trials. |
| Researcher Affiliation | Collaboration | Nick Doudchenko (Google Research, New York, NY 10011; nikolayd@google.com); Khashayar Khosravi (Google Research, New York, NY 10011; khosravi@google.com); Jean Pouget-Abadie (Google Research, New York, NY 10011; jeanpa@google.com); Sebastien Lahaie (Google Research, New York, NY 10011; slahaie@google.com); Miles Lubin (Google Research, New York, NY 10011; mlubin@google.com); Vahab Mirrokni (Google Research, New York, NY 10011; mirrokni@google.com); Jann Spiess (Stanford GSB, Stanford, CA 94305; jspiess@stanford.edu); Guido Imbens (Stanford GSB, Stanford, CA 94305; imbens@stanford.edu) |
| Pseudocode | No | The paper does not contain any sections or figures explicitly labeled "Pseudocode" or "Algorithm," nor does it present structured steps in a code-like format. |
| Open Source Code | No | The paper does not provide any statement or link indicating that the source code for the methodology described is publicly available. |
| Open Datasets | Yes | Using publicly available state-level unemployment data from the US Bureau of Labor Statistics, we compare the proposed methodology to a randomized design... The data are available from the BLS website, but the specific dataset we use is taken from https://github.com/synth-inference/synthdid/blob/master/experiments/bdm/data/urate_cps.csv. See the loading sketch after the table. |
| Dataset Splits | No | The paper describes a temporal split for its simulations (using the first 7 periods for treatment unit selection and the last 3 periods for treatment application and evaluation). While it mentions "training and validation time periods" in the context of choosing a penalty factor in Section 6, it does not specify explicit training, validation, and test dataset splits with percentages or sample counts for the main experiments in Section 5. |
| Hardware Specification | No | The paper states: "In our simulations we were able to solve problems for N = 50 units (which is a meaningful threshold corresponding to the number of states, a typical experimental unit in synthetic-control-type studies) on a single machine within hours." This does not provide specific hardware details like GPU/CPU models or memory. |
| Software Dependencies | Yes | We use SCIP (Gamrath et al., 2020) when generating the empirical results in Sections 4 and 5. See the solver sketch after the table. |
| Experiment Setup | Yes | In each simulation we treat K units (equal to 3 in one set of simulations and 7 in another) which are chosen based on the data in the first 7 periods or chosen randomly in cases (iv) and (v), and the treatment is applied in the last 3 of the 10 periods. We either assign each treated unit the additive treatment effect of 0.05 (the homogeneous treatment case) or assume that the treatment effects increase linearly from 0 to 0.1 from the first unit... For example, the approach we take in Sections 4 and 5 computes the sample variances for every unit i across pre-treatment time periods t = 1, ..., T and then uses the average of those quantities across all units as the penalty factor, λ. See the simulation sketch after the table. |
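
The Open Datasets row above points at a CSV in the synthdid repository. A minimal loading sketch, assuming pandas and converting the quoted blob URL to its raw-file form; the column names in the commented pivot are hypothetical and should be checked against the actual header:

```python
import pandas as pd

# Raw-file form of the blob URL quoted in the Open Datasets row (an assumption:
# GitHub serves the same file at raw.githubusercontent.com).
URL = ("https://raw.githubusercontent.com/synth-inference/synthdid/"
       "master/experiments/bdm/data/urate_cps.csv")

df = pd.read_csv(URL)
print(df.head())  # inspect the real column names before pivoting

# Synthetic-control designs work on an N x T panel: one row per state,
# one column per time period. Column names below are hypothetical.
# panel = df.pivot(index="state", columns="year", values="urate").to_numpy()
```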
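
The Experiment Setup row describes a temporal split (7 pre-treatment periods, 3 treated periods), two treatment-effect profiles, and the variance-based penalty factor λ. A minimal sketch of that bookkeeping, assuming NumPy and a synthetic stand-in for the BLS panel; all variable names are ours:

```python
import numpy as np

rng = np.random.default_rng(0)
N, T, T_pre, K = 50, 10, 7, 3        # last T - T_pre = 3 periods are treated

panel = rng.normal(size=(N, T))      # stand-in for the N x T unemployment panel

# Penalty factor: sample variance of each unit over the pre-treatment periods,
# averaged across all units (the rule quoted for Sections 4 and 5).
lam = panel[:, :T_pre].var(axis=1, ddof=1).mean()

# Two treatment-effect profiles for the K treated units.
tau_homogeneous = np.full(K, 0.05)      # additive effect of 0.05 per unit
tau_linear = np.linspace(0.0, 0.1, K)   # increasing linearly from 0 to 0.1

# Random treatment assignment, mirroring cases (iv) and (v); effects are
# applied only in the last 3 periods.
treated = rng.choice(N, size=K, replace=False)
outcome = panel.copy()
outcome[treated, T_pre:] += tau_homogeneous[:, None]
```

With K = 3, `np.linspace(0.0, 0.1, 3)` yields effects of 0, 0.05, and 0.1 for the three treated units, matching the quoted heterogeneous case.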
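
The Software Dependencies row names SCIP as the solver. The paper does not say which interface it used; the sketch below uses PySCIPOpt (our assumption) and shows only the solver mechanics of picking exactly K units under a placeholder linear objective, not the paper's actual synthetic-design program:

```python
from pyscipopt import Model, quicksum

N, K = 50, 3
score = [float(i % 7) for i in range(N)]   # placeholder per-unit costs

model = Model("toy_unit_selection")
z = [model.addVar(vtype="B", name=f"z_{i}") for i in range(N)]  # 1 = treated
model.addCons(quicksum(z) == K)            # treat exactly K units
model.setObjective(quicksum(score[i] * z[i] for i in range(N)), "minimize")
model.optimize()

treated = [i for i in range(N) if model.getVal(z[i]) > 0.5]
print("treated units:", treated)
```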