Synthetic Design: An Optimization Approach to Experimental Design with Synthetic Controls

Authors: Nick Doudchenko, Khashayar Khosravi, Jean Pouget-Abadie, Sébastien Lahaie, Miles Lubin, Vahab Mirrokni, Jann Spiess, Guido Imbens

NeurIPS 2021

Reproducibility Variable | Result | LLM Response
Research Type | Experimental | "We use simulations based on publicly available data from the US Bureau of Labor Statistics that show improvements in terms of mean squared error and statistical power when compared to simple and commonly used alternatives such as randomized trials."
Researcher Affiliation | Collaboration | Nick Doudchenko (Google Research, New York, NY 10011, nikolayd@google.com); Khashayar Khosravi (Google Research, New York, NY 10011, khosravi@google.com); Jean Pouget-Abadie (Google Research, New York, NY 10011, jeanpa@google.com); Sebastien Lahaie (Google Research, New York, NY 10011, slahaie@google.com); Miles Lubin (Google Research, New York, NY 10011, mlubin@google.com); Vahab Mirrokni (Google Research, New York, NY 10011, mirrokni@google.com); Jann Spiess (Stanford GSB, Stanford, CA 94305, jspiess@stanford.edu); Guido Imbens (Stanford GSB, Stanford, CA 94305, imbens@stanford.edu)
Pseudocode | No | The paper does not contain any sections or figures explicitly labeled "Pseudocode" or "Algorithm," nor does it present structured steps in a code-like format.
Open Source Code | No | The paper does not provide any statement or link indicating that the source code for the methodology described is publicly available.
Open Datasets | Yes | "Using publicly available state-level unemployment data from the US Bureau of Labor Statistics, we compare the proposed methodology to a randomized design... The data are available from the BLS website, but the specific dataset we use is taken from https://github.com/synth-inference/synthdid/blob/master/experiments/bdm/data/urate_cps.csv." (A loading sketch for this file appears below the table.)
Dataset Splits | No | The paper describes a temporal split for its simulations: the first 7 periods are used for treated-unit selection and the last 3 periods for treatment application and evaluation (this split is illustrated in the setup sketch below the table). While it mentions "training and validation time periods" in the context of choosing a penalty factor in Section 6, it does not specify explicit training, validation, and test dataset splits with percentages or sample counts for the main experiments in Section 5.
Hardware Specification | No | The paper states: "In our simulations we were able to solve problems for N = 50 units (which is a meaningful threshold corresponding to the number of states, a typical experimental unit in synthetic-control-type studies) on a single machine within hours." This does not provide specific hardware details such as CPU/GPU models or memory.
Software Dependencies | Yes | "We use SCIP (Gamrath et al., 2020) when generating the empirical results in Sections 4 and 5." (A toy SCIP call is sketched below the table.)
Experiment Setup | Yes | "In each simulation we treat K units (equal to 3 in one set of simulations and 7 in another) which are chosen based on the data in the first 7 periods or chosen randomly in cases (iv) and (v) and the treatment is applied in the last 3 of the 10 periods. We either assign each treated unit the additive treatment effect of 0.05 (the homogeneous treatment case) or assume that the treatment effects increase linearly from 0 to 0.1 from the first unit... For example, the approach we take in Sections 4 and 5 computes the sample variances for every unit i across pre-treatment time periods t = 1, ..., T and then uses the average of those quantities across all units as the penalty factor, λ." (See the setup sketch below the table.)
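
Loading sketch. A minimal way to pull the cited urate_cps.csv panel into a DataFrame, assuming the raw-file URL follows GitHub's usual raw.githubusercontent.com pattern for the blob link quoted in the Open Datasets row; the column layout is not documented in the paper, so inspect the frame before relying on it.

import pandas as pd

# Raw-file form of the blob URL quoted above (assumed URL pattern).
URL = ("https://raw.githubusercontent.com/synth-inference/synthdid/"
       "master/experiments/bdm/data/urate_cps.csv")

urate = pd.read_csv(URL)   # state-level unemployment-rate panel
print(urate.shape)         # check dimensions and column names before pivoting
print(urate.head())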
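
Setup sketch. A rough rendering of the simulation protocol quoted in the Experiment Setup row: 10 periods with the first 7 used for design and the last 3 treated, K = 3 or 7 treated units, additive effects of 0.05 or effects increasing linearly from 0 to 0.1, and a penalty factor computed as the average over units of the pre-treatment sample variances. The outcome matrix Y and the randomized assignment are placeholders; the paper's synthetic-design optimization for choosing treated units is not reproduced here.

import numpy as np

rng = np.random.default_rng(0)
N, T = 50, 10                                    # units (states) and time periods
Y = rng.normal(size=(N, T))                      # placeholder for the real outcome panel

T_pre = 7                                        # first 7 periods: unit selection / design
Y_pre, Y_post = Y[:, :T_pre], Y[:, T_pre:]       # last 3 periods: treatment and evaluation

K = 3                                            # 3 in one set of simulations, 7 in another
treated = rng.choice(N, size=K, replace=False)   # randomized benchmark (cases iv and v)

# Homogeneous additive effect of 0.05, or effects increasing linearly from 0 to 0.1.
homogeneous = np.full(K, 0.05)
heterogeneous = np.linspace(0.0, 0.1, K)
Y_post_treated = Y_post.copy()
Y_post_treated[treated, :] += homogeneous[:, None]

# Penalty factor: average across units of each unit's sample variance
# over the pre-treatment periods, as described in the quoted passage.
lam = Y_pre.var(axis=1, ddof=1).mean()
print("penalty factor lambda:", lam)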
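
Solver sketch. The paper reports using SCIP (Gamrath et al., 2020) but does not say through which interface; the toy program below uses the PySCIPOpt bindings purely to illustrate how a cardinality-constrained unit-selection problem of this flavor can be handed to SCIP. It minimizes the absolute deviation between the treated units' average pre-treatment path and the overall average, which is a deliberate simplification and not the paper's synthetic-design objective.

import numpy as np
from pyscipopt import Model, quicksum

rng = np.random.default_rng(0)
N, T_pre, K = 50, 7, 3                      # units, pre-treatment periods, treated units
Y = rng.normal(size=(N, T_pre))             # placeholder pre-treatment outcomes
ybar = Y.mean(axis=0)                       # average trajectory across all units

m = Model("toy_unit_selection")
x = [m.addVar(vtype="B", name=f"x_{i}") for i in range(N)]    # 1 if unit i is treated
d = [m.addVar(lb=0.0, name=f"d_{t}") for t in range(T_pre)]   # |deviation| in period t

m.addCons(quicksum(x) == K)                 # treat exactly K units
for t in range(T_pre):
    treated_mean = quicksum((Y[i, t] / K) * x[i] for i in range(N))
    m.addCons(d[t] >= treated_mean - ybar[t])
    m.addCons(d[t] >= ybar[t] - treated_mean)

m.setObjective(quicksum(d), "minimize")     # match the average pre-treatment path
m.optimize()
print("treated units:", [i for i in range(N) if m.getVal(x[i]) > 0.5])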