Notice: The reproducibility variables underlying each score are classified using an automated LLM-based pipeline, validated against a manually labeled dataset. LLM-based classification introduces uncertainty and potential bias; scores should be interpreted as estimates. Full accuracy metrics and methodology are described in [1].
Synthetic Design: An Optimization Approach to Experimental Design with Synthetic Controls
Authors: Nick Doudchenko, Khashayar Khosravi, Jean Pouget-Abadie, Sébastien Lahaie, Miles Lubin, Vahab Mirrokni, Jann Spiess, Guido Imbens
NeurIPS 2021 | Venue PDF | LLM Run Details
| Reproducibility Variable | Result | LLM Response |
|---|---|---|
| Research Type | Experimental | We use simulations based on publicly available data from the US Bureau of Labor Statistics that show improvements in terms of mean squared error and statistical power when compared to simple and commonly used alternatives such as randomized trials. |
| Researcher Affiliation | Collaboration | Nick Doudchenko Google Research New York, NY 10011 EMAIL Khashayar Khosravi Google Research New York, NY 10011 EMAIL Jean Pouget-Abadie Google Research New York, NY 10011 EMAIL Sebastien Lahaie Google Research New York, NY 10011 EMAIL Miles Lubin Google Research New York, NY 10011 EMAIL Vahab Mirrokni Google Research New York, NY 10011 EMAIL Jann Spiess Stanford GSB Stanford, CA 94305 EMAIL Guido Imbens Stanford GSB Stanford, CA 94305 EMAIL |
| Pseudocode | No | The paper does not contain any sections or figures explicitly labeled "Pseudocode" or "Algorithm," nor does it present structured steps in a code-like format. |
| Open Source Code | No | The paper does not provide any statement or link indicating that the source code for the methodology described is publicly available. |
| Open Datasets | Yes | Using publicly available state-level unemployment data from the US Bureau of Labor Statistics, we compare the proposed methodology to a randomized design... The data are available from the BLS website, but the specific dataset we use is taken from https://github.com/synth-inference/synthdid/blob/master/experiments/bdm/data/urate_cps.csv. |
| Dataset Splits | No | The paper describes a temporal split for its simulations (using the first 7 periods for treatment unit selection and the last 3 periods for treatment application and evaluation). While it mentions "training and validation time periods" in the context of choosing a penalty factor in Section 6, it does not specify explicit training, validation, and test dataset splits with percentages or sample counts for the main experiments in Section 5. |
| Hardware Specification | No | The paper states: "In our simulations we were able to solve problems for N = 50 units which is a meaningful threshold corresponding to the number of states, a typical experimental unit in synthetic-control-type studies on a single machine within hours." This does not provide specific hardware details like GPU/CPU models or memory. |
| Software Dependencies | Yes | We use SCIP (Gamrath et al., 2020) when generating the empirical results in Sections 4 and 5. |
| Experiment Setup | Yes | In each simulation we treat K units (equal to 3 in one set of simulations and 7 in another) which are chosen based on the data in the first 7 periods or chosen randomly in cases (iv) and (v) and the treatment is applied in the last 3 of the 10 periods. We either assign each treated unit the additive treatment effect of 0.05 (the homogeneous treatment case) or assume that the treatment effects increase linearly from 0 to 0.1 from the first unit... For example, the approach we take in Sections 4 and 5 computes the sample variances for every unit i across pre-treatment time periods t = 1, . . . , T and then uses the average of those quantities across all units as the penalty factor, λ. |
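The penalty-factor heuristic quoted in the last row (per-unit sample variances over pre-treatment periods, averaged across units) can be sketched as follows. This is a minimal illustration of that description, not the authors' code; the function name, NumPy usage, and array layout (rows = units, columns = periods) are assumptions.

```python
import numpy as np

def penalty_factor(y_pre: np.ndarray) -> float:
    """Compute lambda as the average across units of each unit's
    sample variance over the pre-treatment periods.

    y_pre: array of shape (N units, T pre-treatment periods).
    """
    unit_variances = y_pre.var(axis=1, ddof=1)  # sample variance for each unit i
    return float(unit_variances.mean())         # average across all units

# Example with 3 units observed over 7 pre-treatment periods
rng = np.random.default_rng(0)
lam = penalty_factor(rng.normal(size=(3, 7)))
```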