Notice: The reproducibility variables underlying each score are classified using an automated LLM-based pipeline, validated against a manually labeled dataset. LLM-based classification introduces uncertainty and potential bias; scores should be interpreted as estimates. Full accuracy metrics and methodology are described in [1].
Synthetic Design: An Optimization Approach to Experimental Design with Synthetic Controls
Authors: Nick Doudchenko, Khashayar Khosravi, Jean Pouget-Abadie, Sébastien Lahaie, Miles Lubin, Vahab Mirrokni, Jann Spiess, Guido Imbens
NeurIPS 2021 | Venue PDF | LLM Run Details
| Reproducibility Variable | Result | LLM Response |
|---|---|---|
| Research Type | Experimental | We use simulations based on publicly available data from the US Bureau of Labor Statistics that show improvements in terms of mean squared error and statistical power when compared to simple and commonly used alternatives such as randomized trials. |
| Researcher Affiliation | Collaboration | Nick Doudchenko Google Research New York, NY 10011 EMAIL Khashayar Khosravi Google Research New York, NY 10011 EMAIL Jean Pouget-Abadie Google Research New York, NY 10011 EMAIL Sebastien Lahaie Google Research New York, NY 10011 EMAIL Miles Lubin Google Research New York, NY 10011 EMAIL Vahab Mirrokni Google Research New York, NY 10011 EMAIL Jann Spiess Stanford GSB Stanford, CA 94305 EMAIL Guido Imbens Stanford GSB Stanford, CA 94305 EMAIL |
| Pseudocode | No | The paper does not contain any sections or figures explicitly labeled "Pseudocode" or "Algorithm," nor does it present structured steps in a code-like format. |
| Open Source Code | No | The paper does not provide any statement or link indicating that the source code for the methodology described is publicly available. |
| Open Datasets | Yes | Using publicly available state-level unemployment data from the US Bureau of Labor Statistics, we compare the proposed methodology to a randomized design... The data are available from the BLS website, but the specific dataset we use is taken from https://github.com/synth-inference/synthdid/blob/master/experiments/bdm/data/urate_cps.csv. |
| Dataset Splits | No | The paper describes a temporal split for its simulations (using the first 7 periods for treatment unit selection and the last 3 periods for treatment application and evaluation). While it mentions "training and validation time periods" in the context of choosing a penalty factor in Section 6, it does not specify explicit training, validation, and test dataset splits with percentages or sample counts for the main experiments in Section 5. |
| Hardware Specification | No | The paper states: "In our simulations we were able to solve problems for N = 50 units which is a meaningful threshold corresponding to the number of states, a typical experimental unit in synthetic-control-type studies on a single machine within hours." This does not provide specific hardware details like GPU/CPU models or memory. |
| Software Dependencies | Yes | We use SCIP (Gamrath et al., 2020) when generating the empirical results in Sections 4 and 5. |
| Experiment Setup | Yes | In each simulation we treat K units (equal to 3 in one set of simulations and 7 in another) which are chosen based on the data in the first 7 periods or chosen randomly in cases (iv) and (v) and the treatment is applied in the last 3 of the 10 periods. We either assign each treated unit the additive treatment effect of 0.05 (the homogeneous treatment case) or assume that the treatment effects increase linearly from 0 to 0.1 from the first unit... For example, the approach we take in Sections 4 and 5 computes the sample variances for every unit i across pre-treatment time periods t = 1, . . . , T and then uses the average of those quantities across all units as the penalty factor, λ. |
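The penalty-factor heuristic quoted in the last row (per-unit sample variances over pre-treatment periods, averaged across units) can be sketched as follows. This is a minimal illustration of that description, not the authors' code; the function name, NumPy usage, and array layout (rows = units, columns = periods) are assumptions.

```python
import numpy as np

def penalty_factor(y_pre: np.ndarray) -> float:
    """Compute lambda as the average across units of each unit's
    sample variance over the pre-treatment periods.

    y_pre: array of shape (N units, T pre-treatment periods).
    """
    unit_variances = y_pre.var(axis=1, ddof=1)  # sample variance for each unit i
    return float(unit_variances.mean())         # average across all units

# Example with 3 units observed over 7 pre-treatment periods
rng = np.random.default_rng(0)
lam = penalty_factor(rng.normal(size=(3, 7)))
```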