Robust Generalized Method of Moments: A Finite Sample Viewpoint
Authors: Dhruv Rohatgi, Vasilis Syrgkanis
NeurIPS 2022
| Reproducibility Variable | Result | LLM Response |
|---|---|---|
| Research Type | Experimental | Experimentally, we apply our algorithm to robustly solve IV linear regression. We find that it performs well for a wide range of instrument strengths. In the important setting of heterogeneous treatment effects, our algorithm tolerates as much as 10% corruption. Applied to a seminal dataset previously used to estimate the effect of education on wages [6], we provide evidence for the robustness of the inference, and demonstrate that our algorithm can recover the original inference from corruptions of the dataset, significantly better than baseline approaches. |
| Researcher Affiliation | Collaboration | Dhruv Rohatgi (MIT, drohatgi@mit.edu); Vasilis Syrgkanis (Stanford University, vsyrgk@stanford.edu). This work was partially done while the first author was an intern at Microsoft Research New England and while the second author was a Principal Researcher at Microsoft Research New England. |
| Pseudocode | Yes | Algorithm 1 FILTER, Algorithm 2 GMM-SEVER, Algorithm 3 AMPLIFIED-GMM-SEVER, Algorithm 4 ITERATED-GMM-SEVER |
| Open Source Code | Yes | 3. If you ran experiments... (a) Did you include the code, data, and instructions needed to reproduce the main experimental results (either in the supplemental material or as a URL)? [Yes] See supplemental material. 4. If you are using existing assets (e.g., code, data, models) or curating/releasing new assets... (c) Did you include any new assets either in the supplemental material or as a URL? [Yes] Code included in supplemental material |
| Open Datasets | Yes | NLSYM dataset. In this experiment, we use the data of [6] from the National Longitudinal Survey of Young Men for estimating the average treatment effect (ATE) of education on wages. [6] David Card. Using geographic variation in college proximity to estimate the return to schooling, 1993. |
| Dataset Splits | No | The paper describes using synthetic and real-world datasets, and how corruptions are introduced for experiments. It mentions running multiple independent trials to compute median errors but does not specify explicit training, validation, or test dataset splits or cross-validation strategies. |
| Hardware Specification | Yes | 3. If you ran experiments... (d) Did you include the total amount of compute and the type of resources used (e.g., type of GPUs, internal cluster, or cloud provider)? [Yes] See Appendix G |
| Software Dependencies | No | The paper cites 'scikit-learn: Machine learning in Python' [25] but does not give a version number for it or for any other software dependency, so the software environment cannot be reproduced exactly. |
| Experiment Setup | Yes | In this section we corroborate our theory by applying our algorithm ITERATED-GMM-SEVER to several datasets for IV linear regression. See Appendix G for omitted figures and experimental details (e.g. hyperparameter choices and descriptions of the baselines). |
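
For readers unfamiliar with the experimental task named in the table, the following is a minimal sketch of plain (non-robust) IV linear regression viewed as a GMM problem, where the estimate solves the empirical moment condition (1/n) Zᵀ(y − Xθ) = 0. It is not the paper's ITERATED-GMM-SEVER algorithm and does not reproduce its experiments. The function names `iv_gmm_estimate` and `simulate_iv_data`, the synthetic data-generating process, and the corruption pattern are all assumptions made for illustration.

```python
import numpy as np


def iv_gmm_estimate(X, Z, y):
    """Just-identified IV estimate: solve (1/n) * Z^T (y - X theta) = 0 for theta."""
    return np.linalg.solve(Z.T @ X, Z.T @ y)


def simulate_iv_data(n, theta, rng):
    """Hypothetical data: one instrument z, one treatment x, one confounder u."""
    z = rng.normal(size=(n, 1))                 # instrument, independent of u
    u = rng.normal(size=(n, 1))                 # unobserved confounder
    x = z + u + 0.1 * rng.normal(size=(n, 1))   # treatment depends on z and u
    y = theta * x[:, 0] + u[:, 0] + 0.1 * rng.normal(size=n)
    return x, z, y


rng = np.random.default_rng(0)
true_theta = 2.0
X, Z, y = simulate_iv_data(5000, true_theta, rng)

# Non-robust IV estimate on clean data.
theta_clean = iv_gmm_estimate(X, Z, y)[0]

# Ordinary least squares ignores the confounder and is biased here.
theta_ols = np.linalg.lstsq(X, y, rcond=None)[0][0]

# Adversarially overwrite 5% of the samples (an assumed corruption pattern).
n_bad = 250
Z_bad, y_bad = Z.copy(), y.copy()
Z_bad[:n_bad] = 10.0
y_bad[:n_bad] = -100.0
theta_corrupt = iv_gmm_estimate(X, Z_bad, y_bad)[0]

print("IV estimate, clean data:    ", theta_clean)
print("OLS estimate, clean data:   ", theta_ols)
print("IV estimate, corrupted data:", theta_corrupt)
```

In this sketch a small fraction of overwritten samples is enough to move the non-robust IV estimate far from the true coefficient, which is the failure mode that a robust GMM procedure is meant to address.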