Reproducibility Index

Notice: The reproducibility variables underlying each score are classified using an automated LLM-based pipeline, validated against a manually labeled dataset. LLM-based classification introduces uncertainty and potential bias; scores should be interpreted as estimates. Full accuracy metrics and methodology are described in Coakley et al. (K. L. Coakley, T. Snelleman, H. Hoos, and O. E. Gundersen, "The embrace of open science: An analysis of a decade of AI research and 56 800 conference papers," under review, 2026).

Robust Generalized Method of Moments: A Finite Sample Viewpoint

Authors: Dhruv Rohatgi, Vasilis Syrgkanis

NeurIPS 2022 | Venue PDF | LLM Run Details

Reproducibility Variable	Result	LLM Response
Research Type	Experimental	Experimentally, we apply our algorithm to robustly solve IV linear regression. We find that it performs well for a wide range of instrument strengths. In the important setting of heterogeneous treatment effects, our algorithm tolerates as much as 10% corruption. Applied to a seminal dataset previously used to estimate the effect of education on wages [6], we provide evidence for the robustness of the inference, and demonstrate that our algorithm can recover the original inference from corruptions of the dataset, significantly better than baseline approaches.
Researcher Affiliation	Collaboration	Dhruv Rohatgi MIT Vasilis Syrgkanis Stanford University EMAIL. This work was partially done while the first author was an intern at Microsoft Research New England. EMAIL. This work was partially done while the second author was a Principal Researcher at Microsoft Research New England.
Pseudocode	Yes	Algorithm 1 FILTER, Algorithm 2 GMM-SEVER, Algorithm 3 AMPLIFIED-GMM-SEVER, Algorithm 4 ITERATED-GMM-SEVER
Open Source Code	Yes	3. If you ran experiments... (a) Did you include the code, data, and instructions needed to reproduce the main experimental results (either in the supplemental material or as a URL)? [Yes] See supplemental material. 4. If you are using existing assets (e.g., code, data, models) or curating/releasing new assets... (c) Did you include any new assets either in the supplemental material or as a URL? [Yes] Code included in supplemental material
Open Datasets	Yes	NLSYM dataset. In this experiment, we use the data of [6] from the National Longitudinal Survey of Young Men for estimating the average treatment effect (ATE) of education on wages. [6] David Card. Using geographic variation in college proximity to estimate the return to schooling, 1993.
Dataset Splits	No	The paper describes using synthetic and real-world datasets, and how corruptions are introduced for experiments. It mentions running multiple independent trials to compute median errors but does not specify explicit training, validation, or test dataset splits or cross-validation strategies.
Hardware Specification	Yes	3. If you ran experiments... (d) Did you include the total amount of compute and the type of resources used (e.g., type of GPUs, internal cluster, or cloud provider)? [Yes] See Appendix G
Software Dependencies	No	The paper mentions 'scikit-learn: Machine learning in Python.' [25] but does not specify a version number for this or any other software dependency, which is necessary for reproducible setup.
Experiment Setup	Yes	In this section we corroborate our theory by applying our algorithm ITERATED-GMM-SEVER to several datasets for IV linear regression. See Appendix G for omitted figures and experimental details (e.g. hyperparameter choices and descriptions of the baselines).