reproducibilityindex.ai

Learning Models from Data with Measurement Error: Tackling Underreporting

Authors: Roy Adams, Yuelong Ji, Xiaobin Wang, Suchi Saria

ICML 2019

Reproducibility Variable	Result	LLM Response
Research Type	Experimental	We demonstrate this method on synthetic data and analyze its sensitivity to near violations of the identifiability conditions. Finally, we use this method to estimate the effects of maternal smoking and heroin use during pregnancy on childhood obesity, two important problems from public health.
Researcher Affiliation	Collaboration	1 Department of Computer Science, Johns Hopkins University; 2 Center on the Life Origins of Disease, Department of Population, Family, and Reproductive Health, Johns Hopkins University Bloomberg School of Public Health; 3 Department of Applied Math and Statistics, Johns Hopkins University; 4 Bayesian Health.
Pseudocode	No	The paper does not contain any structured pseudocode or algorithm blocks.
Open Source Code	No	The paper does not contain any explicit statement about making source code available or provide a link to a code repository.
Open Datasets	No	The paper mentions using the "Boston Birth Cohort" data and synthetic data. For the Boston Birth Cohort, it states, "To estimate these effects, we use data from the Boston Birth Cohort, a longitudinal dataset tracking health markers from mothers and children," but provides no link, DOI, or formal citation for public access to this specific dataset.
Dataset Splits	No	The paper does not explicitly specify dataset splits (e.g., train/validation/test percentages or counts) for the real-world or synthetic data, nor does it refer to predefined splits with citations for reproducibility.
Hardware Specification	No	The paper does not provide any specific hardware details such as GPU models, CPU types, or memory specifications used for running the experiments.
Software Dependencies	No	The paper mentions using "L-BFGS" for maximization but does not specify any software dependencies (e.g., libraries, frameworks) with version numbers that would be needed for replication.
Experiment Setup	Yes	In all cases, we used logistic regression models for both pφ(a | x) and pθ(y | a, x). ... We estimated τ, φ, and θ by maximizing the log conditional likelihood in Equation 1 using L-BFGS. As in our synthetic experiments (Section 5) our target estimand was the risk difference, which we estimated according to Equation 3. ... For any θX = 0, this data generating process satisfies the identifiability condition of Corollary 1, so the parameters should be identifiable using a single error-prone observation of the exposure. We evaluated a version of the full likelihood method based on Theorem 1 where both pφ(A | X) and pθ(Y | A, X) were logistic regression models and we maximized the conditional likelihood in Equation 1 using L-BFGS (adjusted).
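
The Experiment Setup row describes two logistic regression models, an underreporting parameter τ, and a risk-difference estimand, but the page does not reproduce Equation 1 or Equation 3. The following is a minimal sketch of what such an objective might look like, not the authors' implementation. It assumes (these are illustrative assumptions, not taken from the paper) that τ is the probability that a truly exposed subject reports no exposure, that unexposed subjects never report exposure, and that the likelihood marginalizes over the unobserved true exposure. The function names `log_likelihood` and `risk_difference` are hypothetical.

```python
import math


def sigmoid(z):
    # Numerically stable logistic function.
    if z >= 0:
        return 1.0 / (1.0 + math.exp(-z))
    e = math.exp(z)
    return e / (1.0 + e)


def dot(w, x):
    return sum(wi * xi for wi, xi in zip(w, x))


def log_likelihood(tau, phi, theta, data):
    """Sum of log p(y, a_obs | x), marginalizing the true exposure a.

    Assumed model (illustrative, not the paper's Equation 1):
      p(a=1 | x)       = sigmoid(phi . [1, x])
      p(y=1 | a, x)    = sigmoid(theta . [1, a, x])
      p(a_obs=1 | a=1) = 1 - tau   (tau in [0, 1) is the underreporting rate)
      p(a_obs=1 | a=0) = 0         (no false positive reports)
    `data` is a list of (x, a_obs, y) with x a list of floats.
    """
    total = 0.0
    for x, a_obs, y in data:
        p_a1 = sigmoid(dot(phi, [1.0] + x))
        lik = 0.0
        for a, p_a in ((0, 1.0 - p_a1), (1, p_a1)):
            p_y1 = sigmoid(dot(theta, [1.0, float(a)] + x))
            p_y = p_y1 if y == 1 else 1.0 - p_y1
            if a == 1:
                p_obs = (1.0 - tau) if a_obs == 1 else tau
            else:
                p_obs = 0.0 if a_obs == 1 else 1.0
            lik += p_a * p_obs * p_y
        total += math.log(lik)
    return total


def risk_difference(theta, xs):
    # Average over covariates of p(y=1 | a=1, x) - p(y=1 | a=0, x).
    diffs = [
        sigmoid(dot(theta, [1.0, 1.0] + x)) - sigmoid(dot(theta, [1.0, 0.0] + x))
        for x in xs
    ]
    return sum(diffs) / len(diffs)
```

To fit the parameters, the negative of `log_likelihood` could be handed to any L-BFGS routine; one option would be `scipy.optimize.minimize` with `method="L-BFGS-B"` and a bound keeping τ inside [0, 1). The paper does not state which implementation was used, which is why Software Dependencies is marked "No" above.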