Learning Models from Data with Measurement Error: Tackling Underreporting
Authors: Roy Adams, Yuelong Ji, Xiaobin Wang, Suchi Saria
ICML 2019
| Reproducibility Variable | Result | LLM Response |
|---|---|---|
| Research Type | Experimental | We demonstrate this method on synthetic data and analyze its sensitivity to near violations of the identifiability conditions. Finally, we use this method to estimate the effects of maternal smoking and heroin use during pregnancy on childhood obesity, two important problems from public health. |
| Researcher Affiliation | Collaboration | (1) Department of Computer Science, Johns Hopkins University; (2) Center on the Life Origins of Disease, Department of Population, Family, and Reproductive Health, Johns Hopkins University Bloomberg School of Public Health; (3) Department of Applied Math and Statistics, Johns Hopkins University; (4) Bayesian Health. |
| Pseudocode | No | The paper does not contain any structured pseudocode or algorithm blocks. |
| Open Source Code | No | The paper does not contain any explicit statement about making source code available or provide a link to a code repository. |
| Open Datasets | No | The paper mentions using the "Boston Birth Cohort" data and synthetic data. For the Boston Birth Cohort, it states, "To estimate these effects, we use data from the Boston Birth Cohort, a longitudinal dataset tracking health markers from mothers and children," but provides no link, DOI, or formal citation for public access to this specific dataset. |
| Dataset Splits | No | The paper does not explicitly specify dataset splits (e.g., train/validation/test percentages or counts) for the real-world or synthetic data, nor does it refer to predefined splits with citations for reproducibility. |
| Hardware Specification | No | The paper does not provide any specific hardware details such as GPU models, CPU types, or memory specifications used for running the experiments. |
| Software Dependencies | No | The paper mentions using "L-BFGS" for maximization but does not specify any software dependencies (e.g., libraries, frameworks) with version numbers that would be needed for replication. |
| Experiment Setup | Yes | In all cases, we used logistic regression models for both $p_\phi(a \mid x)$ and $p_\theta(y \mid a, x)$. ... We estimated $\tau$, $\phi$, and $\theta$ by maximizing the log conditional likelihood in Equation 1 using L-BFGS. As in our synthetic experiments (Section 5), our target estimand was the risk difference, which we estimated according to Equation 3. ... For any $\theta_X \neq 0$, this data generating process satisfies the identifiability condition of Corollary 1, so the parameters should be identifiable using a single error-prone observation of the exposure. We evaluated a version of the full likelihood method based on Theorem 1 where both $p_\phi(A \mid X)$ and $p_\theta(Y \mid A, X)$ were logistic regression models and we maximized the conditional likelihood in Equation 1 using L-BFGS (adjusted). |
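
Since the paper provides neither pseudocode nor source code, the setup quoted in the last row can be illustrated with a minimal sketch. It is not the authors' implementation. It assumes a simple underreporting model in which a truly exposed subject reports exposure with probability τ and an unexposed subject never reports exposure; the paper's Equation 1 and Equation 3 are not reproduced on this page, so the likelihood and the risk-difference estimator below are plausible stand-ins rather than the published formulas. The function names, the simulation coefficients, and the use of SciPy's `L-BFGS-B` routine in place of plain L-BFGS are all choices made for this illustration.

```python
import numpy as np
from scipy.optimize import minimize
from scipy.special import expit


def unpack(params, d):
    """Split the flat parameter vector (hypothetical layout for this sketch)."""
    tau = expit(params[0])      # reporting probability, kept in (0, 1)
    phi = params[1:d + 2]       # intercept + d coefficients for p(a=1 | x)
    theta = params[d + 2:]      # intercept, coefficient on a, d coefficients on x
    return tau, phi, theta


def neg_log_lik(params, X, a_obs, y):
    """Negative log-likelihood of (y, a_obs) given x, marginalizing the true a."""
    d = X.shape[1]
    tau, phi, theta = unpack(params, d)
    p_a1 = expit(phi[0] + X @ phi[1:])
    base = theta[0] + X @ theta[2:]
    p_y1_a1 = expit(base + theta[1])
    p_y1_a0 = expit(base)
    lik_y_a1 = np.where(y == 1, p_y1_a1, 1 - p_y1_a1)
    lik_y_a0 = np.where(y == 1, p_y1_a0, 1 - p_y1_a0)
    # Reported exposure: under the assumed model the true exposure must be 1.
    lik_reported = tau * p_a1 * lik_y_a1
    # No reported exposure: either exposed but unreported, or truly unexposed.
    lik_unreported = (1 - tau) * p_a1 * lik_y_a1 + (1 - p_a1) * lik_y_a0
    lik = np.where(a_obs == 1, lik_reported, lik_unreported)
    return -np.sum(np.log(lik + 1e-12))  # small constant guards against log(0)


def risk_difference(theta, X):
    """Average of p(y=1 | a=1, x) - p(y=1 | a=0, x) over the observed x."""
    base = theta[0] + X @ theta[2:]
    return float(np.mean(expit(base + theta[1]) - expit(base)))


def fit(X, a_obs, y):
    """Maximize the likelihood with SciPy's L-BFGS-B, starting from zeros."""
    d = X.shape[1]
    x0 = np.zeros(1 + (d + 1) + (d + 2))
    res = minimize(neg_log_lik, x0, args=(X, a_obs, y), method="L-BFGS-B")
    tau, phi, theta = unpack(res.x, d)
    return {"tau": tau, "phi": phi, "theta": theta,
            "risk_difference": risk_difference(theta, X), "nll": res.fun}


def simulate(n, tau=0.7, seed=0):
    """Synthetic data with one covariate; all coefficients are illustrative."""
    rng = np.random.default_rng(seed)
    X = rng.normal(size=(n, 1))
    a = rng.binomial(1, expit(-0.5 + X[:, 0]))
    y = rng.binomial(1, expit(-1.0 + 1.0 * a + 1.0 * X[:, 0]))
    a_obs = a * rng.binomial(1, tau, size=n)  # exposed subjects report w.p. tau
    return X, a_obs, y
```

Calling `fit(*simulate(2000))` returns the estimated reporting rate, both sets of regression coefficients, and the plug-in risk difference. Gradients are approximated by finite differences here, which is adequate for a handful of parameters; an analytic gradient would be preferable for larger models.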