Domain constraints improve risk prediction when outcome data is missing

Authors: Sidhika Balachandar, Nikhil Garg, Emma Pierson

ICLR 2024

Each entry below lists a reproducibility variable, the assessed result, and the LLM's supporting response.
Research Type: Experimental. "We show theoretically and on synthetic data that domain constraints improve parameter inference. We apply our model to a case study of cancer risk prediction, showing that the model's inferred risk predicts cancer diagnoses, its inferred testing policy captures known public health policies, and it can identify suboptimalities in test allocation. Though our case study is in healthcare, our analysis reveals a general class of domain constraints which can improve model estimation in many settings."
Researcher Affiliation: Academia. Sidhika Balachandar (Cornell Tech), Nikhil Garg (Cornell Tech), Emma Pierson (Cornell Tech).
Pseudocode: No. The paper does not contain structured pseudocode or algorithm blocks.
Open Source Code: Yes. Details are in Appendix D, and the code is at https://github.com/sidhikabalachandar/domain_constraints.
Open Datasets: Yes. "Our data comes from the UK Biobank (Sudlow et al., 2015), which contains information on health, demographics, and genetics for the UK (see Appendix E for details)."
Dataset Splits: No. The paper mentions evaluating on a "test set" in Section 5.2 but does not give the percentages, sample counts, or splitting methodology needed to reproduce the train/validation/test partition.
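For reference, the kind of reproducible partitioning this check looks for can be stated in a few lines. The sketch below is illustrative only: the 70/15/15 ratios and the seed are assumptions, not values taken from the paper.

```python
import random

def train_val_test_split(ids, val_frac=0.15, test_frac=0.15, seed=0):
    """Deterministically partition record IDs into train/val/test sets."""
    ids = sorted(ids)          # fix the ordering before shuffling
    rng = random.Random(seed)  # seeded RNG makes the split reproducible
    rng.shuffle(ids)
    n = len(ids)
    n_test = int(n * test_frac)
    n_val = int(n * val_frac)
    test = ids[:n_test]
    val = ids[n_test:n_test + n_val]
    train = ids[n_test + n_val:]
    return train, val, test

train, val, test = train_val_test_split(range(1000))
print(len(train), len(val), len(test))  # 700 150 150
```

Reporting the seed, ratios, and shuffle procedure together is what makes a split statement reproducible rather than merely descriptive.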
Hardware Specification: No. The paper does not report the hardware used for its experiments (e.g., GPU/CPU models or memory amounts).
Software Dependencies: No. The paper mentions using the Bayesian inference package Stan (Carpenter et al., 2017) but does not pin a version for it or for any other software dependency.
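Recording pinned versions of the kind this check flags as missing is easy to automate. A minimal sketch using only Python's standard library follows; `cmdstanpy` appears purely as an assumed example of a Stan interface, not a dependency the paper confirms.

```python
import sys
from importlib import metadata

def environment_report(packages):
    """Record interpreter and package versions for a reproducibility appendix."""
    lines = [f"python {sys.version.split()[0]}"]
    for pkg in packages:
        try:
            lines.append(f"{pkg} {metadata.version(pkg)}")
        except metadata.PackageNotFoundError:
            lines.append(f"{pkg} (not installed)")
    return "\n".join(lines)

# 'cmdstanpy' stands in for the Stan interface; substitute the packages actually used.
print(environment_report(["cmdstanpy", "numpy"]))
```

Emitting such a report alongside published code lets readers reconstruct the software environment without guessing at versions.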
Experiment Setup: No. The paper describes the model architecture and general experimental approach in Sections 4 and 5.1 but does not report specific hyperparameters or detailed training/sampling configurations.
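Since the paper's inference runs through Stan, the missing configuration would likely take the form of sampler settings rather than neural-network hyperparameters. The values below are placeholders chosen for illustration; none of them come from the paper.

```python
# Placeholder MCMC sampler configuration of the sort this check looks for.
# Every value here is an assumption, not a setting reported in the paper.
experiment_setup = {
    "chains": 4,             # independent MCMC chains
    "warmup_iters": 1000,    # burn-in draws discarded per chain
    "sampling_iters": 1000,  # retained posterior draws per chain
    "seed": 0,               # RNG seed for reproducibility
}

total_draws = experiment_setup["chains"] * experiment_setup["sampling_iters"]
print(total_draws)  # 4000
```

Publishing such a block, even as a short appendix table, is enough to make the posterior sampling step reproducible.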