Measure the Predictive Heterogeneity
Authors: Jiashuo Liu, Jiayun Wu, Renjie Pi, Renzhe Xu, Xingxuan Zhang, Bo Li, Peng Cui
ICLR 2023 | Conference PDF | Archive PDF | Plain Text | LLM Run Details
| Reproducibility Variable | Result | LLM Response |
|---|---|---|
| Research Type | Experimental | Empirically, we find that the explored heterogeneity is explainable and provides insights into sub-population divisions in many fields, including agriculture, sociology, and object recognition. The explored sub-populations can also be leveraged to enhance the out-of-distribution generalization performance of machine learning models, which is verified on both simulated and real-world data. |
| Researcher Affiliation | Academia | Tsinghua University, Hong Kong University of Science and Technology |
| Pseudocode | Yes | 4 ALGORITHM: "To empirically estimate the predictive heterogeneity in Definition 6, we derive the Information Maximization (IM) algorithm from the formal definition in Equation 33 to infer the distribution of $\mathcal{E}$ that maximizes the empirical predictive heterogeneity $\hat{\mathcal{H}}^{\mathcal{V}}_{\mathcal{E}_K}(X \rightarrow Y; D)$. Objective Function. ... Optimization." (A hypothetical sketch of such an objective follows this table.) |
| Open Source Code | No | The paper contains no explicit statement about releasing source code and provides no link to a code repository. |
| Open Datasets | Yes | "We use the UCI Adult dataset (Kohavi & Becker, 1996), which is derived from the 1994 Current Population Survey conducted by the US Census Bureau and is widely used in the study of algorithmic fairness." (A hypothetical loading sketch follows this table.) |
| Dataset Splits | No | The paper states "In training, we generate 10000 points, where the major group contains 80% data... and the minor group contains 20% data" and "In testing, we test the performances of the two groups respectively...", but it does not specify a validation split or an explicit validation set. (A hypothetical sketch of this 80/20 simulation follows this table.) |
| Hardware Specification | No | The paper does not specify any particular hardware used for running the experiments, such as GPU or CPU models. |
| Software Dependencies | No | The paper names the model types used (e.g., MLP, linear models, ResNet18) but does not provide specific software packages with version numbers for reproducibility. |
| Experiment Setup | No | The paper mentions setting the number of environments K (e.g., 'set K = 2 for our IM algorithm') and the types of models used, but does not provide specific hyperparameters such as learning rates, batch sizes, or optimizer settings for the experimental setup. |
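Section 4 of the paper describes the IM algorithm only in prose in the excerpt above, so the following is a minimal, hypothetical sketch of how an information-maximization step of this kind could be set up: an assignment network infers soft environments $q(e \mid x, y)$, per-environment predictors fit the data under those assignments, and the assignments are then updated to maximize the gap between the pooled fit and the environment-conditional fit. All module names, the toy data, and every hyperparameter (learning rate, iteration count, network widths) are assumptions, not values from the paper; only K = 2 mirrors the paper's stated choice, and the paper's full objective (Equation 33) is richer than this surrogate.

```python
# Hypothetical sketch of an Information Maximization (IM) style alternating
# optimization; names, data, and hyperparameters are illustrative assumptions.
import torch
import torch.nn as nn
import torch.nn.functional as F

K, d, n = 2, 10, 512  # K = 2 environments, as the paper sets for IM

# Assignment network q(e | x, y) and per-environment / pooled predictors.
assign_net = nn.Sequential(nn.Linear(d + 1, 32), nn.ReLU(), nn.Linear(32, K))
env_heads = nn.ModuleList([nn.Linear(d, 1) for _ in range(K)])
pooled_head = nn.Linear(d, 1)

head_opt = torch.optim.Adam(
    list(env_heads.parameters()) + list(pooled_head.parameters()), lr=1e-3)
assign_opt = torch.optim.Adam(assign_net.parameters(), lr=1e-3)

x = torch.randn(n, d)                    # toy features
y = torch.randint(0, 2, (n, 1)).float()  # toy binary labels

def nll_per_point(head):
    """Per-sample negative log-likelihood of a binary predictor."""
    return F.binary_cross_entropy_with_logits(
        head(x), y, reduction="none").squeeze(1)  # shape (n,)

for _ in range(200):
    # Step 1: fit the predictors under the (frozen) current assignments.
    with torch.no_grad():
        q = F.softmax(assign_net(torch.cat([x, y], dim=1)), dim=1)  # (n, K)
    env_nll = torch.stack([nll_per_point(h) for h in env_heads], dim=1)
    fit_loss = (q * env_nll).sum(dim=1).mean() + nll_per_point(pooled_head).mean()
    head_opt.zero_grad(); fit_loss.backward(); head_opt.step()

    # Step 2: update q(e | x, y) to maximize the heterogeneity gap, i.e. the
    # pooled NLL minus the environment-conditional NLL under soft assignments.
    # (The paper's objective may add regularization omitted in this sketch.)
    q = F.softmax(assign_net(torch.cat([x, y], dim=1)), dim=1)
    with torch.no_grad():
        env_nll = torch.stack([nll_per_point(h) for h in env_heads], dim=1)
        pooled_nll = nll_per_point(pooled_head)
    gap = (pooled_nll - (q * env_nll).sum(dim=1)).mean()
    assign_opt.zero_grad(); (-gap).backward(); assign_opt.step()
```

After convergence, the soft assignments `q` play the role of the inferred sub-population (environment) structure that the paper's IM algorithm is described as estimating.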
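Since the UCI Adult dataset is openly available, one plausible way to obtain it is via OpenML; the paper does not say how the authors loaded or preprocessed it, so the route below is an assumption for illustration only.

```python
# Hypothetical loading of the UCI Adult dataset via OpenML; the paper does
# not specify the authors' loading or preprocessing pipeline.
from sklearn.datasets import fetch_openml

adult = fetch_openml("adult", version=2, as_frame=True)
X, y = adult.data, adult.target  # features and the <=50K / >50K income label
print(X.shape)                   # roughly (48842, 14) for this OpenML version
```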
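The 80%/20% major/minor split is the only quantitative detail the paper gives about its simulated training data, so the sketch below invents the rest (feature dimension, group-specific coefficients, noise scale) purely for illustration of such a setup.

```python
# Hypothetical sketch of the 80%/20% major/minor group simulation described
# in the Dataset Splits row; all distributional details are invented.
import numpy as np

rng = np.random.default_rng(0)
n, d = 10_000, 5
n_major = int(0.8 * n)  # major group: 80% of training data
n_minor = n - n_major   # minor group: 20% of training data

beta_major = rng.normal(size=d)  # assumed group-specific mechanism
beta_minor = -beta_major         # e.g., a flipped input-output relationship

def sample_group(m, beta):
    X = rng.normal(size=(m, d))
    y = X @ beta + rng.normal(scale=0.5, size=m)
    return X, y

X_major, y_major = sample_group(n_major, beta_major)
X_minor, y_minor = sample_group(n_minor, beta_minor)
X_train = np.vstack([X_major, X_minor])
y_train = np.concatenate([y_major, y_minor])

# At test time, evaluate each group separately, as the paper describes.
X_test_major, y_test_major = sample_group(1_000, beta_major)
X_test_minor, y_test_minor = sample_group(1_000, beta_minor)
```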