Measure the Predictive Heterogeneity

Authors: Jiashuo Liu, Jiayun Wu, Renjie Pi, Renzhe Xu, Xingxuan Zhang, Bo Li, Peng Cui

ICLR 2023 | Conference PDF | Archive PDF | Plain Text | LLM Run Details

Reproducibility Variable | Result | LLM Response
Research Type | Experimental | "Empirically, we find the explored heterogeneity is explainable and it provides insights for sub-population divisions in many fields, including agriculture, sociology, and object recognition. And the explored sub-populations could be leveraged to enhance the out-of-distribution generalization performances of machine learning models, which is verified with both simulated and real-world data."
Researcher Affiliation | Academia | Tsinghua University; Hong Kong University of Science and Technology
Pseudocode | Yes | Section 4 (Algorithm): "To empirically estimate the predictive heterogeneity in Definition 6, we derive the Information Maximization (IM) algorithm from the formal definition in Equation 33 to infer the distribution of E that maximizes the empirical predictive heterogeneity Ĥ_V^{E_K}(X → Y; D). Objective Function. ... Optimization." (A hedged sketch of this objective is given below the table.)
Open Source Code | No | The paper contains no explicit statement about releasing source code, nor any link to a code repository.
Open Datasets | Yes | "We use the UCI Adult dataset (Kohavi & Becker, 1996), which is derived from the 1994 Current Population Survey conducted by the US Census Bureau and is widely used in the study of algorithmic fairness."
Dataset Splits | No | The paper states "In training, we generate 10000 points, where the major group contains 80% data... and the minor group contains 20% data" and "In testing, we test the performances of the two groups respectively...", but it does not specify a separate validation split or an explicit validation set. (A sketch of the quoted split appears below the table.)
Hardware Specification | No | The paper does not specify the hardware used to run the experiments, such as GPU or CPU models.
Software Dependencies | No | The paper names the model types used (e.g., MLP, linear models, ResNet18) but does not list software dependencies with version numbers, which limits reproducibility.
Experiment Setup | No | The paper reports the number of environments K (e.g., "set K = 2 for our IM algorithm") and the model types, but omits hyperparameters such as learning rates, batch sizes, and optimizer settings.
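
Since no official code was released, the following is a minimal sketch of what an Information Maximization-style objective could look like; it is not the authors' implementation. It assumes binary classification, K = 2 environments (matching the paper's stated setting), a small assignment network q(e | x, y) producing soft environment weights, and log-likelihood as the predictive V-information term. The layer sizes, names, feature dimension, and optimizer choice are all illustrative assumptions.

```python
# Sketch of an IM-style objective: maximize the gap between the
# assignment-weighted per-environment log-likelihood and the pooled
# log-likelihood, mirroring H = sup_E [ I_V(X -> Y | E) - I_V(X -> Y) ].
# NOT the authors' code; all names and sizes are assumptions.
import torch
import torch.nn as nn
import torch.nn.functional as F

K, d = 2, 10                                   # environments, feature dim (assumed)
assign_net = nn.Sequential(                    # q(e | x, y): soft assignments
    nn.Linear(d + 1, 16), nn.ReLU(), nn.Linear(16, K))
env_models = nn.ModuleList(nn.Linear(d, 1) for _ in range(K))
pooled_model = nn.Linear(d, 1)                 # assumed pre-fit on all data

def negative_heterogeneity(x, y):
    """Negative empirical predictive heterogeneity; minimize with SGD."""
    w = F.softmax(assign_net(torch.cat([x, y], dim=1)), dim=1)   # N x K weights
    env_ll = x.new_zeros(())
    for k in range(K):
        nll_k = F.binary_cross_entropy_with_logits(
            env_models[k](x).squeeze(1), y.squeeze(1), reduction="none")
        env_ll = env_ll + (w[:, k] * (-nll_k)).mean()
    with torch.no_grad():                      # pooled model is held fixed
        pooled_ll = -F.binary_cross_entropy_with_logits(
            pooled_model(x).squeeze(1), y.squeeze(1))
    return -(env_ll - pooled_ll)

# Usage: jointly optimize the assignment network and per-environment models.
opt = torch.optim.Adam(list(assign_net.parameters())
                       + list(env_models.parameters()), lr=1e-3)
x = torch.randn(256, d)
y = torch.randint(0, 2, (256, 1)).float()
loss = negative_heterogeneity(x, y)
opt.zero_grad(); loss.backward(); opt.step()
```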
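
Likewise, the quoted training split (10000 points, an 80% major group and a 20% minor group, with the two groups evaluated separately at test time) can be illustrated with the sketch below. The Gaussian features and the group-dependent labeling rule are placeholders, since the excerpt does not specify the actual data-generating process.

```python
# Sketch of the quoted simulated split only; the feature distribution and
# labeling rule are placeholder assumptions.
import numpy as np

rng = np.random.default_rng(0)
n, d = 10_000, 10
n_major = int(0.8 * n)                       # "major group contains 80% data"

x = rng.standard_normal((n, d))
group = np.array([0] * n_major + [1] * (n - n_major))   # 0 = major, 1 = minor
w_major, w_minor = rng.standard_normal(d), rng.standard_normal(d)
logits = np.where(group == 0, x @ w_major, x @ w_minor)
y = (logits > 0).astype(int)

# At test time, the paper evaluates the two groups separately.
x_major, y_major = x[group == 0], y[group == 0]
x_minor, y_minor = x[group == 1], y[group == 1]
```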