Measure the Predictive Heterogeneity
Authors: Jiashuo Liu, Jiayun Wu, Renjie Pi, Renzhe Xu, Xingxuan Zhang, Bo Li, Peng Cui
ICLR 2023 | Conference PDF | Archive PDF | Plain Text | LLM Run Details
| Reproducibility Variable | Result | LLM Response |
|---|---|---|
| Research Type | Experimental | Empirically, we find that the explored heterogeneity is explainable and provides insights into sub-population divisions in many fields, including agriculture, sociology, and object recognition. The explored sub-populations can also be leveraged to enhance the out-of-distribution generalization performance of machine learning models, which is verified on both simulated and real-world data. |
| Researcher Affiliation | Academia | Tsinghua University, Hong Kong University of Science and Technology |
| Pseudocode | Yes | 4 ALGORITHM: "To empirically estimate the predictive heterogeneity in Definition 6, we derive the Information Maximization (IM) algorithm from the formal definition in Equation 33 to infer the distribution of $\mathcal{E}$ that maximizes the empirical predictive heterogeneity $\hat{\mathcal{H}}^{\mathcal{V}}_{\mathcal{E}_K}(X \rightarrow Y; D)$. Objective Function. ... Optimization." (A hypothetical sketch of such an objective follows this table.) |
| Open Source Code | No | The paper contains no explicit statement about releasing source code and provides no link to a code repository. |
| Open Datasets | Yes | "We use the UCI Adult dataset (Kohavi & Becker, 1996), which is derived from the 1994 Current Population Survey conducted by the US Census Bureau and is widely used in the study of algorithmic fairness." (A hypothetical loading sketch follows this table.) |
| Dataset Splits | No | The paper states "In training, we generate 10000 points, where the major group contains 80% data... and the minor group contains 20% data" and "In testing, we test the performances of the two groups respectively...", but it does not specify a validation split or an explicit validation set. (A hypothetical sketch of this 80/20 simulation follows this table.) |
| Hardware Specification | No | The paper does not specify any particular hardware used for running the experiments, such as GPU or CPU models. |
| Software Dependencies | No | The paper names the model types used (e.g., MLP, linear models, ResNet18) but does not provide specific software packages with version numbers for reproducibility. |
| Experiment Setup | No | The paper mentions setting the number of environments K (e.g., 'set K = 2 for our IM algorithm') and the types of models used, but does not provide specific hyperparameters such as learning rates, batch sizes, or optimizer settings for the experimental setup. |
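Section 4 of the paper describes the IM algorithm only in prose in the excerpt above, so the following is a minimal, hypothetical sketch of how an information-maximization step of this kind could be set up: an assignment network infers soft environments $q(e \mid x, y)$, per-environment predictors fit the data under those assignments, and the assignments are then updated to maximize the gap between the pooled fit and the environment-conditional fit. All module names, the toy data, and every hyperparameter (learning rate, iteration count, network widths) are assumptions, not values from the paper; only K = 2 mirrors the paper's stated choice, and the paper's full objective (Equation 33) is richer than this surrogate.

```python
# Hypothetical sketch of an Information Maximization (IM) style alternating
# optimization; names, data, and hyperparameters are illustrative assumptions.
import torch
import torch.nn as nn
import torch.nn.functional as F

K, d, n = 2, 10, 512  # K = 2 environments, as the paper sets for IM

# Assignment network q(e | x, y) and per-environment / pooled predictors.
assign_net = nn.Sequential(nn.Linear(d + 1, 32), nn.ReLU(), nn.Linear(32, K))
env_heads = nn.ModuleList([nn.Linear(d, 1) for _ in range(K)])
pooled_head = nn.Linear(d, 1)

head_opt = torch.optim.Adam(
    list(env_heads.parameters()) + list(pooled_head.parameters()), lr=1e-3)
assign_opt = torch.optim.Adam(assign_net.parameters(), lr=1e-3)

x = torch.randn(n, d)                    # toy features
y = torch.randint(0, 2, (n, 1)).float()  # toy binary labels

def nll_per_point(head):
    """Per-sample negative log-likelihood of a binary predictor."""
    return F.binary_cross_entropy_with_logits(
        head(x), y, reduction="none").squeeze(1)  # shape (n,)

for _ in range(200):
    # Step 1: fit the predictors under the (frozen) current assignments.
    with torch.no_grad():
        q = F.softmax(assign_net(torch.cat([x, y], dim=1)), dim=1)  # (n, K)
    env_nll = torch.stack([nll_per_point(h) for h in env_heads], dim=1)
    fit_loss = (q * env_nll).sum(dim=1).mean() + nll_per_point(pooled_head).mean()
    head_opt.zero_grad(); fit_loss.backward(); head_opt.step()

    # Step 2: update q(e | x, y) to maximize the heterogeneity gap, i.e. the
    # pooled NLL minus the environment-conditional NLL under soft assignments.
    # (The paper's objective may add regularization omitted in this sketch.)
    q = F.softmax(assign_net(torch.cat([x, y], dim=1)), dim=1)
    with torch.no_grad():
        env_nll = torch.stack([nll_per_point(h) for h in env_heads], dim=1)
        pooled_nll = nll_per_point(pooled_head)
    gap = (pooled_nll - (q * env_nll).sum(dim=1)).mean()
    assign_opt.zero_grad(); (-gap).backward(); assign_opt.step()
```

After convergence, the soft assignments `q` play the role of the inferred sub-population (environment) structure that the paper's IM algorithm is described as estimating.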
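Since the UCI Adult dataset is openly available, one plausible way to obtain it is via OpenML; the paper does not say how the authors loaded or preprocessed it, so the route below is an assumption for illustration only.

```python
# Hypothetical loading of the UCI Adult dataset via OpenML; the paper does
# not specify the authors' loading or preprocessing pipeline.
from sklearn.datasets import fetch_openml

adult = fetch_openml("adult", version=2, as_frame=True)
X, y = adult.data, adult.target  # features and the <=50K / >50K income label
print(X.shape)                   # roughly (48842, 14) for this OpenML version
```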
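The 80%/20% major/minor split is the only quantitative detail the paper gives about its simulated training data, so the sketch below invents the rest (feature dimension, group-specific coefficients, noise scale) purely for illustration of such a setup.

```python
# Hypothetical sketch of the 80%/20% major/minor group simulation described
# in the Dataset Splits row; all distributional details are invented.
import numpy as np

rng = np.random.default_rng(0)
n, d = 10_000, 5
n_major = int(0.8 * n)  # major group: 80% of training data
n_minor = n - n_major   # minor group: 20% of training data

beta_major = rng.normal(size=d)  # assumed group-specific mechanism
beta_minor = -beta_major         # e.g., a flipped input-output relationship

def sample_group(m, beta):
    X = rng.normal(size=(m, d))
    y = X @ beta + rng.normal(scale=0.5, size=m)
    return X, y

X_major, y_major = sample_group(n_major, beta_major)
X_minor, y_minor = sample_group(n_minor, beta_minor)
X_train = np.vstack([X_major, X_minor])
y_train = np.concatenate([y_major, y_minor])

# At test time, evaluate each group separately, as the paper describes.
X_test_major, y_test_major = sample_group(1_000, beta_major)
X_test_minor, y_test_minor = sample_group(1_000, beta_minor)
```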