On the Epistemic Limits of Personalized Prediction
Authors: Lucas Monteiro Paes, Carol Long, Berk Ustun, Flavio Calmon
NeurIPS 2022
| Reproducibility Variable | Result | LLM Response |
|---|---|---|
| Research Type | Experimental | Here, we show this effect for a personalized classification task on the UCI Adult dataset [19]... We fit a personalized logistic regression model... We then measure the gains to personalization... Table 1: Personalized models may not assign more accurate predictions for every group who provides personal data. Figure 3: Overview of estimation error for the BoP of personalized classifiers on a semi-synthetic dataset built from the UCI Adult dataset [19]. We plot the MSE of BoP estimates as we increase the number of samples per group, along with the minimax upper and lower bounds from Theorem 2 in log-scale. Here, we compute each estimate using 100 Monte Carlo iterations, and show 95% confidence intervals estimated via bootstrap. |
| Researcher Affiliation | Academia | Lucas Monteiro Paes* Harvard SEAS lucaspaes@g.harvard.edu Carol Xuan Long* Harvard SEAS carol_long@g.harvard.edu Berk Ustun University of California, San Diego berk@ucsd.edu Flavio P. Calmon Harvard SEAS flavio@seas.harvard.edu |
| Pseudocode | No | The paper does not contain any structured pseudocode or algorithm blocks. |
| Open Source Code | Yes | Did you include the code, data, and instructions needed to reproduce the main experimental results (either in the supplemental material or as a URL)? [Yes] Check supplementary material. All codes are provided. |
| Open Datasets | Yes | Here, we show this effect for a personalized classification task on the UCI Adult dataset [19]... We are using standard and public dataset |
| Dataset Splits | Yes | Assume that we are given a personalized classifier that uses group attributes hp : X × G → Y and a generic classifier h0 : X → Y that does not. We assume that these models are trained on a training dataset that is independent of the auditing dataset D. The empirical risk R̂(h, g) for each group is defined as... Table 1: ... these effects arise on the training dataset and the auditing dataset... Did you specify all the training details (e.g., data splits, hyperparameters, how they were chosen)? [Yes] Check supplementary material. All training details are specified in the code. |
| Hardware Specification | No | All experiments run on Google Colab. This statement indicates the computing environment but does not provide specific hardware details such as GPU/CPU models or memory specifications. |
| Software Dependencies | No | The paper states 'All training details are specified in the code' in the supplementary material checklist, but it does not explicitly list software dependencies with version numbers in the main text. |
| Experiment Setup | No | The paper states 'All training details are specified in the code' in the supplementary material checklist, but it does not explicitly provide specific hyperparameter values, training configurations, or system-level settings in the main text. |
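The experiment quoted under Research Type (a personalized logistic regression on UCI Adult, with per-group gains to personalization) can be illustrated with a minimal sketch. This is not the authors' code: the data below is synthetic, and the `fit_logistic`/`predict` helpers and the group-interaction features are assumptions made for illustration only. It compares a generic classifier h0 : X → Y against a personalized classifier hp : X × G → Y and reports the per-group change in empirical accuracy, mirroring the paper's comparison.

```python
import numpy as np

rng = np.random.default_rng(0)

def fit_logistic(X, y, lr=0.5, steps=2000):
    """Plain gradient-descent logistic regression; returns weights (with intercept)."""
    Xb = np.hstack([X, np.ones((len(X), 1))])  # append intercept column
    w = np.zeros(Xb.shape[1])
    for _ in range(steps):
        z = np.clip(Xb @ w, -30, 30)           # avoid exp overflow
        p = 1.0 / (1.0 + np.exp(-z))
        w -= lr * Xb.T @ (p - y) / len(y)
    return w

def predict(w, X):
    Xb = np.hstack([X, np.ones((len(X), 1))])
    return (Xb @ w > 0).astype(int)

# Synthetic auditing data (NOT UCI Adult): two groups whose label rule
# differs in the sign of the second feature's contribution.
n = 2000
g = rng.integers(0, 2, size=n)                 # group attribute
X = rng.normal(size=(n, 2))
y = ((X[:, 0] + np.where(g == 1, 1.0, -1.0) * X[:, 1]
      + 0.3 * rng.normal(size=n)) > 0).astype(int)

Xp = np.hstack([X, g[:, None], g[:, None] * X])  # add group + interactions
w0 = fit_logistic(X, y)                          # generic  h0 : X -> Y
wp = fit_logistic(Xp, y)                         # personalized hp : X x G -> Y

gains = {}
for grp in (0, 1):
    m = g == grp
    acc0 = (predict(w0, X[m]) == y[m]).mean()
    accp = (predict(wp, Xp[m]) == y[m]).mean()
    gains[grp] = accp - acc0
    print(f"group {grp}: generic={acc0:.3f}, personalized={accp:.3f}, "
          f"gain={accp - acc0:+.3f}")
```

On this toy data both groups happen to gain; the paper's point is that on real data (e.g. UCI Adult) the gain can be negative for some groups even when the personalized model is more accurate on average.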
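The Figure 3 evidence mentions 95% confidence intervals estimated via bootstrap over 100 Monte Carlo iterations. A minimal sketch of that bootstrap step is below; the `estimates` array is a hypothetical stand-in, not the paper's BoP values, and `bootstrap_ci` is an illustrative helper, not the authors' implementation.

```python
import numpy as np

rng = np.random.default_rng(1)

# Stand-in for 100 Monte Carlo estimates of some quantity (e.g. a BoP estimate).
estimates = rng.normal(loc=0.05, scale=0.02, size=100)

def bootstrap_ci(samples, n_boot=5000, alpha=0.05, rng=rng):
    """Percentile bootstrap CI for the mean of `samples`."""
    means = np.array([
        rng.choice(samples, size=len(samples), replace=True).mean()
        for _ in range(n_boot)
    ])
    lo, hi = np.quantile(means, [alpha / 2, 1 - alpha / 2])
    return lo, hi

lo, hi = bootstrap_ci(estimates)
print(f"mean={estimates.mean():.4f}, 95% CI=({lo:.4f}, {hi:.4f})")
```

The percentile method shown here is the simplest bootstrap variant; the paper does not specify which variant was used.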