On the Epistemic Limits of Personalized Prediction

Authors: Lucas Monteiro Paes, Carol Long, Berk Ustun, Flavio Calmon

NeurIPS 2022

Reproducibility Variable | Result | LLM Response
Research Type | Experimental | Here, we show this effect for a personalized classification task on the UCI Adult dataset [19]... We fit a personalized logistic regression model... We then measure the gains to personalization... Table 1: Personalized models may not assign more accurate predictions for every group who provides personal data. Here, we show this effect for a personalized classification task on the UCI Adult dataset [19]... Figure 3: Overview of estimation error for the BoP of personalized classifiers on a semi-synthetic dataset built from the UCI Adult dataset [19]. We plot the MSE of BoP estimates as we increase the number of samples per group, along with the minimax upper and lower bounds from Theorem 2 in log-scale. Here, we compute each estimate using 100 Monte Carlo iterations, and show 95% confidence intervals estimated via bootstrap.
Researcher Affiliation | Academia | Lucas Monteiro Paes* (Harvard SEAS, lucaspaes@g.harvard.edu); Carol Xuan Long* (Harvard SEAS, carol_long@g.harvard.edu); Berk Ustun (University of California, San Diego, berk@ucsd.edu); Flavio P. Calmon (Harvard SEAS, flavio@seas.harvard.edu)
Pseudocode | No | The paper does not contain any structured pseudocode or algorithm blocks.
Open Source Code | Yes | Did you include the code, data, and instructions needed to reproduce the main experimental results (either in the supplemental material or as a URL)? [Yes] Check supplementary material. All codes are provided.
Open Datasets | Yes | Here, we show this effect for a personalized classification task on the UCI Adult dataset [19]... We are using standard and public dataset
Dataset Splits | Yes | Assume that we are given a personalized classifier that uses group attributes $h_p : \mathcal{X} \times \mathcal{G} \to \mathcal{Y}$ and a generic classifier $h_0 : \mathcal{X} \to \mathcal{Y}$ that does not. We assume that these models are trained on a training dataset that is independent of the auditing dataset $D$. The empirical risk $\hat{R}(h, g)$ for each group is defined as... Table 1: ... these effects arise on the training dataset and the auditing dataset... Did you specify all the training details (e.g., data splits, hyperparameters, how they were chosen)? [Yes] Check supplementary material. All training details are specified in the code.
Hardware Specification | No | All experiments run on Google Colab. This statement indicates the computing environment but does not provide specific hardware details such as GPU/CPU models or memory specifications.
Software Dependencies | No | The paper states "All training details are specified in the code" in its checklist, which points to the supplementary material, but it does not explicitly list software dependencies with version numbers in the main text.
Experiment Setup | No | The paper states "All training details are specified in the code" in its checklist, which points to the supplementary material, but it does not explicitly provide hyperparameter values, training configurations, or system-level settings in the main text.
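The Dataset Splits row compares a personalized classifier h_p and a generic classifier h_0 through the per-group empirical risk on an independent auditing dataset. The following is a minimal sketch of that comparison, not the authors' released code. It makes two assumptions: the loss is the 0-1 loss, and BoP is taken as the smallest per-group gain, i.e. the gain for the worst-off group. The function names (group_empirical_risk, gains_to_personalization, benefit_of_personalization) are hypothetical. A gain of zero or less for some group illustrates the Table 1 point that personalization may not help every group who provides personal data.

```python
import numpy as np


def group_empirical_risk(y_true, y_pred, groups):
    """Per-group empirical risk, assuming 0-1 loss: mean misclassification rate in each group."""
    y_true, y_pred, groups = map(np.asarray, (y_true, y_pred, groups))
    return {
        g: float(np.mean(y_true[groups == g] != y_pred[groups == g]))
        for g in np.unique(groups)
    }


def gains_to_personalization(y_true, pred_generic, pred_personalized, groups):
    """Gain for each group g: R_hat(h_0, g) - R_hat(h_p, g). Positive means personalization helped."""
    r_generic = group_empirical_risk(y_true, pred_generic, groups)
    r_personal = group_empirical_risk(y_true, pred_personalized, groups)
    return {g: r_generic[g] - r_personal[g] for g in r_generic}


def benefit_of_personalization(gains):
    """Assumed definition (hypothetical here): the smallest gain across groups."""
    return min(gains.values())


# Toy auditing data (illustrative only): two groups, "a" and "b".
y = [1, 0, 1, 1, 0, 0]
pred_h0 = [1, 1, 1, 0, 0, 0]  # generic model h_0
pred_hp = [1, 0, 1, 1, 0, 1]  # personalized model h_p
groups = ["a", "a", "a", "b", "b", "b"]

gains = gains_to_personalization(y, pred_h0, pred_hp, groups)
bop = benefit_of_personalization(gains)
print(gains, bop)
```

In this toy example, group "a" gains 1/3 because its error drops from 1/3 to 0. Group "b" gains nothing because both models misclassify one of its three points. Under the min-over-groups assumption, the BoP is therefore 0.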