Participatory Personalization in Classification

Authors: Hailey Joren, Chirag Nagpal, Katherine A. Heller, Berk Ustun

NeurIPS 2023

Reproducibility Variable | Result | LLM Response
Research Type | Experimental | "We conduct a comprehensive empirical study of participatory systems in clinical prediction tasks, benchmarking them with common approaches for personalization and imputation. Our results demonstrate that participatory systems can facilitate and inform consent while improving performance and data use across all groups who report personal data."
Researcher Affiliation | Collaboration | Hailey Joren (UC San Diego), Chirag Nagpal (Google Research), Katherine Heller (Google Research), Berk Ustun (UC San Diego)
Pseudocode | Yes | Algorithm 1: Learning Participatory Systems
    Input: M = {h : X × G → Y}, pool of candidate models
    Input: D_assign = {(x_i, g_i, y_i)}_{i=1}^{n_assign}, assignment dataset
    Input: D_prune = {(x_i, g_i, y_i)}_{i=1}^{n_prune}, pruning dataset
    1: 𝒯 ← ViableTrees(G, D_assign)        ▷ |𝒯| = 1 for minimal & flat systems
    2: for T ∈ 𝒯 do
    3:     T ← AssignModels(T, M, D_assign) ▷ assign models
    4:     T ← PruneLeaves(T, D_prune)      ▷ prune models
    5: end for
    Output: 𝒯, collection of participatory systems
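Algorithm 1 can be sketched in plain Python. This is a hedged illustration, not the paper's implementation: the tree here is a single flat system (one leaf per group), and `assign_models`, `prune_leaves`, and the pruning rule (fall back to a generic model when the leaf model does not improve pruning error) are simplified stand-ins for the paper's ViableTrees/AssignModels/PruneLeaves subroutines.

```python
# Hedged sketch of Algorithm 1 (Learning Participatory Systems).
# Models are callables h(x, g) -> label; a "tree" is a dict mapping
# each group g to its assigned model (a single flat system).

def error(model, data):
    """Fraction of (x, g, y) points in `data` misclassified by `model`."""
    return sum(model(x, g) != y for x, g, y in data) / len(data)

def assign_models(tree, models, d_assign):
    """Assign each leaf (group) the candidate model with the lowest
    error on that group's slice of the assignment dataset."""
    for g in tree:
        group_data = [(x, gi, y) for x, gi, y in d_assign if gi == g] or d_assign
        tree[g] = min(models, key=lambda h: error(h, group_data))
    return tree

def prune_leaves(tree, d_prune, generic):
    """Simplified pruning rule: revert a leaf to the generic model when
    its assigned model does not improve pruning error for that group."""
    for g in tree:
        group_data = [(x, gi, y) for x, gi, y in d_prune if gi == g] or d_prune
        if error(tree[g], group_data) >= error(generic, group_data):
            tree[g] = generic
    return tree

def learn_participatory_system(groups, models, d_assign, d_prune):
    """Build one flat participatory system over `groups` (|T| = 1 case)."""
    tree = {g: None for g in groups}
    tree = assign_models(tree, models, d_assign)
    tree = prune_leaves(tree, d_prune, generic=models[0])
    return tree
```

For a minimal system the collection 𝒯 contains this one flat tree; the general algorithm repeats the assign/prune steps for every viable tree over the group attributes.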
Open Source Code | Yes | "We provide a Python library to build and evaluate participatory systems." and "We include code to reproduce these results in a Python library."
Open Datasets | Yes | "Each dataset is de-identified and available to the public." The cardio_eicu, cardio_mimic, and lungcancer datasets require access to public repositories listed under the references.
Dataset Splits | Yes | "We split each dataset into a test sample (20% for evaluating out-of-sample performance) and a training sample (80% for training, pruning, assignment, and estimating gains to show users)."
Hardware Specification | No | The paper does not provide specific hardware details used for running experiments.
Software Dependencies | No | The paper mentions providing a "Python library" but does not specify any software dependencies with version numbers.
Experiment Setup | No | The paper describes varying model classes (logistic regression, random forests) and performance metrics (error rate, AUC) for evaluation, but does not provide specific hyperparameters such as learning rate, batch size, or optimizer settings for these models.
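The evaluation protocol described above (an 80/20 train/test split, scored by error rate and AUC) can be sketched in plain Python. Only the split percentages come from the paper; the helper names and metric implementations below are illustrative, with AUC computed via the rank (Mann-Whitney) statistic.

```python
import random

def train_test_split(data, test_frac=0.20, seed=0):
    """Shuffle `data` and hold out `test_frac` of it (20% test / 80%
    train, as in the paper; the shuffling and seed are illustrative)."""
    rng = random.Random(seed)
    idx = list(range(len(data)))
    rng.shuffle(idx)
    n_test = int(round(test_frac * len(data)))
    test = [data[i] for i in idx[:n_test]]
    train = [data[i] for i in idx[n_test:]]
    return train, test

def error_rate(y_true, y_pred):
    """Fraction of misclassified examples."""
    return sum(t != p for t, p in zip(y_true, y_pred)) / len(y_true)

def auc(y_true, scores):
    """AUC as the probability that a random positive outscores a
    random negative (ties count as half)."""
    pos = [s for s, t in zip(scores, y_true) if t == 1]
    neg = [s for s, t in zip(scores, y_true) if t == 0]
    wins = sum((p > n) + 0.5 * (p == n) for p in pos for n in neg)
    return wins / (len(pos) * len(neg))
```

In practice the paper's benchmark would fit each model class (e.g., logistic regression, random forests) on the 80% training sample and report both metrics on the held-out 20%.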