Cookie Consent Has Disparate Impact on Estimation Accuracy

Authors: Erik Miehling, Rahul Nair, Elizabeth Daly, Karthikeyan Natesan Ramamurthy, Robert Redmond

NeurIPS 2023

Reproducibility Variable | Result | LLM Response
Research Type | Experimental | We empirically demonstrate that when consent rates exhibit demographic dependence, user consent has a disparate impact on the recommender agent's ability to estimate users' latent attributes. Empirical results were obtained via a simulator based on RecSim (21).
Researcher Affiliation | Industry | IBM Research (erik.miehling@ibm.com; {rahul.nair,elizabeth.daly}@ie.ibm.com; {knatesa,rredmond}@us.ibm.com)
Pseudocode | Yes | Expressions for the Bayesian updates and expected binomial probabilities can be found in Appendix A, with pseudocode of the recommendation process in Appendix B (Algorithm 1: Recommendation procedure).
Open Source Code | Yes | Our simulator was built upon RecSim (21) (source code at https://github.com/emiehling/cookie-consent/).
Open Datasets | No | The paper uses a simulator to generate synthetic data, as described in the "Advertisement and user samplers" section: "The advertisement sampler object (Advertisement Sampler) defines the distribution of each ad feature... Similarly, the user sampler object (User State Sampler) defines the distribution of each user feature...". It does not use or provide access to a publicly available dataset.
Dataset Splits | No | The paper's experiments are based on simulations that generate synthetic data, as detailed in the "User model" and "Advertisement and user samplers" sections. Because there is no fixed dataset, it does not specify training, validation, and test splits.
Hardware Specification | Yes | Simulations were run in Python 3.8 on an Intel(R) Xeon(R) CPU E5-2667 v2 (3.30 GHz).
Software Dependencies | Yes | Our simulator was built upon RecSim (21)... Simulations were run in Python 3.8...
Experiment Setup | Yes | Base model parameters assumed throughout this section are: number of users n = 1000, number of ads m = 200, ad pool size l = 50, and number of cohorts d = 2. Estimation is carried out via stochastic gradient descent with a learning rate of 0.01, a regularization weight of 0.01, and a stopping threshold on the mean-squared error of ε_thresh = 0.001. Latent factors are assumed to be of dimension k = 50.
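The estimation step described in the Experiment Setup row can be illustrated with a minimal, hypothetical sketch. It is not the authors' implementation: only the default hyperparameters (learning rate 0.01, regularization weight 0.01, MSE stopping threshold ε_thresh = 0.001, latent dimension k = 50) come from the paper. The matrix-factorization objective (squared error with an L2 penalty), the Gaussian initialization, the per-observation update order, the `max_epochs` cap, the `(user, ad, response)` data layout, and the names `sgd_factorize` and `dot` are all assumptions made for illustration.

```python
import random
from typing import List, Sequence, Tuple

# (user index, ad index, observed response); hypothetical data layout
Observation = Tuple[int, int, float]


def dot(a: Sequence[float], b: Sequence[float]) -> float:
    """Inner product of two latent-factor vectors."""
    return sum(x * y for x, y in zip(a, b))


def sgd_factorize(
    observations: Sequence[Observation],
    n_users: int,
    n_ads: int,
    k: int = 50,
    lr: float = 0.01,
    reg: float = 0.01,
    eps_thresh: float = 0.001,
    max_epochs: int = 1000,
    seed: int = 0,
) -> Tuple[List[List[float]], List[List[float]], float]:
    """Fit user factors U and ad factors V so that dot(U[i], V[j]) ~ r.

    Returns (U, V, final training MSE). Squared loss with an L2 penalty is an
    assumed objective; only the default hyperparameters come from the paper.
    """
    if not observations:
        raise ValueError("need at least one observation")
    rng = random.Random(seed)
    # Small Gaussian initialization of the latent factors (an assumption).
    U = [[rng.gauss(0.0, 0.1) for _ in range(k)] for _ in range(n_users)]
    V = [[rng.gauss(0.0, 0.1) for _ in range(k)] for _ in range(n_ads)]
    obs = list(observations)

    def mse() -> float:
        return sum((r - dot(U[i], V[j])) ** 2 for i, j, r in obs) / len(obs)

    for _ in range(max_epochs):
        # Stop once the training MSE falls below the threshold.
        if mse() < eps_thresh:
            break
        rng.shuffle(obs)  # visit observations in a random order each epoch
        for i, j, r in obs:
            err = r - dot(U[i], V[j])
            for f in range(k):
                u, v = U[i][f], V[j][f]
                # Gradient step on (r - u.v)^2 / 2 + reg/2 * (|u|^2 + |v|^2)
                U[i][f] += lr * (err * v - reg * u)
                V[j][f] += lr * (err * u - reg * v)
    return U, V, mse()
```

For example, `sgd_factorize(obs, n_users=1000, n_ads=200)` would use the paper's base sizes and hyperparameters, where `obs` is assumed to hold only the interactions the agent was permitted to observe. In this sketch, a user with no observations is never updated, so their row of `U` stays at its random initialization; under the sketch's assumptions, this shows how fewer consented observations for a cohort can translate into less accurate estimates for that cohort. Pure Python is used here for self-containment; at the paper's scale, a vectorized implementation would be far faster.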