The price of unfairness in linear bandits with biased feedback

Authors: Solenne Gaucher, Alexandra Carpentier, Christophe Giraud

NeurIPS 2022 | Conference PDF | Archive PDF | Plain Text | LLM Run Details

Reproducibility Variable Result LLM Response
Research Type Theoretical In this paper, we study the problem of fair sequential decision making with biased linear bandit feedback. At each round, a player selects an action described by a covariate and by a sensitive attribute. The perceived reward is a linear combination of the covariates of the chosen action, but the player only observes a biased evaluation of this reward, depending on the sensitive attribute. To characterize the difficulty of this problem, we design a phased elimination algorithm that corrects the unfair evaluations, and establish upper bounds on its regret. We show that the worst-case regret is smaller than O( 1/3 log(T)1/3T 2/3), where is an explicit geometrical constant characterizing the difficulty of bias estimation. We prove lower bounds on the worst-case regret for some sets of actions showing that this rate is tight up to a possible sub-logarithmic factor. We also derive gap-dependent upper bounds on the regret, and matching lower bounds for some problem instance.
Researcher Affiliation Academia Solenne Gaucher Département de Mathématiques d Orsay, Université Paris-Saclay, Orsay, France solenne.gaucher@math.u-psud.fr Alexandra Carpentier Institut für Mathematik, Universität Potsdam, Potsdam, Germany carpentier@uni-potsdam.de Christophe Giraud Département de Mathématiques d Orsay, Université Paris-Saclay, Orsay, France christophe.giraud@universite-paris-saclay.fr
Pseudocode Yes Routine 1 G-EXP-ELIM (X, n, ) Routine 2 -EXP-ELIM (X, (X (z), b (z))z2{ 1,1}, b , n, ) Algorithm 3 FAIR PHASED ELIMINATION (sketched)
Open Source Code No Did you include the code, data, and instructions needed to reproduce the main experimental results (either in the supplemental material or as a URL)? [N/A]
Open Datasets No The paper is theoretical and does not conduct experiments involving datasets, thus no dataset availability information is provided.
Dataset Splits No The paper is theoretical and does not conduct experiments, therefore no dataset splits for validation are mentioned.
Hardware Specification No The paper is theoretical and does not report on experiments that would require specific hardware. No hardware specifications are provided.
Software Dependencies No The paper is theoretical and does not report on experiments that would require specific software dependencies with version numbers. No such details are provided.
Experiment Setup No The paper is theoretical and does not describe an experimental setup with hyperparameters or training settings.