Fairness in Reinforcement Learning

Authors: Shahin Jabbari, Matthew Joseph, Michael Kearns, Jamie Morgenstern, Aaron Roth

ICML 2017

Reproducibility Variable | Result | LLM Response
Research Type | Theoretical | Our first result is negative: despite the fact that fairness is consistent with the optimal policy, any learning algorithm satisfying fairness must take time exponential in the number of states to achieve non-trivial approximation to the optimal policy. We then provide a provably fair polynomial time algorithm under an approximate notion of fairness, thus establishing an exponential gap between exact and approximate fairness. Second, we present an approximate-action fair algorithm (Fair-E3) in Section 4 and prove a polynomial upper bound on the time it requires to achieve near-optimality.
Researcher Affiliation | Academia | University of Pennsylvania, Philadelphia, PA, USA. Correspondence to: Shahin Jabbari, Matthew Joseph, Michael Kearns, Jamie Morgenstern, Aaron Roth <jabbari, majos, mkearns, jamiemor, aaroth@cis.upenn.edu>.
Pseudocode | No | The paper describes the Fair-E3 algorithm informally in text (Sections 4.1, 4.2, and 4.3), but it does not contain structured pseudocode or an algorithm block.
Open Source Code | No | The paper does not state that code for the described methodology is available, nor does it link to a code repository.
Open Datasets | No | The paper is purely theoretical and does not conduct experiments on datasets, so it provides no information about public dataset availability.
Dataset Splits | No | The paper is theoretical and does not involve empirical evaluation on datasets, so it provides no training/validation/test split information.
Hardware Specification | No | The paper is theoretical and does not describe any computational experiments, so it mentions no hardware specifications.
Software Dependencies | No | The paper is theoretical and does not describe any implementation, so it lists no software dependencies or version numbers.
Experiment Setup | No | The paper is theoretical and does not describe empirical experiments, so it provides no experimental setup details such as hyperparameters or training configurations.