Fair Exploration via Axiomatic Bargaining

Authors: Jackie Baek, Vivek Farias

NeurIPS 2021

Reproducibility Variable | Result | LLM Response
Research Type | Experimental | Our theoretical development is complemented by a case study on contextual bandits for warfarin dosing where we are concerned with the cost of exploration across multiple races and age groups. ... As a final contribution, we extend our framework beyond the grouped K-armed bandit and undertake an empirical study: Linear Contextual Bandits and Warfarin Dosing: We extend our framework to grouped linear contextual bandits, yielding a candidate Nash solution there. Applied to a real-world dataset on warfarin dosing using race and age groups, we show (a) a regret optimal solution that ignores groups is dramatically unfair, and (b) the Nash solution balances out reductions in regret across groups at the cost of a small increase in total regret. ... 6 Experiments We consider two sets of experiments. The first seeks to understand the PoF in synthetic instances to shed further light on the impact of topology. The second is a real-world case study that returns to the Warfarin dosing example discussed in motivating the paper where we seek to understand unfairness under a regret optimal policy and the extent to which the Nash solution can mitigate this problem.
Researcher Affiliation | Academia | Jackie Baek, Operations Research Center, MIT, baek@mit.edu; Vivek F. Farias, Sloan School of Management, MIT, vivekf@mit.edu
Pseudocode | No | The paper describes the PF-UCB algorithm in Section 4.1, but it is presented as a textual description of steps rather than a formal, structured pseudocode block or algorithm figure.
Open Source Code | No | The paper does not provide any explicit links to source code for the methodology described, nor does it state that the code is available in supplementary materials or upon request. It only provides a link to the full version of the paper itself.
Open Datasets | Yes | We use a publicly available dataset [30] to evaluate the effect of using a proportionally fair policy on learning the optimal personalized dose of warfarin. [30] Michelle Whirl-Carrillo, Ellen M McDonagh, JM Hebert, Li Gong, K Sangkuhl, CF Thorn, Russ B Altman, and Teri E Klein. Pharmacogenomics knowledge for personalized medicine. Clinical Pharmacology & Therapeutics, 92(4):414-417, 2012.
Dataset Splits | No | The paper mentions using a dataset for experiments but does not provide specific details on how it was split into training, validation, or test sets, nor does it refer to predefined splits with citations or describe a splitting methodology.
Hardware Specification | No | The paper does not provide any specific details about the hardware (e.g., GPU/CPU models, memory) used to run the experiments.
Software Dependencies | No | The paper mentions implementing models but does not provide specific software dependencies or their version numbers (e.g., Python, PyTorch, specific solvers or libraries with versions).
Experiment Setup | No | The paper mentions general setup for the Warfarin Dosing Case Study: "We use a linear contextual bandit setup with five features and an intercept; three actions (dose levels) are available to any arriving patient." However, it does not provide specific hyperparameters (e.g., learning rate, batch size, number of epochs) or detailed system-level training settings.
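
To make the quoted setup concrete, the following is a minimal sketch of a linear contextual bandit with five features plus an intercept and three actions. It is not the paper's PF-UCB algorithm or its Nash solution: it uses a generic LinUCB-style rule as a stand-in, runs on synthetic Gaussian contexts rather than the warfarin dataset, and every name and value in it (`PerArmRidge`, `add_intercept`, `run`, `alpha`, `reg`, the noise scale) is a hypothetical choice made for illustration only.

```python
import numpy as np

N_FEATURES = 5  # from the quoted setup; the intercept is appended separately
N_ACTIONS = 3   # three dose levels


def add_intercept(x):
    """Append a constant 1.0 so the context has five features plus an intercept."""
    return np.append(np.asarray(x, dtype=float), 1.0)


class PerArmRidge:
    """One ridge-regression reward model per action (hypothetical helper)."""

    def __init__(self, dim, n_actions, reg=1.0):
        # A[a] is the regularized Gram matrix, b[a] the reward-weighted context sum.
        self.A = np.stack([reg * np.eye(dim) for _ in range(n_actions)])
        self.b = np.zeros((n_actions, dim))

    def ucb_scores(self, z, alpha):
        """Estimated reward plus an exploration bonus for each action."""
        scores = np.empty(len(self.A))
        for a in range(len(self.A)):
            theta = np.linalg.solve(self.A[a], self.b[a])
            width = np.sqrt(z @ np.linalg.solve(self.A[a], z))
            scores[a] = z @ theta + alpha * width
        return scores

    def update(self, a, z, reward):
        self.A[a] += np.outer(z, z)
        self.b[a] += reward * z


def run(n_rounds=500, alpha=1.0, seed=0):
    """Simulate the bandit on synthetic data and return cumulative regret."""
    rng = np.random.default_rng(seed)
    dim = N_FEATURES + 1
    true_theta = rng.normal(size=(N_ACTIONS, dim))  # synthetic, not warfarin data
    model = PerArmRidge(dim, N_ACTIONS)
    regret = 0.0
    for _ in range(n_rounds):
        z = add_intercept(rng.normal(size=N_FEATURES))
        means = true_theta @ z  # expected reward of each action for this context
        a = int(np.argmax(model.ucb_scores(z, alpha)))
        reward = means[a] + rng.normal(scale=0.1)
        model.update(a, z, reward)
        regret += means.max() - means[a]
    return regret


if __name__ == "__main__":
    print("cumulative regret:", run())
```

A group-aware study along the lines described in the table would additionally attach a group label (for example race or age band) to each arriving context and accumulate regret per group; that bookkeeping is omitted here.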