Contextual bandits with concave rewards, and an application to fair ranking
Authors: Virginie Do, Elvis Dohmatob, Matteo Pirotta, Alessandro Lazaric, Nicolas Usunier
ICLR 2023
| Reproducibility Variable | Result | LLM Response |
|---|---|---|
| Research Type | Experimental | We compare the empirical performance of our algorithm to relevant baselines on a music recommendation task. We present two experimental evaluations of our approach, which are fully detailed in App. B. |
| Researcher Affiliation | Collaboration | Virginie Do (PSL University & Meta AI, virginiedo@meta.com); Elvis Dohmatob, Matteo Pirotta, Alessandro Lazaric, Nicolas Usunier (Meta AI, {dohmatob,pirotta,lazaric,usunier}@meta.com) |
| Pseudocode | Yes | Algorithm 1: FW-LinUCBRank, linear contextual bandits for fair ranking; Algorithm 2: generic Frank-Wolfe algorithm for CBCR; Algorithm 3: FW-LinUCB, linear CBCR with K arms; Algorithm 4: FW-SquareCB, contextual bandits with concave rewards and regression oracles; Algorithm 5: FW-LinUCBRank, linear contextual bandits for fair ranking. (A minimal sketch of the Frank-Wolfe loop appears after this table.) |
| Open Source Code | No | The paper does not contain an explicit statement or link to open-source code for the methodology described. |
| Open Datasets | Yes | Following (Patro et al., 2020), we use the Last.fm music dataset from (Cantador et al., 2011) |
| Dataset Splits | No | The paper mentions generating contexts and rewards from a dataset, and sets parameters like 'k = 10' for ranking slots, but does not provide explicit train/validation/test dataset splits, percentages, or methodology for splitting the data used in experiments. |
| Hardware Specification | No | The paper does not provide any specific hardware details such as GPU models, CPU models, or memory specifications used for running the experiments. |
| Software Dependencies | No | The paper states that the experiments are fully implemented in Python 3.9 and use the Python library Implicit (MIT License, https://implicit.readthedocs.io/), but it does not provide a complete list of software dependencies with version numbers. (A hedged usage sketch of the Implicit library appears after this table.) |
| Experiment Setup | Yes | For all algorithms, the regularization parameter of the Ridge regression is set to λ = 0.1. We set β = 0.5 for all objectives and, for welf, we set α = 0.5. We use β0 = 0.01. We choose d = 10 in the data generation and λ = 0.1 in the Ridge regression. (These values are gathered into an illustrative config snippet after this table.) |
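
The Pseudocode row lists a generic Frank-Wolfe algorithm for CBCR (Algorithm 2). As a reading aid, here is a minimal sketch of that style of loop: maintain the running average of observed reward vectors and, at each round, play the arm that scores best under the current gradient of the concave objective. The callback names (`objective_grad`, `pull_arm`) are placeholders, and the optimistic (LinUCB-style) reward estimation used by the paper's instantiations is abstracted away; this is not the authors' implementation.

```python
import numpy as np

def fw_cbcr_loop(T, objective_grad, pull_arm, reward_dim):
    """Sketch of a Frank-Wolfe loop for contextual bandits with concave
    rewards: the scalarization weights at round t are the gradient of the
    concave objective f at the running average of past reward vectors."""
    z = np.zeros(reward_dim)      # running average of observed reward vectors
    for t in range(1, T + 1):
        w = objective_grad(z)     # linearization weights w_t = grad f(z_{t-1})
        r = pull_arm(w)           # play the arm with the largest (estimated)
                                  # scalarized reward w_t . r, observe its reward vector
        z += (r - z) / t          # incremental average of the reward vectors
    return z
```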
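The Software Dependencies row only names Python 3.9 and the Implicit library. Since the code is not released, the snippet below is just one plausible way to obtain low-dimensional user/item embeddings from Last.fm-style play counts with that library; the matrix construction and all parameter values are assumptions, with `factors=10` chosen to mirror the d = 10 reported for data generation.

```python
from implicit.als import AlternatingLeastSquares
import scipy.sparse as sp

# Hypothetical user-item play-count matrix (users x items) standing in for
# the preprocessed Last.fm data; the real preprocessing is not described here.
user_item_counts = sp.random(100, 500, density=0.05, format="csr", random_state=0)

# Alternating least squares on implicit feedback; in implicit >= 0.5, fit()
# expects a user-item matrix. Regularization and iteration count are illustrative.
model = AlternatingLeastSquares(factors=10, regularization=0.01, iterations=15)
model.fit(user_item_counts)

user_embeddings = model.user_factors   # shape (n_users, 10)
item_embeddings = model.item_factors   # shape (n_items, 10)
```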
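Finally, the hyperparameters quoted in the Experiment Setup row can be collected into a single configuration block. Only the numeric values come from the paper; the key names are our own, since the authors' configuration files are not public.

```python
# Values quoted in the Experiment Setup (and Dataset Splits) rows.
EXPERIMENT_CONFIG = {
    "ridge_lambda": 0.1,   # regularization of the Ridge regression (all algorithms)
    "beta": 0.5,           # β used for all objectives
    "alpha_welf": 0.5,     # α used for the "welf" objective
    "beta0": 0.01,         # β0
    "data_dim": 10,        # d = 10 in the data generation
    "ranking_slots": 10,   # k = 10 ranking slots
}
```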