A Relative Exponential Weighing Algorithm for Adversarial Utility-based Dueling Bandits

Authors: Pratik Gajane, Tanguy Urvoy, Fabrice Clérot

ICML 2015

Reproducibility Variable | Result | LLM Response
Research Type | Experimental | At the end, we provide experimental results using real data from information retrieval applications.
Researcher Affiliation | Industry | Orange-labs, Lannion, France
Pseudocode | Yes | Algorithm 1 REX3: Exp3 with relative feedback
Open Source Code | No | No explicit statement about providing open-source code or a link to a repository for the described methodology was found.
Open Datasets | Yes | We used several preference matrices issued from namely: ARXIV dataset (Yue and Joachims, 2011), LETOR NP2004 dataset (Liu et al., 2007), and MSLR30K dataset. ... These matrices are courtesy of Zoghi et al. (2014b)'s authors.
Dataset Splits | No | The paper uses datasets for simulation but does not specify train/validation/test dataset splits with percentages or sample counts for reproducibility in the conventional supervised learning sense.
Hardware Specification | No | No specific hardware details (like GPU/CPU models, memory, or cloud instance types) used for running experiments were mentioned.
Software Dependencies | No | No specific software dependencies with version numbers were mentioned.
Experiment Setup | Yes | For our experiments we have considered the following state of the art algorithms: BTM (Yue and Joachims, 2011) with γ = 1.1 and δ = 1/T (explore-then-exploit setting), Condorcet-SAVAGE (Urvoy et al., 2013) with δ = 1/T, RUCB (Zoghi et al., 2014a) with α = 0.51, and SPARRING coupled with EXP3 (Ailon et al., 2014). We considered three versions of REX3: two non-anytime versions where the optimal γ is computed beforehand according to (6) with Gmax set respectively to T/2 and T/10 and one anytime version where γ is recomputed at each time step according to (6).
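To give a concrete picture of what the checklist refers to, the Python sketch below simulates an Exp3-style learner with relative feedback, in the spirit of Algorithm 1 (REX3), against a preference matrix. It is an illustration under stated assumptions, not the authors' code. The following are assumed rather than taken from the paper: the duel outcome psi in {-1, 0, +1} is drawn from a preference-matrix entry pref[a][b]; the first arm receives the gain estimate psi / (2 p_a) and the second receives -psi / (2 p_b); and weights are updated as exp(gamma / K * c_hat). The names REX3, duel and run are hypothetical. gamma is passed in directly instead of being computed from the paper's equation (6). The toy matrix is invented and is not one of the ARXIV, LETOR NP2004 or MSLR30K matrices.

```python
import math
import random
from typing import List, Sequence, Tuple


class REX3:
    """Hypothetical sketch of REX3 (Exp3 with relative feedback).

    Assumptions, not the paper's verbatim algorithm: the feedback psi lies
    in [-1, 1]; the first arm gets the estimate psi / (2 p_a) and the second
    arm -psi / (2 p_b); weights follow an Exp3-style update with exponent
    gamma / K.
    """

    def __init__(self, n_arms: int, gamma: float, seed: int = 0) -> None:
        if not 0.0 < gamma <= 0.5:
            raise ValueError("gamma must lie in (0, 1/2]")
        self.k = n_arms
        self.gamma = gamma
        self.log_w = [0.0] * n_arms  # log-weights; every weight starts at 1
        self.rng = random.Random(seed)

    def probabilities(self) -> List[float]:
        # Normalized weights mixed with uniform exploration (gamma / K each).
        m = max(self.log_w)
        w = [math.exp(lw - m) for lw in self.log_w]  # shift avoids overflow
        total = sum(w)
        return [(1 - self.gamma) * wi / total + self.gamma / self.k for wi in w]

    def select(self) -> Tuple[int, int, List[float]]:
        # Both arms of the duel are drawn i.i.d. from the same distribution.
        p = self.probabilities()
        a, b = self.rng.choices(range(self.k), weights=p, k=2)
        return a, b, p

    def update(self, a: int, b: int, p: Sequence[float], psi: float) -> None:
        # Estimate of each arm's gain relative to the current mixture;
        # the two terms cancel when a == b.
        c_hat = [0.0] * self.k
        c_hat[a] += psi / (2 * p[a])
        c_hat[b] -= psi / (2 * p[b])
        for i in range(self.k):
            self.log_w[i] += self.gamma / self.k * c_hat[i]


def duel(pref: Sequence[Sequence[float]], a: int, b: int,
         rng: random.Random) -> int:
    # Hypothetical simulator: a beats b with probability pref[a][b].
    if a == b:
        return 0
    return 1 if rng.random() < pref[a][b] else -1


def run(pref: Sequence[Sequence[float]], horizon: int, gamma: float,
        seed: int = 0) -> List[float]:
    # Play `horizon` duels and return the final sampling distribution.
    learner = REX3(len(pref), gamma, seed)
    env_rng = random.Random(seed + 1)
    for _ in range(horizon):
        a, b, p = learner.select()
        psi = duel(pref, a, b, env_rng)
        learner.update(a, b, p, psi)
    return learner.probabilities()


if __name__ == "__main__":
    # Toy 3-arm matrix (made up, not from the paper's datasets);
    # arm 0 beats both other arms, so it is the Condorcet winner.
    toy = [[0.5, 0.7, 0.8],
           [0.3, 0.5, 0.6],
           [0.2, 0.4, 0.5]]
    print(run(toy, horizon=5000, gamma=0.1))
```

With this toy matrix, one would expect most of the probability mass to move to arm 0. Every arm still keeps at least gamma / K = 0.1 / 3 of the mass, because the uniform exploration term never vanishes. Log-weights are stored instead of raw weights so that long horizons do not overflow the exponential update.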