A Relative Exponential Weighing Algorithm for Adversarial Utility-based Dueling Bandits
Authors: Pratik Gajane, Tanguy Urvoy, Fabrice Clérot
ICML 2015
| Reproducibility Variable | Result | LLM Response |
|---|---|---|
| Research Type | Experimental | At the end, we provide experimental results using real data from information retrieval applications. |
| Researcher Affiliation | Industry | Orange-labs, Lannion, France |
| Pseudocode | Yes | Algorithm 1 REX3: Exp3 with relative feedback |
| Open Source Code | No | No explicit statement about providing open-source code or a link to a repository for the described methodology was found. |
| Open Datasets | Yes | We used several preference matrices issued from, namely, the ARXIV dataset (Yue and Joachims, 2011), the LETOR NP2004 dataset (Liu et al., 2007), and the MSLR30K dataset. ... These matrices are courtesy of the authors of Zoghi et al. (2014b). |
| Dataset Splits | No | The paper uses datasets for simulation but does not specify train/validation/test dataset splits with percentages or sample counts for reproducibility in the conventional supervised learning sense. |
| Hardware Specification | No | No specific hardware details (like GPU/CPU models, memory, or cloud instance types) used for running experiments were mentioned. |
| Software Dependencies | No | No specific software dependencies with version numbers were mentioned. |
| Experiment Setup | Yes | For our experiments we have considered the following state-of-the-art algorithms: BTM (Yue and Joachims, 2011) with γ = 1.1 and δ = 1/T (explore-then-exploit setting), Condorcet-SAVAGE (Urvoy et al., 2013) with δ = 1/T, RUCB (Zoghi et al., 2014a) with α = 0.51, and SPARRING coupled with EXP3 (Ailon et al., 2014). We considered three versions of REX3: two non-anytime versions where the optimal γ is computed beforehand according to Eq. (6) with G_max set respectively to T/2 and T/10, and one anytime version where γ is recomputed at each time step according to Eq. (6). |
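The REX3 pseudocode (Algorithm 1, Exp3 with relative feedback) and the experiment setup can be illustrated with a small simulation. What follows is a minimal Python sketch under stated assumptions, not the authors' implementation (no code was released). It makes four assumptions. First, both arms of each duel are drawn independently from an Exp3-style mixture of normalized weights and uniform exploration. Second, the importance-weighted estimator credits the first arm with +ψ/2 and the second arm with −ψ/2, and weights are updated by exp(γ/K · ĝ); this is our assumed reading of Algorithm 1 and may differ from it in detail. Third, γ is set with the classic Exp3 tuning of Auer et al. (2002) as a stand-in for the paper's Eq. (6), which is not reproduced here. Fourth, the 3-arm preference matrix and the helpers `exp3_gamma`, `rex3_sketch` and `make_duel` are hypothetical and are not taken from the ARXIV, LETOR or MSLR30K matrices.

```python
import math
import random


def exp3_gamma(K, g_max):
    """Classic Exp3 tuning (Auer et al., 2002); a stand-in for the paper's Eq. (6)."""
    return min(1.0, math.sqrt(K * math.log(K) / ((math.e - 1) * g_max)))


def probs(w, gamma):
    """Mix the normalized weights with uniform exploration at rate gamma."""
    total = sum(w)
    K = len(w)
    return [(1 - gamma) * wi / total + gamma / K for wi in w]


def rex3_sketch(duel, K, T, gamma, rng):
    """Exp3-style learner with relative feedback, in the spirit of REX3.

    duel(a, b) must return relative feedback psi in [-1, 1]
    (positive when a is preferred to b, 0 when a == b).
    """
    w = [1.0] * K
    picks = [0] * K
    for _ in range(T):
        p = probs(w, gamma)
        # Both arms of the duel are drawn independently from the same p.
        a, b = rng.choices(range(K), weights=p, k=2)
        psi = duel(a, b)
        picks[a] += 1
        picks[b] += 1
        # Assumed estimator: a is credited +psi/2 and b is credited -psi/2,
        # each divided by its selection probability (importance weighting).
        g = [0.0] * K
        g[a] += psi / (2 * p[a])
        g[b] -= psi / (2 * p[b])
        for i in {a, b}:
            w[i] *= math.exp(gamma / K * g[i])
        # Rescaling all weights by the same factor leaves p unchanged
        # and keeps the numbers in a safe floating-point range.
        top = max(w)
        w = [wi / top for wi in w]
    return probs(w, gamma), picks


def make_duel(P, rng):
    """Hypothetical simulator: arm a beats arm b with probability P[a][b]."""
    def duel(a, b):
        if a == b:
            return 0
        return 1 if rng.random() < P[a][b] else -1
    return duel


# Toy preference matrix (made up for illustration): arm 0 beats the others.
P = [[0.5, 0.9, 0.9],
     [0.1, 0.5, 0.6],
     [0.1, 0.4, 0.5]]
T = 5000
rng = random.Random(0)
gamma = exp3_gamma(K=3, g_max=T / 2)  # G_max = T/2, as in one non-anytime variant
final_p, picks = rex3_sketch(make_duel(P, rng), K=3, T=T, gamma=gamma, rng=rng)
print([round(x, 3) for x in final_p], picks)
```

With G_max = T/2, the stand-in tuning gives γ ≈ 0.028 for K = 3 and T = 5000. Every p_i is at least γ/K, so each exponent γ/K · ĝ_i stays within [−1/2, 1/2]. The weights therefore change by a bounded factor per step, and rescaling by the maximum weight is only a numerical convenience.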