Bandits with Ranking Feedback
Authors: Davide Maran, Francesco Bacchiocchi, Francesco Emanuele Stradi, Matteo Castiglioni, Nicola Gatti, Marcello Restelli
NeurIPS 2024
| Reproducibility Variable | Result | LLM Response |
|---|---|---|
| Research Type | Experimental | Finally, we numerically evaluate our DREE and R-LPE algorithms in a testbed, and we compare their performance with some baselines from the literature in different settings. We show that our algorithms dramatically outperform the baselines in terms of empirical regret. |
| Researcher Affiliation | Academia | Davide Maran, Politecnico di Milano, davide.maran@polimi.it; Francesco Bacchiocchi, Politecnico di Milano, francesco.bacchiocchi@polimi.it; Francesco Emanuele Stradi, Politecnico di Milano, francescoemanuele.stradi@polimi.it; Matteo Castiglioni, Politecnico di Milano, matteo.castiglioni@polimi.it; Nicola Gatti, Politecnico di Milano, nicola.gatti@polimi.it; Marcello Restelli, Politecnico di Milano, marcello.restelli@polimi.it |
| Pseudocode | Yes | Algorithm 1 Dynamical Ranking Exploration-Exploitation (DREE) |
| Open Source Code | Yes | Question: Does the paper provide open access to the data and code, with sufficient instructions to faithfully reproduce the main experimental results, as described in supplemental material? Answer: [Yes] Justification: We provide the code. |
| Open Datasets | No | The paper specifies that 'rewards to be drawn from Gaussian random variables with unit variance' in simulated environments, but does not provide access information (link, DOI, citation) to a publicly available dataset. |
| Dataset Splits | No | The paper describes experiments in simulated environments over a time horizon but does not mention specific training, validation, or test dataset splits. |
| Hardware Specification | Yes | Compute: As stated, the numerical simulations resulted to be very fast. For this reason, it was not necessary to run them on a server, and we used a personal computer with the following specifications: CPU: 11th Gen Intel(R) Core(TM) i7-1165G7 2.80 GHz; RAM: 16,0 GB; Operating system: Windows 11; System type: 64 bit |
| Software Dependencies | No | The paper mentions 'standard Python libraries' but does not provide specific version numbers for any software dependencies. |
| Experiment Setup | Yes | In all these instances, we assume the rewards to be drawn from Gaussian random variables with unit variance, i.e., $\sigma^2 = 1$, and we let the time horizon be equal to $T = 2 \cdot 10^5$. Finally, for each algorithm, we evaluate the cumulative regret averaged over 50 runs. ... we evaluate the DREE algorithm with different choices of the $\delta$ parameter in the function $f(t) = \log(t)^{1+\delta}$; precisely, we choose $\delta \in \{1.0, 1.5, 2.0\}$. |
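
The experiment setup quoted above (unit-variance Gaussian rewards, a fixed horizon, cumulative regret averaged over independent runs) can be sketched as a small evaluation harness. This is a minimal illustration under stated assumptions, not the authors' code: `uniform_policy` is a hypothetical placeholder that ignores all feedback and is neither DREE nor R-LPE, the arm means `[0.5, 0.4]` are invented for the example, regret is computed from the mean gaps (pseudo-regret), and the horizon and run count are reduced from the quoted $T = 2 \cdot 10^5$ and 50 runs so the sketch finishes quickly. Only `exploration_threshold` mirrors a formula taken from the table.

```python
import math
import random


def exploration_threshold(t, delta):
    """f(t) = log(t)^(1 + delta), the function quoted in the setup row."""
    return math.log(t) ** (1.0 + delta)


def uniform_policy(t, n_arms, rng):
    """Hypothetical placeholder policy: pick an arm uniformly at random."""
    return rng.randrange(n_arms)


def run_once(means, horizon, policy, rng):
    """Simulate one run and return the cumulative pseudo-regret per round."""
    best = max(means)
    regret = 0.0
    trajectory = []
    for t in range(1, horizon + 1):
        arm = policy(t, len(means), rng)
        # Unit-variance Gaussian reward; a real learner would turn this
        # into feedback, the placeholder policy simply discards it.
        _reward = rng.gauss(means[arm], 1.0)
        regret += best - means[arm]
        trajectory.append(regret)
    return trajectory


def average_regret(means, horizon, policy, n_runs, seed=0):
    """Average the cumulative regret curve over n_runs independent runs."""
    rng = random.Random(seed)
    totals = [0.0] * horizon
    for _ in range(n_runs):
        for i, value in enumerate(run_once(means, horizon, policy, rng)):
            totals[i] += value
    return [value / n_runs for value in totals]


# Assumed toy instance: two arms, shortened horizon, few runs.
curve = average_regret([0.5, 0.4], horizon=1000, policy=uniform_policy, n_runs=5)

# The three delta values quoted in the table, evaluated at the example horizon.
thresholds = {delta: exploration_threshold(1000, delta) for delta in (1.0, 1.5, 2.0)}
```

To approximate the quoted configuration, one would pass `horizon=200_000` and `n_runs=50` and swap `uniform_policy` for an implementation of the algorithm under study; the harness itself makes no claim about how the paper's code is organized.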