Bandits with Ranking Feedback

Authors: Davide Maran, Francesco Bacchiocchi, Francesco Emanuele Stradi, Matteo Castiglioni, Nicola Gatti, Marcello Restelli

NeurIPS 2024

Reproducibility Variable | Result | LLM Response
Research Type | Experimental | Finally, we numerically evaluate our DREE and R-LPE algorithms in a testbed, and we compare their performance with some baselines from the literature in different settings. We show that our algorithms dramatically outperform the baselines in terms of empirical regret.
Researcher Affiliation | Academia | Davide Maran, Politecnico di Milano, davide.maran@polimi.it; Francesco Bacchiocchi, Politecnico di Milano, francesco.bacchiocchi@polimi.it; Francesco Emanuele Stradi, Politecnico di Milano, francescoemanuele.stradi@polimi.it; Matteo Castiglioni, Politecnico di Milano, matteo.castiglioni@polimi.it; Nicola Gatti, Politecnico di Milano, nicola.gatti@polimi.it; Marcello Restelli, Politecnico di Milano, marcello.restelli@polimi.it
Pseudocode | Yes | Algorithm 1: Dynamical Ranking Exploration-Exploitation (DREE)
Open Source Code | Yes | Question: Does the paper provide open access to the data and code, with sufficient instructions to faithfully reproduce the main experimental results, as described in supplemental material? Answer: [Yes] Justification: We provide the code.
Open Datasets | No | The paper specifies that 'rewards to be drawn from Gaussian random variables with unit variance' in simulated environments, but does not provide access information (link, DOI, citation) to a publicly available dataset.
Dataset Splits | No | The paper describes experiments in simulated environments over a time horizon but does not mention specific training, validation, or test dataset splits.
Hardware Specification | Yes | Compute: As stated, the numerical simulations resulted to be very fast. For this reason, it was not necessary to run them on a server, and we used a personal computer with the following specifications: CPU: 11th Gen Intel(R) Core(TM) i7-1165G7, 2.80 GHz; RAM: 16.0 GB; Operating system: Windows 11; System type: 64 bit
Software Dependencies | No | The paper mentions 'standard Python libraries' but does not provide specific version numbers for any software dependencies.
Experiment Setup | Yes | In all these instances, we assume the rewards to be drawn from Gaussian random variables with unit variance, i.e., σ^2 = 1, and we let the time horizon be equal to T = 2 × 10^5. Finally, for each algorithm, we evaluate the cumulative regret averaged over 50 runs. ... we evaluate the DREE algorithm with different choices of the δ parameter in the function f(t) = log(t)^(1+δ); precisely, we choose δ ∈ {1.0, 1.5, 2.0}.
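The experiment setup quoted above (unit-variance Gaussian rewards, cumulative regret averaged over independent runs, and the function f(t) = log(t)^(1+δ)) can be illustrated with a minimal Python sketch. This is not the authors' code: the function names, the arm means, and the uniform-random placeholder policy are assumptions made purely for illustration, and the placeholder does not implement DREE or R-LPE.

```python
import numpy as np


def exploration_function(t, delta):
    """f(t) = log(t)^(1 + delta), the function quoted in the setup."""
    return np.log(t) ** (1.0 + delta)


def uniform_policy(rng, n_arms, t):
    """Hypothetical placeholder policy (NOT DREE or R-LPE):
    pulls an arm uniformly at random and ignores all feedback."""
    return int(rng.integers(n_arms))


def average_cumulative_regret(means, horizon, n_runs, policy, seed=0):
    """Cumulative (pseudo-)regret curve averaged over n_runs independent runs."""
    means = np.asarray(means, dtype=float)
    best = means.max()
    rng = np.random.default_rng(seed)
    total = np.zeros(horizon)
    for _ in range(n_runs):
        gaps = np.empty(horizon)
        reward_sums = np.zeros(len(means))  # kept only to show where feedback would come from
        for t in range(horizon):
            arm = policy(rng, len(means), t)
            # Reward model from the setup: Gaussian with unit variance (sigma^2 = 1).
            reward_sums[arm] += rng.normal(means[arm], 1.0)
            # Regret is measured against the best arm's mean.
            gaps[t] = best - means[arm]
        total += np.cumsum(gaps)
    return total / n_runs


if __name__ == "__main__":
    # Small illustrative values; the quoted setup uses T = 2 * 10**5 and 50 runs.
    curve = average_cumulative_regret(
        means=[0.2, 0.5, 0.9],  # assumed arm means, not taken from the paper
        horizon=1000,
        n_runs=5,
        policy=uniform_policy,
    )
    print("final average regret:", curve[-1])
    for delta in (1.0, 1.5, 2.0):
        print("delta =", delta, "f(1000) =", exploration_function(1000, delta))
```

Swapping `uniform_policy` for an actual learning algorithm would require giving the policy access to the feedback it is allowed to observe; the sketch leaves that interface open on purpose.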