Notice: The reproducibility variables underlying each score are classified using an automated LLM-based pipeline, validated against a manually labeled dataset. LLM-based classification introduces uncertainty and potential bias; scores should be interpreted as estimates. Full accuracy metrics and methodology are described in [1].

Bandits with Ranking Feedback

Authors: Davide Maran, Francesco Bacchiocchi, Francesco Emanuele Stradi, Matteo Castiglioni, Nicola Gatti, Marcello Restelli

NeurIPS 2024

Reproducibility Variable | Result | LLM Response
Research Type | Experimental | Finally, we numerically evaluate our DREE and R-LPE algorithms in a testbed, and we compare their performance with some baselines from the literature in different settings. We show that our algorithms dramatically outperform the baselines in terms of empirical regret.
Researcher Affiliation | Academia | Davide Maran, Francesco Bacchiocchi, Francesco Emanuele Stradi, Matteo Castiglioni, Nicola Gatti, and Marcello Restelli (all Politecnico di Milano)
Pseudocode | Yes | Algorithm 1: Dynamical Ranking Exploration-Exploitation (DREE)
Open Source Code | Yes | Question: Does the paper provide open access to the data and code, with sufficient instructions to faithfully reproduce the main experimental results, as described in supplemental material? Answer: [Yes] Justification: We provide the code.
Open Datasets | No | The paper specifies that rewards are "drawn from Gaussian random variables with unit variance" in simulated environments, but does not provide access information (link, DOI, citation) for a publicly available dataset.
Dataset Splits | No | The paper describes experiments in simulated environments over a time horizon but does not mention specific training, validation, or test splits.
Hardware Specification | Yes | Compute: As stated, the numerical simulations turned out to be very fast. For this reason, it was not necessary to run them on a server, and we used a personal computer with the following specifications: CPU: 11th Gen Intel(R) Core(TM) i7-1165G7, 2.80 GHz; RAM: 16.0 GB; operating system: Windows 11; system type: 64-bit.
Software Dependencies | No | The paper mentions "standard Python libraries" but does not provide version numbers for any software dependencies.
Experiment Setup | Yes | In all these instances, we assume the rewards to be drawn from Gaussian random variables with unit variance, i.e., σ² = 1, and we let the time horizon be equal to T = 2 × 10⁵. Finally, for each algorithm, we evaluate the cumulative regret averaged over 50 runs. ... we evaluate the DREE algorithm with different choices of the δ parameter in the function f(t) = log(t)^(1+δ); precisely, we choose δ ∈ {1.0, 1.5, 2.0}.
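The DREE algorithm listed under Pseudocode and the evaluation protocol under Experiment Setup can be illustrated together. What follows is a minimal Python sketch, not the authors' released code, built on several assumptions. First, DREE forces a pull of any arm that has been sampled fewer than f(t) times and otherwise plays the arm ranked first. Second, the ranking orders arms by the empirical mean of their past rewards. Third, log denotes the natural logarithm. The function names (f, ranking, run_dree, average_regret) and the arm means are hypothetical.

```python
import math
import random


def f(t, delta):
    # Assumed exploration schedule f(t) = log(t)^(1 + delta), natural log.
    return math.log(t) ** (1.0 + delta)


def ranking(sums, counts):
    # Ranking feedback: arm indices ordered by empirical mean, best first.
    # Unpulled arms get -inf so they sit at the bottom of the ranking.
    means = [s / c if c > 0 else float("-inf") for s, c in zip(sums, counts)]
    return sorted(range(len(means)), key=lambda i: means[i], reverse=True)


def run_dree(mu, T, delta, rng):
    """One run of the assumed DREE rule; returns the cumulative pseudo-regret curve."""
    K = len(mu)
    sums = [0.0] * K   # environment-side reward totals; the learner only sees the ranking
    counts = [0] * K   # number of pulls per arm
    best = max(mu)
    regret, curve = 0.0, []
    for t in range(1, T + 1):
        threshold = f(t, delta)
        under = [i for i in range(K) if counts[i] < threshold]
        if under:
            arm = min(under, key=lambda i: counts[i])  # explore the least-pulled arm
        else:
            arm = ranking(sums, counts)[0]             # exploit the top-ranked arm
        sums[arm] += rng.gauss(mu[arm], 1.0)           # Gaussian reward, sigma^2 = 1
        counts[arm] += 1
        regret += best - mu[arm]                       # expected (pseudo) regret
        curve.append(regret)
    return curve


def average_regret(mu, T, delta, n_runs=50, seed=0):
    # Cumulative regret averaged pointwise over n_runs independent runs.
    rng = random.Random(seed)
    total = [0.0] * T
    for _ in range(n_runs):
        for t, r in enumerate(run_dree(mu, T, delta, rng)):
            total[t] += r
    return [x / n_runs for x in total]


if __name__ == "__main__":
    mu = [0.0, 0.5, 1.0]  # hypothetical arm means
    for delta in (1.0, 1.5, 2.0):
        # T is far below the paper's 2 x 10^5 to keep pure Python fast.
        curve = average_regret(mu, T=2000, delta=delta, n_runs=10)
        print(f"delta={delta}: final average regret {curve[-1]:.1f}")
```

Under the natural-log assumption, at the paper's horizon T = 2 × 10⁵ the schedule evaluates to f(T) ≈ 149, 521 and 1819 for δ = 1.0, 1.5 and 2.0. This is roughly the number of forced pulls each arm receives, so a larger δ means substantially more exploration of suboptimal arms.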