Notice: The reproducibility variables underlying each score are classified using an automated LLM-based pipeline, validated against a manually labeled dataset. LLM-based classification introduces uncertainty and potential bias; scores should be interpreted as estimates. Full accuracy metrics and methodology are described in [1].
Bandits with Ranking Feedback
Authors: Davide Maran, Francesco Bacchiocchi, Francesco Emanuele Stradi, Matteo Castiglioni, Nicola Gatti, Marcello Restelli
NeurIPS 2024 | Venue PDF | LLM Run Details
| Reproducibility Variable | Result | LLM Response |
|---|---|---|
| Research Type | Experimental | Finally, we numerically evaluate our DREE and R-LPE algorithms in a testbed, and we compare their performance with some baselines from the literature in different settings. We show that our algorithms dramatically outperform the baselines in terms of empirical regret. |
| Researcher Affiliation | Academia | Davide Maran (Politecnico di Milano); Francesco Bacchiocchi (Politecnico di Milano); Francesco Emanuele Stradi (Politecnico di Milano); Matteo Castiglioni (Politecnico di Milano); Nicola Gatti (Politecnico di Milano); Marcello Restelli (Politecnico di Milano) |
| Pseudocode | Yes | Algorithm 1 Dynamical Ranking Exploration-Exploitation (DREE) |
| Open Source Code | Yes | Question: Does the paper provide open access to the data and code, with sufficient instructions to faithfully reproduce the main experimental results, as described in supplemental material? Answer: [Yes] Justification: We provide the code. |
| Open Datasets | No | The paper specifies that 'rewards to be drawn from Gaussian random variables with unit variance' in simulated environments, but does not provide access information (link, DOI, citation) to a publicly available dataset. |
| Dataset Splits | No | The paper describes experiments in simulated environments over a time horizon but does not mention specific training, validation, or test dataset splits. |
| Hardware Specification | Yes | Compute: As stated, the numerical simulations resulted to be very fast. For this reason, it was not necessary to run them on a server, and we used a personal computer with the following specifications: CPU: 11th Gen Intel(R) Core(TM) i7-1165G7, 2.80 GHz; RAM: 16.0 GB; Operating system: Windows 11; System type: 64-bit |
| Software Dependencies | No | The paper mentions 'standard Python libraries' but does not provide specific version numbers for any software dependencies. |
| Experiment Setup | Yes | In all these instances, we assume the rewards to be drawn from Gaussian random variables with unit variance, i.e., $\sigma^2 = 1$, and we let the time horizon be equal to $T = 2 \cdot 10^5$. Finally, for each algorithm, we evaluate the cumulative regret averaged over 50 runs. ... we evaluate the DREE algorithm with different choices of the $\delta$ parameter in the function $f(t) = \log(t)^{1+\delta}$; precisely, we choose $\delta \in \{1.0, 1.5, 2.0\}$. |
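
The experiment setup quoted in the last row can be illustrated with a minimal sketch of such a simulation harness. It is not the authors' code and does not implement DREE or R-LPE: the policy is a hypothetical uniform-random placeholder, and the names `exploration_budget`, `uniform_policy`, `run_once`, and `average_regret`, as well as the arm means in the usage note, are assumptions made for illustration. Only the unit-variance Gaussian rewards, the 50 runs, the δ values, and the form f(t) = log(t)^(1+δ) come from the quoted text.

```python
import math
import random

SIGMA = 1.0               # unit-variance Gaussian rewards, as quoted in the setup
N_RUNS = 50               # number of runs averaged over, as quoted in the setup
DELTAS = (1.0, 1.5, 2.0)  # delta values reported for DREE


def exploration_budget(t, delta):
    """f(t) = log(t)^(1 + delta), the function quoted in the setup row."""
    return math.log(t) ** (1.0 + delta)


def uniform_policy(n_arms, rng):
    """Hypothetical placeholder policy: pull an arm uniformly at random."""
    return rng.randrange(n_arms)


def run_once(means, horizon, rng, policy=uniform_policy):
    """Return the cumulative pseudo-regret trajectory of a single run."""
    best = max(means)
    regret, trajectory = 0.0, []
    for _ in range(horizon):
        arm = policy(len(means), rng)
        # A reward is sampled to mirror the setup; the placeholder policy ignores it.
        _reward = rng.gauss(means[arm], SIGMA)
        # Pseudo-regret is computed from the arm means, not the noisy rewards.
        regret += best - means[arm]
        trajectory.append(regret)
    return trajectory


def average_regret(means, horizon, n_runs=N_RUNS, seed=0):
    """Average the cumulative pseudo-regret trajectory over n_runs runs."""
    rng = random.Random(seed)
    totals = [0.0] * horizon
    for _ in range(n_runs):
        for t, value in enumerate(run_once(means, horizon, rng)):
            totals[t] += value
    return [total / n_runs for total in totals]
```

For a quick check, `average_regret([0.9, 0.5], horizon=1000)` runs in well under a second; the arm means here are made up, and the paper's horizon of 2 · 10^5 would simply be passed as `horizon=200_000`. Replacing `uniform_policy` with an actual learning algorithm is where a real reproduction would differ from this sketch.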