Borda Regret Minimization for Generalized Linear Dueling Bandits

Authors: Yue Wu, Tao Jin, Qiwei Di, Hao Lou, Farzad Farnoud, Quanquan Gu

ICML 2024

Reproducibility Variable | Result | LLM Response
Research Type | Experimental | Empirical evaluations on both synthetic data and a simulated real-world environment are conducted to corroborate our theoretical analysis. We conduct empirical studies to verify the correctness of our theoretical claims. Under both synthetic and real-world data settings, our algorithms can outperform all the baselines in terms of cumulative regret. This section compares the proposed algorithm BETC-GLM with existing ones that are capable of minimizing Borda regret. We use random responses (generated from fixed preferential matrices) to interact with all tested algorithms. Each algorithm is run 50 times over a time horizon of T = 10^6. We report both the mean and the standard deviation of the cumulative Borda regret and supply some analysis.
Researcher Affiliation | Academia | (1) Department of Computer Science, University of California, Los Angeles, Los Angeles, California, United States; (2) Department of Computer Science, University of Virginia, Charlottesville, Virginia, United States; (3) Department of Electrical & Computer Engineering, University of Virginia, Charlottesville, Virginia, United States.
Pseudocode | Yes | Algorithm 1 BETC-GLM; Algorithm 2 BEXP3; Algorithm 3 UCB-BORDA; Algorithm 4 ETC-BORDA; Algorithm 5 G-OPTIMAL DESIGN BY FRANK-WOLFE; Algorithm 6 BEXP3 with large K
Open Source Code | No | The paper does not contain any explicit statement or link indicating that the source code for the described methodology is publicly available.
Open Datasets | Yes | Real-world Dataset: To showcase the performance of the algorithms in a real-world setting, we use the Event Time dataset (Zhang et al., 2016). In this dataset, K = 100 historical events are compared in a pairwise fashion by crowd-sourced workers.
Dataset Splits | No | The paper describes simulation setups and an online learning context for experiments, rather than predefined train/validation/test splits on fixed datasets. For example, it mentions 'Each algorithm is run 50 times over a time horizon of T = 10^6' and, for the Event Time dataset, 'During simulation, p_{i,j} is the parameter of the Bernoulli distribution that is used to generate the responses whenever a pair (i, j) is queried.' No explicit train/validation/test splits are provided.
Hardware Specification | No | The paper does not specify any hardware details (e.g., GPU/CPU models, memory, or specific cloud instances) used for running the experiments.
Software Dependencies | No | The paper does not provide specific software dependencies with version numbers (e.g., Python 3.8, PyTorch 1.9) required to replicate the experiments.
Experiment Setup | Yes | BETC-GLM(-MATCH): Algorithm 1 proposed in this paper. For a general link function, to find θ̂ by MLE in (5.1), 100 rounds of gradient descent are performed. The failure probability is set to δ = 1/T. Parameters τ and ϵ are set to the values listed in Theorem 5.5. For BETC-GLM-MATCH, we use the τ and ϵ outlined in Theorem 5.1. BEXP3: The proposed method for adversarial Borda bandits displayed in Algorithm 2. η and γ are chosen to be the values stated in Theorem 6.1.
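The evaluation protocol quoted above (random responses drawn from a fixed preference matrix, with cumulative Borda regret tracked over the horizon) can be sketched as follows. This is a minimal illustration, not the paper's code; the function names and the toy matrix are my own, and the per-round regret uses the standard dueling-bandit form 2·B(i*) − B(i_t) − B(j_t).

```python
import numpy as np

def borda_scores(P):
    """Borda score of each arm: its average win probability against a
    uniformly random opponent (self-comparisons excluded).
    P[i, j] is the probability that arm i beats arm j."""
    K = P.shape[0]
    return (P.sum(axis=1) - np.diag(P)) / (K - 1)

def cumulative_borda_regret(P, pairs):
    """Cumulative Borda regret of a queried sequence of pairs (i_t, j_t):
    sum over rounds of 2*B(i*) - B(i_t) - B(j_t)."""
    B = borda_scores(P)
    best = B.max()
    return sum(2.0 * best - B[i] - B[j] for i, j in pairs)
```

Querying the Borda winner against itself incurs zero regret, which is why the reported curves flatten once an algorithm commits to the best arm.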
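The MLE step quoted in the setup row (100 rounds of gradient descent to find θ̂) might look like the sketch below, assuming a logistic link; the function name, step size, and the gradient-ascent formulation are my own stand-ins, not details taken from the paper.

```python
import numpy as np

def sigmoid(z):
    return 1.0 / (1.0 + np.exp(-z))

def mle_logistic(X, y, rounds=100, lr=0.5):
    """Estimate theta by gradient ascent on the Bernoulli log-likelihood
    of responses y given comparison features X, under a logistic link.
    (Hypothetical stand-in for the paper's general-link MLE step.)"""
    theta = np.zeros(X.shape[1])
    for _ in range(rounds):
        # gradient of the average log-likelihood under a logistic link
        grad = X.T @ (y - sigmoid(X @ theta)) / len(y)
        theta += lr * grad
    return theta
```

With a fixed, small number of iterations as quoted, the estimate is approximate; a line search or a convex solver would be a natural substitute in practice.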