Borda Regret Minimization for Generalized Linear Dueling Bandits

Authors: Yue Wu, Tao Jin, Qiwei Di, Hao Lou, Farzad Farnoud, Quanquan Gu

ICML 2024

Reproducibility Variable | Result | LLM Response
Research Type | Experimental | Empirical evaluations on both synthetic data and a simulated real-world environment are conducted to corroborate our theoretical analysis. We conduct empirical studies to verify the correctness of our theoretical claims. Under both synthetic and real-world data settings, our algorithms can outperform all the baselines in terms of cumulative regret. This section compares the proposed algorithm BETC-GLM with existing ones that are capable of minimizing Borda regret. We use random responses (generated from fixed preferential matrices) to interact with all tested algorithms. Each algorithm is run 50 times over a time horizon of T = 10^6. We report both the mean and the standard deviation of the cumulative Borda regret and supply some analysis.
Researcher Affiliation | Academia | (1) Department of Computer Science, University of California, Los Angeles, Los Angeles, California, United States; (2) Department of Computer Science, University of Virginia, Charlottesville, Virginia, United States; (3) Department of Electrical & Computer Engineering, University of Virginia, Charlottesville, Virginia, United States.
Pseudocode | Yes | Algorithm 1 BETC-GLM; Algorithm 2 BEXP3; Algorithm 3 UCB-BORDA; Algorithm 4 ETC-BORDA; Algorithm 5 G-OPTIMAL DESIGN BY FRANK-WOLFE; Algorithm 6 BEXP3 with large K
Open Source Code | No | The paper does not contain any explicit statement or link indicating that the source code for the described methodology is publicly available.
Open Datasets | Yes | Real-world Dataset: To showcase the performance of the algorithms in a real-world setting, we use the Event Time dataset (Zhang et al., 2016). In this dataset, K = 100 historical events are compared in a pairwise fashion by crowd-sourced workers.
Dataset Splits | No | The paper describes simulation setups and an online learning context for experiments, rather than predefined train/validation/test splits on fixed datasets. For example, it mentions 'Each algorithm is run 50 times over a time horizon of T = 10^6' and, for the Event Time dataset, 'During simulation, p_{i,j} is the parameter of the Bernoulli distribution that is used to generate the responses whenever a pair (i, j) is queried.' No explicit train/validation/test splits are provided.
Hardware Specification | No | The paper does not specify any hardware details (e.g., GPU/CPU models, memory, or specific cloud instances) used for running the experiments.
Software Dependencies | No | The paper does not provide specific software dependencies with version numbers (e.g., Python 3.8, PyTorch 1.9) required to replicate the experiments.
Experiment Setup | Yes | BETC-GLM(-MATCH): Algorithm 1 proposed in this paper. For a general link function, to find θ̂ by MLE in (5.1), 100 rounds of gradient descent are performed. The failure probability is set to δ = 1/T. Parameters τ and ϵ are set to the values listed in Theorem 5.5. For BETC-GLM-MATCH, we use the τ and ϵ outlined in Theorem 5.1. BEXP3: The proposed method for adversarial Borda bandits displayed in Algorithm 2. η and γ are chosen to be the values stated in Theorem 6.1.
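The evaluation protocol quoted above (random responses drawn from a fixed preference matrix, with cumulative Borda regret tracked over the horizon) can be sketched as follows. This is a minimal illustration, not the paper's code; the function names and the toy matrix are my own, and the per-round regret uses the standard dueling-bandit form 2·B(i*) − B(i_t) − B(j_t).

```python
import numpy as np

def borda_scores(P):
    """Borda score of each arm: its average win probability against a
    uniformly random opponent (self-comparisons excluded).
    P[i, j] is the probability that arm i beats arm j."""
    K = P.shape[0]
    return (P.sum(axis=1) - np.diag(P)) / (K - 1)

def cumulative_borda_regret(P, pairs):
    """Cumulative Borda regret of a queried sequence of pairs (i_t, j_t):
    sum over rounds of 2*B(i*) - B(i_t) - B(j_t)."""
    B = borda_scores(P)
    best = B.max()
    return sum(2.0 * best - B[i] - B[j] for i, j in pairs)
```

Querying the Borda winner against itself incurs zero regret, which is why the reported curves flatten once an algorithm commits to the best arm.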
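The MLE step quoted in the setup row (100 rounds of gradient descent to find θ̂) might look like the sketch below, assuming a logistic link; the function name, step size, and the gradient-ascent formulation are my own stand-ins, not details taken from the paper.

```python
import numpy as np

def sigmoid(z):
    return 1.0 / (1.0 + np.exp(-z))

def mle_logistic(X, y, rounds=100, lr=0.5):
    """Estimate theta by gradient ascent on the Bernoulli log-likelihood
    of responses y given comparison features X, under a logistic link.
    (Hypothetical stand-in for the paper's general-link MLE step.)"""
    theta = np.zeros(X.shape[1])
    for _ in range(rounds):
        # gradient of the average log-likelihood under a logistic link
        grad = X.T @ (y - sigmoid(X @ theta)) / len(y)
        theta += lr * grad
    return theta
```

With a fixed, small number of iterations as quoted, the estimate is approximate; a line search or a convex solver would be a natural substitute in practice.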