Bandits with Preference Feedback: A Stackelberg Game Perspective
Authors: Barna Pásztor, Parnian Kassraie, Andreas Krause
NeurIPS 2024
| Reproducibility Variable | Result | LLM Response |
|---|---|---|
| Research Type | Experimental | Our experiments are on finding the maxima of test functions commonly used in the (non-convex) optimization literature [Jamil and Yang, 2013], given only preference feedback. These functions cover challenging optimization landscapes with several local optima, plateaus, and valleys, allowing us to test the versatility of MAXMINLCB. We use the Ackley function for illustration in the main text and provide the regret plots for the remaining functions in Appendix E. For all experiments, we set the horizon T = 2000 and evaluate all algorithms on a uniform mesh of size 100 over the input domain. Additionally, we conducted experiments on the Yelp restaurant review dataset to demonstrate the applicability of MAXMINLCB to real-world data and its scaling to larger domains. (The evaluation mesh is sketched below the table.) |
| Researcher Affiliation | Academia | Barna Pásztor (1,2), Parnian Kassraie (1), Andreas Krause (1,2); 1: ETH Zurich, 2: ETH AI Center; {bpasztor, pkassraie, krausea}@ethz.ch |
| Pseudocode | Yes | Algorithm 1 MAXMINLCB (the max-min selection step is sketched below the table) |
| Open Source Code | Yes | The code is made available at github.com/lasgroup/MaxMinLCB. |
| Open Datasets | Yes | Additionally, we conducted experiments on the Yelp restaurant review dataset to demonstrate the applicability of MAXMINLCB on real-world data and its scaling to larger domains. |
| Dataset Splits | No | The paper mentions evaluating algorithms on a "uniform mesh" and using the Yelp dataset but does not specify explicit training/validation/test splits or percentages. |
| Hardware Specification | Yes | We ran our experiments on a shared cluster equipped with various NVIDIA GPUs and AMD EPYC CPUs. Our default configuration for all experiments was a single GPU with 24 GB of memory, 16 CPU cores, and 16 GB of RAM. |
| Software Dependencies | No | The implementation framework is named ("The environments and algorithms are implemented end-to-end in JAX [Bradbury et al., 2018]."), but no versioned dependency list is provided. |
| Experiment Setup | Yes | We set δ = 0.1 for all algorithms. For GP-UCB and LGP-UCB, we set β = 1 and the noise variance to 0.25. We use the Radial Basis Function (RBF) kernel and choose its variance and length-scale parameters from [0.1, 1.0] to optimize each algorithm's performance separately. For LGP-UCB, we tuned λ, the L2 penalty coefficient in Proposition 1, on the grid [0.0, 0.1, 1.0, 5.0] and B on [1.0, 2.0, 3.0]. Hyper-parameters were selected for each algorithm separately to create a fair comparison. (The kernel and grids are written out below the table.) |
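The Research Type row quotes an evaluation protocol: horizon T = 2000 and a uniform mesh of size 100 over the input domain, with the Ackley function as the running example. Below is a minimal JAX sketch of that grid, assuming the standard minimization form of Ackley and its conventional domain [-32.768, 32.768]; the paper maximizes, so a sign flip may apply, and its actual domain bounds may differ.

```python
import jax.numpy as jnp

# Ackley test function in its standard (minimization) form.
def ackley(x, a=20.0, b=0.2, c=2.0 * jnp.pi):
    d = x.shape[-1]
    term1 = -a * jnp.exp(-b * jnp.sqrt(jnp.sum(x**2, axis=-1) / d))
    term2 = -jnp.exp(jnp.sum(jnp.cos(c * x), axis=-1) / d)
    return term1 + term2 + a + jnp.e

# Uniform 1-D mesh of size 100, matching the stated evaluation protocol;
# the domain bounds here are conventional for Ackley, not taken from the paper.
mesh = jnp.linspace(-32.768, 32.768, 100)[:, None]
values = ackley(mesh)  # shape (100,)
```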
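The Pseudocode row cites Algorithm 1 (MAXMINLCB). The sketch below shows one reading of the max-min selection step the name refers to, assuming a precomputed matrix lcb[i, j] of lower confidence bounds on the probability that action i beats action j over a finite mesh. The function name is ours, and the paper's confidence-bound construction (a kernelized logistic estimator) is not reproduced here, so treat this as an illustration of the max-min idea rather than the algorithm itself.

```python
import jax.numpy as jnp

def maxmin_lcb_pair(lcb):
    """Pick a duel (leader, follower) from an LCB matrix over a finite mesh.

    lcb[i, j] lower-bounds the probability that action i wins a duel
    against action j. The leader plays a max-min (Stackelberg) strategy;
    the follower is the leader's hardest competitor.
    """
    row_worst = jnp.min(lcb, axis=1)    # worst-case LCB of each candidate leader
    leader = jnp.argmax(row_worst)      # max-min choice over the mesh
    follower = jnp.argmin(lcb[leader])  # competitor minimizing the leader's LCB
    return int(leader), int(follower)
```

On the evaluation mesh of size 100 above, lcb would be a 100 x 100 array, one entry per candidate duel.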
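The Experiment Setup row lists kernel and hyper-parameter choices. The sketch below writes them out explicitly, using the standard RBF kernel and reading "[0.1, 1.0]" as a two-point grid; whether the paper swept a grid or a continuous interval, and how the per-algorithm tuning loop ran, is not specified in the quote.

```python
import itertools
import jax.numpy as jnp

def rbf_kernel(x1, x2, variance, lengthscale):
    """Standard RBF kernel: variance * exp(-||x - x'||^2 / (2 * lengthscale^2))."""
    sq_dist = jnp.sum((x1[:, None, :] - x2[None, :, :]) ** 2, axis=-1)
    return variance * jnp.exp(-0.5 * sq_dist / lengthscale**2)

delta = 0.1                  # confidence level, all algorithms
beta, noise_var = 1.0, 0.25  # GP-UCB / LGP-UCB settings from the quote

# Grids as quoted; treating [0.1, 1.0] as {0.1, 1.0} is an assumption.
kernel_grid = list(itertools.product([0.1, 1.0], [0.1, 1.0]))  # (variance, lengthscale)
lgp_ucb_grid = list(itertools.product(
    [0.0, 0.1, 1.0, 5.0],  # lambda: L2 penalty coefficient
    [1.0, 2.0, 3.0],       # B: norm bound
))
```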