Preference-based Pure Exploration
Authors: Apurv Shukla, Debabrota Basu
NeurIPS 2024 | Conference PDF | Archive PDF | Plain Text | LLM Run Details
| Reproducibility Variable | Result | LLM Response |
|---|---|---|
| Research Type | Theoretical | We study the preference-based pure exploration problem for bandits with vectorvalued rewards. The rewards are ordered using a (given) preference cone C and our goal is to identify the set of Pareto optimal arms. First, to quantify the impact of preferences, we derive a novel lower bound on sample complexity for identifying the most preferred policy with a confidence level 1 δ. Our lower bound elicits the role played by the geometry of the preference cone and punctuates the difference in hardness compared to existing best-arm identification variants of the problem. We further explicate this geometry when the rewards follow Gaussian distributions. We then provide a convex relaxation of the lower bound and leverage it to design the Preference-based Track and Stop (Pre TS) algorithm that identifies the most preferred policy. Finally, we show that the sample complexity of Pre TS is asymptotically tight by deriving a new concentration inequality for vector-valued rewards. ... As future work, it would be interesting to verify our results on a real-world datasets. |
| Researcher Affiliation | Academia | Apurv Shukla Department of ECE, Texas A&M University College Station, TX 77840 apurv.shukla@umich.edu Debabrota Basu Equipe School, Univ. Lille, Inria, CNRS Centrale Lille, UMR-9189 CRISt AL, France debabrota.basu@inria.fr |
| Pseudocode | Yes | Algorithm 1 Preference-based Track-and-Stop (Pre TS) 1: Input: Confidence parameter δ, preference cone C 2: if supπ t min M ch Λπ t ( ˆ Mt) minz C P k Nk,td KL z ˆ M (k) t , z M (k) β(t, δ) then 3: Compute wt arg maxw K VC(w, ˆ Mt) 4: Play kt arg mink [K] Nk,t Pt s=1 ws 5: Observe reward rt 6: Construct an empirical estimator ˆ Mt 7: end if 8: Construct a Pareto Front ˆPτ from empirical means ˆ Mτ 9: Return: ˆPτ |
| Open Source Code | No | The paper does not contain any explicit statements or links indicating the release of open-source code for the described methodology. |
| Open Datasets | No | The paper is theoretical and does not present experiments that use a specific dataset, therefore it does not mention public access information for any dataset. |
| Dataset Splits | No | The paper is theoretical and does not conduct experiments with dataset splits for training, validation, or testing. |
| Hardware Specification | No | The paper is theoretical and does not describe any hardware used for running experiments. |
| Software Dependencies | No | The paper is theoretical and does not specify any software dependencies with version numbers for experimental replication. |
| Experiment Setup | No | The paper is theoretical and does not include details on experimental setup such as hyperparameters or training configurations. |