Optimal Algorithms for Stochastic Contextual Preference Bandits

Authors: Aadirupa Saha

NeurIPS 2021 | Conference PDF | Archive PDF | Plain Text | LLM Run Details

Reproducibility Variable | Result | LLM Response
Research Type | Experimental | "This section gives the empirical performance of our algorithms (Alg. 1 and 3) and compares them with some existing preference learning algorithms."
Researcher Affiliation | Industry | Microsoft Research, New York, US; aasa@microsoft.com.
Pseudocode | Yes | Algorithm 1: Maximum-Informative-Pair (MaxInP). A rough sketch of the pair-selection idea appears after this table.
Open Source Code | No | The paper provides no explicit statement or link to open-source code for the described methodology.
Open Datasets | No | The paper describes synthetic problem instances built from utility functions g(·) (Quadratic, Six-Hump Camel, Goldstein) generated for the experiments, but provides no access information (links, DOIs, or formal citations) for a publicly available open dataset.
Dataset Splits | No | The paper does not specify dataset splits (percentages, sample counts, or citations to predefined splits) for training, validation, or testing.
Hardware Specification | No | No hardware details (e.g., GPU/CPU models or memory) used for running the experiments are provided.
Software Dependencies | No | The paper mentions techniques such as GP fitting and kernelized self-sparring and refers to existing works ([29], [37]), but provides no version numbers for any software, libraries, or frameworks used in the experiments.
Experiment Setup | Yes | "For this experiment we fix d = 10 and K = 50. Fig. 2 shows both our algorithms MaxInP and StaD always outperform the rest... We use these 3 functions as g(·): 1. Quadratic, 2. Six-Hump Camel and 3. Goldstein. For all cases, we fix d = 3 and K = 50." A sketch of one possible instance generator appears after this table.
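
The paper's Algorithm 1 (MaxInP) is reported only as pseudocode. Purely as an illustration of the underlying idea, selecting the pair of arms whose relative score is most uncertain under a linear utility model, here is a minimal sketch. The function name, the confidence-width criterion as the sole selection rule, and the design-matrix update are assumptions, not the paper's exact Algorithm 1.

```python
import numpy as np

def max_informative_pair(X, V):
    """Hypothetical sketch of a maximum-informative-pair rule.

    X: (K, d) arm feature matrix; V: (d, d) regularized design matrix.
    Returns the index pair (i, j) maximizing ||x_i - x_j||_{V^{-1}},
    i.e. the duel whose score gap has the widest confidence interval.
    """
    V_inv = np.linalg.inv(V)
    K = X.shape[0]
    best_width, best_pair = -np.inf, (0, 0)
    for i in range(K):
        for j in range(i + 1, K):
            z = X[i] - X[j]
            width = np.sqrt(z @ V_inv @ z)  # uncertainty of the (i, j) duel
            if width > best_width:
                best_width, best_pair = width, (i, j)
    return best_pair

# After observing a duel on pair (i, j), the design matrix would be
# updated with the feature difference (again, an assumed update rule):
#   V += np.outer(X[i] - X[j], X[i] - X[j])
```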
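The Experiment Setup excerpt fixes only d, K, and the three choices of g(·). The following is a minimal sketch of how such a synthetic preference instance could be generated, assuming uniform arm features, a unit-norm parameter θ, and a logistic link on utility gaps; none of these choices are confirmed by the excerpt.

```python
import numpy as np

# Hypothetical generator for a synthetic contextual preference instance.
# d = 3 and K = 50 follow the excerpt; everything else is an assumption.
rng = np.random.default_rng(0)
d, K = 3, 50

theta = rng.normal(size=d)
theta /= np.linalg.norm(theta)           # assumed unit-norm parameter
X = rng.uniform(-1.0, 1.0, size=(K, d))  # assumed feature range for K arms

def g_quadratic(s):
    # One of the three transfer functions named in the paper; the
    # Six-Hump Camel and Goldstein functions would be swapped in here.
    return s ** 2

scores = g_quadratic(X @ theta)          # non-linear utility of each arm

def duel(i, j):
    """Preference feedback for the duel (i, j): returns True when arm i
    wins, sampled via an assumed logistic link on the utility gap."""
    p = 1.0 / (1.0 + np.exp(scores[j] - scores[i]))
    return rng.random() < p
```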