Optimal Algorithms for Stochastic Contextual Preference Bandits
Authors: Aadirupa Saha
NeurIPS 2021
| Reproducibility Variable | Result | LLM Response |
|---|---|---|
| Research Type | Experimental | This section gives the empirical performance of our algorithms (Alg. 1 and 3) and compares them with some existing preference learning algorithms. |
| Researcher Affiliation | Industry | Microsoft Research, New York, US; aasa@microsoft.com. |
| Pseudocode | Yes | Algorithm 1 Maximum-Informative-Pair (Max In P) |
| Open Source Code | No | The paper does not provide any explicit statement or link regarding open-source code for the described methodology. |
| Open Datasets | No | The paper describes synthetic problem instances and functions for g() (Quadratic, Six-Hump Camel, Gold Stein) which are generated for experiments, but does not provide specific access information (links, DOIs, formal citations) to a publicly available or open dataset. |
| Dataset Splits | No | The paper does not provide specific dataset split information (percentages, sample counts, citations to predefined splits) for training, validation, or testing. |
| Hardware Specification | No | No specific hardware details (like GPU/CPU models or memory) used for running the experiments are provided. |
| Software Dependencies | No | The paper mentions using techniques (e.g., GP fitting, kernelized self-sparring) and refers to existing works ([29], [37]) but does not provide specific version numbers for any software, libraries, or frameworks used in the experiments. |
| Experiment Setup | Yes | For this experiment we fix d = 10 and K = 50. Fig. 2 shows both our algorithms Max In P and Sta D always outperform the rest... We use these 3 functions as g(·): 1. Quadratic, 2. Six-Hump Camel, and 3. Gold Stein. For all cases, we fix d = 3 and K = 50. |
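
The experiment-setup row above describes synthetic preference instances built from a utility function g(·) over d-dimensional arms, with pairwise (dueling) feedback. Below is a minimal sketch of how such an instance could be generated; the quadratic form of g, the logistic link used to turn utility gaps into preference probabilities, and all names (`duel`, `x_star`, arm distribution) are illustrative assumptions, not the paper's exact construction.

```python
import numpy as np

rng = np.random.default_rng(0)

d, K = 3, 50  # dimension and number of arms, as stated in the setup row

# Hypothetical utility function: a simple quadratic peaked at x_star.
# The paper lists "Quadratic" among its g() choices; the exact form here is an assumption.
x_star = rng.uniform(-1, 1, size=d)

def g(x):
    return -np.sum((x - x_star) ** 2)

# K arms drawn uniformly from [-1, 1]^d (assumed arm distribution).
arms = rng.uniform(-1, 1, size=(K, d))
utilities = np.array([g(x) for x in arms])

def duel(i, j):
    """Simulate one round of pairwise preference feedback between arms i and j.

    Assumes a logistic link: P(i beats j) = sigmoid(g(x_i) - g(x_j)).
    Returns 1 if arm i wins the duel, 0 otherwise.
    """
    p_i_wins = 1.0 / (1.0 + np.exp(-(utilities[i] - utilities[j])))
    return int(rng.random() < p_i_wins)

# Example: duel the two highest-utility arms a few times.
best, second = np.argsort(utilities)[-1], np.argsort(utilities)[-2]
wins = sum(duel(best, second) for _ in range(100))
print(f"arm {best} beat arm {second} in {wins}/100 duels")
```

A dueling-bandit algorithm such as Max In P would interact with such an environment only through a `duel`-style preference oracle, which is consistent with the table reporting no conventional train/validation/test split.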