Stochastic Contextual Dueling Bandits under Linear Stochastic Transitivity Models
Authors: Viktor Bengs, Aadirupa Saha, Eyke Hüllermeier
ICML 2022
| Reproducibility Variable | Result | LLM Response |
|---|---|---|
| Research Type | Experimental | In an experimental study we show its empirical superiority over state-of-the-art algorithms for special cases of LST models in Section 4. |
| Researcher Affiliation | Collaboration | ¹Institute of Informatics, LMU Munich, Germany; ²Microsoft Research, New York City, US. Correspondence to: Viktor Bengs <viktor.bengs@lmu.de>. |
| Pseudocode | Yes | Algorithm 1 CoLSTIM; Algorithm 2 Sup-CoLSTIM |
| Open Source Code | No | The paper does not include an unambiguous statement or link for the public release of its source code. |
| Open Datasets | No | The paper describes how context vectors and weight parameters are sampled for experiments (e.g., "context vectors are sampled uniformly at random from B₁(2)") but does not refer to or provide access to a specific publicly available dataset with concrete access information (e.g., link, DOI, or formal citation). |
| Dataset Splits | No | The paper does not explicitly mention train/validation/test splits with specific percentages or counts. It describes experimental scenarios and how data is generated. |
| Hardware Specification | Yes | All these experiments were conducted on a machine featuring an Intel(R) Xeon E5-2670 @2.6GHz with 16 cores and 64 GB of RAM. |
| Software Dependencies | No | The paper mentions using SGD and fixed learning rates, but does not name the software libraries used or their versions (e.g., PyTorch, TensorFlow, or scikit-learn), nor the version of any programming language. |
| Experiment Setup | Yes | For DTS and SS, we used the same hyperparameters as in the corresponding experiments, while for MaxInP we simply use t₀ = dn and η = √(d log T), as the hyperparameters are not reported in the experiments by Saha (2021). Note that our choices are simplified versions of the parameters for which the theoretical guarantees hold. For CoLSTIM, we use τ = t₀ and c₁ = C_thresh = η, as τ plays a role similar to t₀, and c₁ and C_thresh play roles similar to η. For the coupling probability p_t of CoLSTIM, we use a simplified version of the one derived in Theorem 3.2, namely p_t = min{1, (d/√(t − τ)) log(dT)}. In every experiment, the performance of the algorithms is measured in terms of cumulative average regret (cf. (8)), averaged across 100 runs and reported with its standard deviation. Moreover, for both MaxInP and CoLSTIM, we use the SGD-based variant described in Section 3.2 with a fixed learning rate of η_t = 1/2. |
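
For readers who want to sanity-check the Experiment Setup row, below is a minimal Python sketch of the two quantities it quotes: the simplified coupling-probability schedule p_t and the cumulative average regret metric. The form p_t = min{1, (d/√(t − τ)) log(dT)} is a reconstruction of a garbled extraction span, the warm-up behavior for t ≤ τ is an added assumption, and all function names, variable names, and parameter values here are hypothetical illustrations, not the authors' code.

```python
import numpy as np

def coupling_probability(t: int, tau: int, d: int, T: int) -> float:
    """Simplified coupling probability p_t = min(1, d / sqrt(t - tau) * log(d * T)).

    Reconstructed from the Experiment Setup quote; the exact expression in
    Theorem 3.2 of the paper may differ.
    """
    if t <= tau:
        return 1.0  # assumption: force coupling during the first tau rounds
    return float(min(1.0, d / np.sqrt(t - tau) * np.log(d * T)))

def cumulative_average_regret(instant_regret: np.ndarray) -> np.ndarray:
    """Per-round cumulative average regret, (1/t) * sum_{s<=t} r_s, i.e. the
    'cumulative average regret (cf. (8))' metric quoted above."""
    t = np.arange(1, instant_regret.shape[-1] + 1)
    return np.cumsum(instant_regret, axis=-1) / t

# Illustrative values (d, tau, T are NOT taken from the paper):
d, tau, T = 5, 50, 10_000
for t in (tau + 1, 100, 1_000, T):
    print(f"p_{t} = {coupling_probability(t, tau, d, T):.3f}")

# Reporting as in the quote: average the per-round curves across 100 runs
# and attach the standard deviation (dummy regret data for illustration).
runs = np.abs(np.random.default_rng(0).normal(size=(100, 1_000)))
curves = cumulative_average_regret(runs)
mean, std = curves.mean(axis=0), curves.std(axis=0)
```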