Stochastic Contextual Dueling Bandits under Linear Stochastic Transitivity Models

Authors: Viktor Bengs, Aadirupa Saha, Eyke Hüllermeier

ICML 2022

| Reproducibility Variable | Result | LLM Response |
|---|---|---|
| Research Type | Experimental | "In an experimental study we show its empirical superiority over state-of-the-art algorithms for special cases of LST models in Section 4." |
| Researcher Affiliation | Collaboration | ¹Institute of Informatics, LMU Munich, Germany; ²Microsoft Research, New York City, US. Correspondence to: Viktor Bengs <viktor.bengs@lmu.de>. |
| Pseudocode | Yes | Algorithm 1: COLSTIM; Algorithm 2: SUP-COLSTIM |
| Open Source Code | No | The paper does not include an unambiguous statement or link for the public release of its source code. |
| Open Datasets | No | The paper describes how context vectors and weight parameters are sampled for the experiments (e.g., "context vectors are sampled uniformly at random from $B_1(2)$") but does not refer to or provide access to a specific publicly available dataset with concrete access information (e.g., link, DOI, or formal citation); see the sampling sketch below the table. |
| Dataset Splits | No | The paper does not explicitly mention train/validation/test splits with specific percentages or counts; it describes experimental scenarios and how the data are generated. |
| Hardware Specification | Yes | "All these experiments were conducted on a machine featuring an Intel(R) Xeon E5-2670 @ 2.6 GHz with 16 cores and 64 GB of RAM." |
| Software Dependencies | No | The paper mentions using SGD and fixed learning rates, but does not provide specific version numbers for libraries such as PyTorch, TensorFlow, or scikit-learn, nor does it state the version of the programming language used. |
| Experiment Setup | Yes | For DTS and SS, we used the same hyperparameters as in the corresponding experiments, while for MaxInP we simply use $t_0 = dn$ and $\eta = \sqrt{d \log(T)}$, as the hyperparameters are not reported in the experiments by Saha (2021). Note that our choices are simplified versions of the parameters for which the theoretical guarantees hold. For COLSTIM, we use $\tau = t_0$ and $c_1 = C_{\mathrm{thresh}} = \eta$, as $\tau$ plays a role similar to $t_0$, and $c_1$ and $C_{\mathrm{thresh}}$ play a role similar to $\eta$. For the coupling probability $p_t$ of COLSTIM, we use a simplified version of the one derived in Theorem 3.2, namely $p_t = \min\{1, \frac{d}{t-\tau}\log(dT)\}$. In every experiment, the performances of the algorithms are measured in terms of cumulative average regret (cf. Eq. (8)), averaged across 100 runs and reported with their standard deviation. Moreover, for both MaxInP and COLSTIM, we use the SGD-based variant described in Section 3.2 with a fixed learning rate of $\eta_t = 1/2$ (see the sketches below the table). |
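
As referenced in the Open Datasets row, the paper's experiments use synthetically generated contexts rather than a public dataset. The following is a minimal sketch of such generation, assuming $B_1(2)$ denotes a Euclidean ball; the function name `sample_ball` and the specific radius/dimension values are illustrative, not taken from the paper:

```python
import numpy as np

def sample_ball(n, d, radius=1.0, rng=None):
    """Draw n points uniformly at random from the d-dimensional
    Euclidean ball of the given radius."""
    rng = np.random.default_rng() if rng is None else rng
    # Uniformly distributed directions: normalized standard Gaussians.
    x = rng.standard_normal((n, d))
    x /= np.linalg.norm(x, axis=1, keepdims=True)
    # Radius factor u**(1/d) compensates for the volume growth of the ball,
    # making the points uniform over the ball rather than its surface.
    u = rng.random((n, 1))
    return radius * u ** (1.0 / d) * x

# Illustrative usage: context vectors for 10 arms in dimension 5.
contexts = sample_ball(n=10, d=5)
```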
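
The coupling probability from the Experiment Setup row can be expressed directly. This sketch implements the simplified schedule $p_t = \min\{1, \frac{d}{t-\tau}\log(dT)\}$ as reconstructed above; the exact expression derived in the paper's Theorem 3.2 is more involved, and the behaviour during the warm-up phase $t \le \tau$ is an assumption here:

```python
import numpy as np

def coupling_probability(t, d, T, tau):
    """Simplified coupling schedule p_t = min{1, d/(t - tau) * log(d*T)}.
    Exploration decays once the warm-up phase of length tau is over."""
    if t <= tau:
        return 1.0  # assumption: fully coupled during warm-up
    return min(1.0, d / (t - tau) * np.log(d * T))
```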
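
Finally, the SGD-based variant with fixed learning rate $\eta_t = 1/2$ updates the weight estimate after each duel. The sketch below shows one such step under a logistic (Bradley-Terry) comparison function, one special case of an LST model; the loss and update rule are illustrative assumptions, not the paper's exact Section 3.2 procedure:

```python
import numpy as np

def sgd_step(theta, x_winner, x_loser, lr=0.5):
    """One SGD step on the logistic log-loss for an observed duel, where
    P(winner beats loser) = sigmoid(theta @ (x_winner - x_loser))."""
    z = x_winner - x_loser
    p = 1.0 / (1.0 + np.exp(-theta @ z))
    # Gradient of -log(p) w.r.t. theta is -(1 - p) * z; descend on it.
    return theta + lr * (1.0 - p) * z
```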