Practical Contextual Bandits with Feedback Graphs

Authors: Mengxiao Zhang, Yuheng Zhang, Olga Vrousgou, Haipeng Luo, Paul Mineiro

NeurIPS 2023

Reproducibility Variable | Result | LLM Response
Research Type | Experimental | In this section, we use empirical results to demonstrate the significant benefits of SquareCB.G in leveraging the graph feedback structure and its superior effectiveness compared to SquareCB. Following Foster and Krishnamurthy [2021], we use progressive validation (PV) loss as the evaluation metric, defined as $L_{\mathrm{pv}}(T) = \frac{1}{T}\sum_{t=1}^{T} \ell_{t,a_t}$. All the feedback graphs used in the experiments are deterministic. We run experiments on an Intel Xeon Gold 6240R 2.4 GHz CPU, and the convex program solver is implemented via Vowpal Wabbit [Langford et al., 2007]. Figure 1, left: performance of SquareCB.G on the RCV1 dataset under three different feedback graphs. Figure 1, right: performance comparison between SquareCB.G and SquareCB under random directed self-aware feedback graphs.
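For reference, a minimal sketch of the PV-loss metric as defined above (a hypothetical helper, not the authors' code):

```python
import numpy as np

def progressive_validation_loss(incurred_losses):
    """Running PV loss after each round: L_pv(t) = (1/t) * sum_{s<=t} loss_s."""
    incurred = np.asarray(incurred_losses, dtype=float)
    return np.cumsum(incurred) / np.arange(1, len(incurred) + 1)

# progressive_validation_loss([0.4, 0.2, 0.0])[-1] == L_pv(3) == 0.2
```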
Researcher Affiliation | Collaboration | Mengxiao Zhang (University of Southern California, mengxiao.zhang@usc.edu); Yuheng Zhang (University of Illinois Urbana-Champaign, yuhengz2@illinois.edu); Olga Vrousgou (Microsoft Research, Olga.Vrousgou@microsoft.com); Haipeng Luo (University of Southern California, haipengl@usc.edu); Paul Mineiro (Microsoft Research, pmineiro@microsoft.com)
Pseudocode | Yes | Algorithm 1 SquareCB.G. Note: Theorem 3.6 provides an efficient implementation of Eq. (4).
Input: parameter $\gamma \ge 4$, a regression oracle $\mathrm{Alg_{Sq}}$.
for $t = 1, 2, \ldots, T$ do
    Receive context $x_t$ and directed feedback graph $G_t$.
    Obtain an estimator $\hat{f}_t$ from the oracle $\mathrm{Alg_{Sq}}$.
    Compute the distribution $p_t \in \Delta([K])$ such that $p_t = \operatorname{argmin}_{p \in \Delta([K])} \mathrm{dec}_\gamma(p; \hat{f}_t, x_t, G_t)$, where
    $\mathrm{dec}_\gamma(p; \hat{f}_t, x_t, G_t) := \sup_{a^\star \in [K],\, f^\star \in \Phi} \mathbb{E}_{a \sim p}\Big[f^\star(x_t, a) - f^\star(x_t, a^\star) - \frac{\gamma}{4}\,\mathbb{E}_{A \sim G_t(\cdot\mid a)}\sum_{a' \in A}\big(\hat{f}_t(x_t, a') - f^\star(x_t, a')\big)^2\Big]$
    and $\Phi := \{f : \mathcal{X} \times [K] \to \mathbb{R}\}$.
    Sample $a_t$ from $p_t$ and observe $\{\ell_{t,j}\}_{j \in A_t}$, where $A_t \sim G_t(\cdot \mid a_t)$.
    Feed the tuples $\{(x_t, j, \ell_{t,j})\}_{j \in A_t}$ to the oracle $\mathrm{Alg_{Sq}}$.
end for
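For orientation, here is a minimal Python sketch of the Algorithm 1 interaction loop under simplifying assumptions: `oracle` and `min_dec_distribution` are hypothetical stand-ins (the paper solves the min-dec convex program via Vowpal Wabbit, per Theorem 3.6), and each graph is represented deterministically as out-neighborhood lists. This is not the authors' implementation.

```python
import numpy as np

def squarecb_g(contexts, graphs, losses, oracle, min_dec_distribution, gamma, seed=0):
    """Sketch of the SquareCB.G loop (Algorithm 1).

    contexts[t]: context x_t; graphs[t][a]: out-neighborhood of arm a in G_t
    (assumed deterministic); losses[t]: length-K loss vector, of which only
    the entries in the observed neighborhood are revealed to the oracle.
    """
    rng = np.random.default_rng(seed)
    incurred = []
    for t in range(len(contexts)):
        x, G = contexts[t], graphs[t]
        f_hat = oracle.predict(x)                  # estimated losses, shape (K,)
        p = min_dec_distribution(f_hat, G, gamma)  # argmin_p dec_gamma(p; f_hat, x, G)
        a = rng.choice(len(p), p=p)                # sample a_t ~ p_t
        incurred.append(losses[t][a])
        for j in G[a]:                             # graph feedback: all out-neighbors of a_t
            oracle.update(x, j, losses[t][j])      # feed the tuple (x_t, j, loss_{t,j})
    return incurred                                # pass to progressive_validation_loss(...)
```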
Open Source Code | No | The paper states: 'Our code is built on PyTorch framework [Paszke et al., 2019]' and 'the convex program solver is implemented via Vowpal Wabbit [Langford et al., 2007]'. Appendix A.3 provides a 'Python Solution to Eq. (5)' as a code snippet. However, there is no explicit statement or link indicating that the authors release the complete source code for the proposed method (SquareCB.G).
Open Datasets | Yes | We conduct experiments on RCV1 [Lewis et al., 2004], which is a multilabel text-categorization dataset.
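RCV1 is publicly available; for instance, scikit-learn ships a loader (how the authors obtained the data is not stated, so this is only one convenient option):

```python
from sklearn.datasets import fetch_rcv1

# RCV1-v2: 804,414 documents with 47,236 tf-idf features and 103 topic labels.
rcv1 = fetch_rcv1()
X, Y = rcv1.data, rcv1.target   # scipy sparse matrices
print(X.shape, Y.shape)         # (804414, 47236) (804414, 103)
```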
Dataset Splits | No | The paper uses 'progressive validation (PV) loss' as the evaluation metric on the 'RCV1 dataset' and a 'Synthetic Inventory Dataset'. While these describe how the data is used, the paper does not explicitly specify train, validation, or test splits (e.g., percentages, sample counts, or references to predefined splits).
Hardware Specification | Yes | We run experiments on an Intel Xeon Gold 6240R 2.4 GHz CPU.
Software Dependencies | No | The paper states: 'Our code is built on PyTorch framework [Paszke et al., 2019]' and 'the convex program solver is implemented via Vowpal Wabbit [Langford et al., 2007]'. While these name the software dependencies, specific version numbers for PyTorch and Vowpal Wabbit are not provided (only citations are given), which falls short of a reproducible description of ancillary software.
Experiment Setup | Yes | The oracle is implemented by applying online gradient descent with learning rate $\eta$ searched over $\{0.1, 0.2, 0.5, 1, 2, 4\}$. As suggested by Foster and Krishnamurthy [2021], we use a time-varying exploration parameter $\gamma_t = c\sqrt{\alpha t}$, where $t$ is the index of the iteration, $c$ is searched over $\{8, 16, 32, 64, 128\}$, and $\alpha$ is the independence number of the corresponding feedback graph.
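A hedged sketch of the reported setup: an online-gradient-descent regression oracle with learning rate drawn from the stated grid, and the time-varying exploration parameter $\gamma_t = c\sqrt{\alpha t}$. The per-arm linear oracle and feature handling below are assumptions; the paper does not fully specify them.

```python
import itertools
import numpy as np

ETAS = [0.1, 0.2, 0.5, 1, 2, 4]   # learning-rate grid reported in the paper
CS = [8, 16, 32, 64, 128]         # grid for the constant c

def gamma_schedule(c, alpha, t):
    """Time-varying exploration parameter gamma_t = c * sqrt(alpha * t),
    where alpha is the independence number of the feedback graph."""
    return c * np.sqrt(alpha * t)

class OGDOracle:
    """Per-arm linear regression trained by online gradient descent on the
    squared loss (an assumed oracle; the paper's regressor is not detailed)."""
    def __init__(self, dim, num_arms, eta):
        self.w = np.zeros((num_arms, dim))
        self.eta = eta

    def predict(self, x):
        return self.w @ x                          # estimated loss per arm

    def update(self, x, arm, loss):
        grad = 2.0 * (self.w[arm] @ x - loss) * x  # gradient of (w_a . x - loss)^2
        self.w[arm] -= self.eta * grad

# Grid search over (eta, c), e.g. feeding OGDOracle into squarecb_g above:
for eta, c in itertools.product(ETAS, CS):
    pass  # run SquareCB.G with OGDOracle(dim, K, eta) and gamma_t = gamma_schedule(c, alpha, t)
```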