Follow-ups Also Matter: Improving Contextual Bandits via Post-serving Contexts

Authors: Chaoqi Wang, Ziyu Ye, Zhe Feng, Ashwinkumar Badanidiyuru Varadaraja, Haifeng Xu

NeurIPS 2023

Reproducibility Variable | Result | LLM Response
Research Type | Experimental | "Extensive empirical tests on both synthetic and real-world datasets demonstrate the significant benefit of utilizing post-serving contexts as well as the superior performance of our algorithm over the state-of-the-art approaches." (Section 7, Experiments): "This section presents a comprehensive evaluation of our proposed poLinUCB algorithm on both synthetic and real-world data, demonstrating its effectiveness in incorporating follow-up information and outperforming the LinUCB(ϕ̂) variant."
Researcher Affiliation | Collaboration | University of Chicago ({chaoqi, ziyuye, haifengxu}@uchicago.edu); Google Research and Google ({zhef, ashwinkumarbv}@google.com)
Pseudocode | Yes | Algorithm 1: poLinUCB (Linear UCB with post-serving contexts); an illustrative sketch follows the table.
Open Source Code | No | No explicit statement about releasing source code, and no direct link to a repository for the described methodology, was found.
Open Datasets | Yes | "The evaluation was conducted on a real-world dataset, MovieLens (Harper and Konstan, 2015)."
Dataset Splits | No | The paper describes dividing user feature vectors into pre-serving and post-serving contexts, but provides no train/validation/test split details (e.g., percentages, sample counts, or an explicit splitting methodology).
Hardware Specification | No | No specific hardware details (e.g., exact GPU/CPU models, processor types, or memory amounts) used for running the experiments were found.
Software Dependencies | No | The paper mentions the Adam optimizer and a neural network but provides no software dependencies with version numbers (e.g., "PyTorch 1.9").
Experiment Setup | Yes | "Evaluation Setup. We adopt three different synthetic environments... In each environment, the dimensions of the pre-serving context (d_x) and the post-serving context (d_z) are 100 and 5, respectively, with K = 10 arms. The evaluation spans T = 1000 or 5000 time steps, and each experiment is repeated with 10 different seeds. We fit the function ϕ(x) using a two-layer neural network with 64 hidden units and ReLU activation. The network was trained using the Adam optimizer with a learning rate of 1e-3. At each iteration, we randomly sampled a user from the dataset... The evaluation spanned T = 500 iterations and was repeated with 10 seeds." A sketch of this setup follows the table.
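
For reference, below is a minimal sketch of the kind of learner Algorithm 1 describes: a LinUCB-style policy that predicts the post-serving context from the pre-serving one at decision time, then updates on the full context once the follow-up is revealed. The class name PoLinUCBSketch, the linear ridge predictor standing in for the paper's learned ϕ(x), and all hyperparameters are illustrative assumptions, not the paper's exact Algorithm 1.

```python
import numpy as np

class PoLinUCBSketch:
    """Illustrative LinUCB-style learner with post-serving contexts.

    At decision time only the pre-serving context x is available, so the
    post-serving part is predicted by a learned map. After the arm is
    served, the true post-serving context z arrives and both the per-arm
    ridge statistics and the predictor are updated.
    """

    def __init__(self, n_arms, d_x, d_z, alpha=1.0, lam=1.0):
        d = d_x + d_z
        self.alpha = alpha                                 # UCB exploration scale (assumed)
        self.A = [lam * np.eye(d) for _ in range(n_arms)]  # per-arm Gram matrices
        self.b = [np.zeros(d) for _ in range(n_arms)]      # per-arm response vectors
        # Linear ridge stand-in for the learned map x -> z (the paper fits
        # a neural network; a closed-form regressor keeps the sketch small).
        self.C = lam * np.eye(d_x)                         # Gram matrix over x
        self.D = np.zeros((d_z, d_x))                      # cross moments z x^T

    def select(self, x):
        z_hat = self.D @ np.linalg.inv(self.C) @ x         # predicted post-serving context
        u = np.concatenate([x, z_hat])
        scores = []
        for A, b in zip(self.A, self.b):
            A_inv = np.linalg.inv(A)
            theta = A_inv @ b                              # ridge estimate of the arm parameter
            scores.append(theta @ u + self.alpha * np.sqrt(u @ A_inv @ u))
        return int(np.argmax(scores))

    def update(self, arm, x, z, reward):
        u = np.concatenate([x, z])                         # full context, now that z is observed
        self.A[arm] += np.outer(u, u)
        self.b[arm] += reward * u
        self.C += np.outer(x, x)                           # refresh the x -> z predictor
        self.D += np.outer(z, x)
```

In the paper's synthetic setting this would be instantiated with d_x = 100, d_z = 5, and K = 10 arms, matching the quoted dimensions.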
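The quoted setup also pins down the ϕ(x) regressor: a two-layer network with 64 hidden units, ReLU activation, and Adam at learning rate 1e-3. Below is a minimal PyTorch sketch of that component; the mean-squared-error objective and the fit_step helper are assumptions, since the excerpt does not state the training loss.

```python
import torch
import torch.nn as nn

d_x, d_z = 100, 5  # pre- and post-serving context dimensions from the paper

# Two-layer MLP with 64 hidden units and ReLU activation, as described.
phi_net = nn.Sequential(
    nn.Linear(d_x, 64),
    nn.ReLU(),
    nn.Linear(64, d_z),
)
optimizer = torch.optim.Adam(phi_net.parameters(), lr=1e-3)
loss_fn = nn.MSELoss()  # assumed regression objective; not stated in the excerpt

def fit_step(x_batch: torch.Tensor, z_batch: torch.Tensor) -> float:
    """One gradient step fitting phi(x) to observed post-serving contexts."""
    optimizer.zero_grad()
    loss = loss_fn(phi_net(x_batch), z_batch)
    loss.backward()
    optimizer.step()
    return loss.item()
```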