Bandit Learning with Implicit Feedback

Authors: Yi Qi, Qingyun Wu, Hongning Wang, Jie Tang, Maosong Sun

NeurIPS 2018

Reproducibility Variable / Result / LLM Response:

Research Type: Experimental
  Our upper regret bound analysis of the proposed algorithm proves its feasibility of learning from implicit feedback in a bandit setting; and extensive empirical evaluations on click logs collected from a major MOOC platform further demonstrate its learning effectiveness in practice.

Researcher Affiliation: Academia
  1 State Key Lab of Intell. Tech. & Sys., Institution for Artificial Intelligence, Dept. of Comp. Sci. & Tech., Tsinghua University, Beijing, China; 2 Department of Computer Science, University of Virginia

Pseudocode: Yes
  Algorithm 1: Thompson sampling for E-C Bandit

Open Source Code: Yes
  The data set with our manually crafted features and our model implementation have been made publicially available here: https://github.com/qy7171/ec_bandit.

Open Datasets: Yes
  The MOOC data we used for evaluation is collected from a single course in a 4-month period. ... The data set with our manually crafted features and our model implementation have been made publicially available here: https://github.com/qy7171/ec_bandit.

Dataset Splits: No
  The paper does not explicitly provide specific training, validation, or test dataset splits (e.g., percentages or sample counts) for the MOOC data, nor does it refer to standard predefined splits with citations for reproducibility.

Hardware Specification: No
  The paper does not provide specific hardware details such as GPU models, CPU models, or memory specifications used for running the experiments.

Software Dependencies: No
  The paper does not list specific software dependencies with their version numbers (e.g., programming languages, libraries, or frameworks).

Experiment Setup: Yes
  The context vectors' dimensions d_C and d_E are set to 5, and thus d = d_C + d_E = 10. We set |A| = 100, each arm associated with a unique context vector (x_C, x_E). ... At each time t, an arm set A_t is randomly sampled from A such that |A_t| = 10, i.e., each time we offer 10 randomly selected arms for the algorithm to choose from.
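To make the "Algorithm 1: Thompson sampling for E-C Bandit" row concrete, here is a minimal Thompson-sampling sketch over an examination-click factorization, where a click is modeled as examination times relevance: p(click) = sigmoid(x_E . theta_E) * sigmoid(x_C . theta_C). This is NOT the authors' implementation (see their repository for that); the independent Gaussian posteriors with fixed variances and the plain SGD likelihood update are simplifying assumptions for illustration, and the class name `ECThompsonSampler` is hypothetical.

```python
import numpy as np

def sigmoid(z):
    return 1.0 / (1.0 + np.exp(-z))

class ECThompsonSampler:
    """Illustrative Thompson sampling for an examination-click bandit.

    Assumes p(click | arm) = sigmoid(x_C @ theta_C) * sigmoid(x_E @ theta_E):
    a click requires both examination and a positive relevance judgment.
    Posteriors over theta_C and theta_E are approximated by independent
    Gaussians with fixed variances (a simplification, not the paper's update).
    """

    def __init__(self, d_c, d_e, prior_var=1.0, lr=0.1):
        self.mu_c = np.zeros(d_c)
        self.mu_e = np.zeros(d_e)
        self.var_c = np.full(d_c, prior_var)
        self.var_e = np.full(d_e, prior_var)
        self.lr = lr

    def select(self, arms):
        """arms: list of (x_C, x_E) pairs for the candidate set A_t."""
        # Sample one parameter draw per round, then act greedily on it.
        th_c = np.random.normal(self.mu_c, np.sqrt(self.var_c))
        th_e = np.random.normal(self.mu_e, np.sqrt(self.var_e))
        scores = [sigmoid(xc @ th_c) * sigmoid(xe @ th_e) for xc, xe in arms]
        return int(np.argmax(scores))

    def update(self, x_c, x_e, clicked):
        """One SGD step on the log-likelihood of the composite click model."""
        p_c = sigmoid(x_c @ self.mu_c)
        p_e = sigmoid(x_e @ self.mu_e)
        p = p_c * p_e
        if clicked:
            # d log(p_c * p_e) / d mu = (1 - p_.) * x_.
            grad_c = (1.0 - p_c) * x_c
            grad_e = (1.0 - p_e) * x_e
        else:
            # d log(1 - p_c * p_e) / d mu, where a non-click is ambiguous:
            # it may mean "not examined" or "examined but not relevant".
            grad_c = -p_e * p_c * (1.0 - p_c) / (1.0 - p) * x_c
            grad_e = -p_c * p_e * (1.0 - p_e) / (1.0 - p) * x_e
        self.mu_c += self.lr * grad_c
        self.mu_e += self.lr * grad_e
```

The non-click branch is where implicit feedback differs from a plain logistic bandit: the gradient is damped by the other factor's probability, reflecting that an unclicked arm may simply not have been examined.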
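The Experiment Setup row (d_C = d_E = 5, |A| = 100, and a random candidate set A_t with |A_t| = 10 per round) can be sketched as follows. The arm contexts are drawn from a standard normal purely for illustration; the paper instead uses manually crafted features from MOOC click logs, and `candidate_set` is a hypothetical helper name.

```python
import numpy as np

rng = np.random.default_rng(0)
d_c, d_e = 5, 5  # context dimensions from the paper; d = d_c + d_e = 10

# Full arm pool A with |A| = 100; each arm has a unique (x_C, x_E) pair.
# Standard-normal contexts stand in for the paper's crafted MOOC features.
arms = [(rng.normal(size=d_c), rng.normal(size=d_e)) for _ in range(100)]

def candidate_set(k=10):
    """Sample A_t uniformly from A without replacement, |A_t| = k."""
    idx = rng.choice(len(arms), size=k, replace=False)
    return [arms[i] for i in idx]
```

At each round the learner would call `candidate_set()` and choose one of the 10 returned arms, matching the evaluation protocol described in the row above.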