Bandit Learning with Implicit Feedback
Authors: Yi Qi, Qingyun Wu, Hongning Wang, Jie Tang, Maosong Sun
NeurIPS 2018
| Reproducibility Variable | Result | LLM Response |
|---|---|---|
| Research Type | Experimental | Our upper regret bound analysis of the proposed algorithm proves its feasibility of learning from implicit feedback in a bandit setting; and extensive empirical evaluations on click logs collected from a major MOOC platform further demonstrate its learning effectiveness in practice. |
| Researcher Affiliation | Academia | 1 State Key Lab of Intell. Tech. & Sys., Institution for Artificial Intelligence, Dept. of Comp. Sci. & Tech., Tsinghua University, Beijing, China 2 Department of Computer Science, University of Virginia |
| Pseudocode | Yes | Algorithm 1 Thompson sampling for E-C Bandit |
| Open Source Code | Yes | The data set with our manually crafted features and our model implementation have been made publicly available here: https://github.com/qy7171/ec_bandit. |
| Open Datasets | Yes | The MOOC data we used for evaluation is collected from a single course in a 4-month period. ... The data set with our manually crafted features and our model implementation have been made publicly available here: https://github.com/qy7171/ec_bandit. |
| Dataset Splits | No | The paper does not explicitly provide specific training, validation, or test dataset splits (e.g., percentages or sample counts) for the MOOC data, nor does it refer to standard predefined splits with citations for reproducibility. |
| Hardware Specification | No | The paper does not provide specific hardware details such as GPU models, CPU models, or memory specifications used for running the experiments. |
| Software Dependencies | No | The paper does not list specific software dependencies with their version numbers (e.g., programming languages, libraries, or frameworks). |
| Experiment Setup | Yes | The context vector's dimensions $d_C$ and $d_E$ are set to 5, and thus $d = d_C + d_E = 10$. We set $\lvert A \rvert = 100$, each of which is associated with a unique context vector $(x_C, x_E)$. ... At each time $t$, an arm set $A_t$ is randomly sampled from $A$ such that $\lvert A_t \rvert = 10$, i.e., each time we offer 10 randomly selected arms for the algorithm to choose from. |
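
The simulation setup quoted in the Experiment Setup row can be sketched roughly as follows. This is a minimal illustration, not the authors' implementation: it builds a pool of 100 arms, each with a context vector split into two 5-dimensional parts, and draws 10 arms per round. The function names, the standard-normal feature distribution, and drawing without replacement are all assumptions; the row only states the dimensions and set sizes.

```python
import random

D_C, D_E = 5, 5        # per-part context dimensions from the setup row
NUM_ARMS = 100         # size of the full arm pool
ARMS_PER_ROUND = 10    # size of the arm set offered at each time step


def make_arm_pool(rng, num_arms=NUM_ARMS, d_c=D_C, d_e=D_E):
    """Build a pool of arms, each a pair (x_C, x_E) of feature lists.

    Assumption: features are i.i.d. standard normal draws; the source
    does not say how the context vectors were generated.
    """
    return [
        (
            [rng.gauss(0.0, 1.0) for _ in range(d_c)],
            [rng.gauss(0.0, 1.0) for _ in range(d_e)],
        )
        for _ in range(num_arms)
    ]


def sample_arm_set(rng, pool, k=ARMS_PER_ROUND):
    """Return the indices of k distinct arms drawn uniformly from the pool.

    Assumption: arms are drawn without replacement within a round.
    """
    return rng.sample(range(len(pool)), k)


rng = random.Random(0)            # fixed seed so the sketch is repeatable
pool = make_arm_pool(rng)
arm_set = sample_arm_set(rng, pool)
total_dim = D_C + D_E             # d = d_C + d_E
```

A bandit algorithm under test would then pick one index from `arm_set` at each round, with a fresh call to `sample_arm_set` per time step.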