Bandit Learning with Implicit Feedback

Authors: Yi Qi, Qingyun Wu, Hongning Wang, Jie Tang, Maosong Sun

NeurIPS 2018

Reproducibility Variable / Result / LLM Response:

Research Type: Experimental
  Our upper regret bound analysis of the proposed algorithm proves its feasibility of learning from implicit feedback in a bandit setting; and extensive empirical evaluations on click logs collected from a major MOOC platform further demonstrate its learning effectiveness in practice.

Researcher Affiliation: Academia
  1 State Key Lab of Intell. Tech. & Sys., Institution for Artificial Intelligence, Dept. of Comp. Sci. & Tech., Tsinghua University, Beijing, China; 2 Department of Computer Science, University of Virginia

Pseudocode: Yes
  Algorithm 1: Thompson sampling for E-C Bandit

Open Source Code: Yes
  The data set with our manually crafted features and our model implementation have been made publicially available here: https://github.com/qy7171/ec_bandit.

Open Datasets: Yes
  The MOOC data we used for evaluation is collected from a single course in a 4-month period. ... The data set with our manually crafted features and our model implementation have been made publicially available here: https://github.com/qy7171/ec_bandit.

Dataset Splits: No
  The paper does not explicitly provide specific training, validation, or test dataset splits (e.g., percentages or sample counts) for the MOOC data, nor does it refer to standard predefined splits with citations for reproducibility.

Hardware Specification: No
  The paper does not provide specific hardware details such as GPU models, CPU models, or memory specifications used for running the experiments.

Software Dependencies: No
  The paper does not list specific software dependencies with their version numbers (e.g., programming languages, libraries, or frameworks).

Experiment Setup: Yes
  The context vectors' dimensions d_C and d_E are set to 5, and thus d = d_C + d_E = 10. We set |A| = 100, each arm associated with a unique context vector (x_C, x_E). ... At each time t, an arm set A_t is randomly sampled from A such that |A_t| = 10, i.e., each time we offer 10 randomly selected arms for the algorithm to choose from.
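To make the "Algorithm 1: Thompson sampling for E-C Bandit" row concrete, here is a minimal Thompson-sampling sketch over an examination-click factorization, where a click is modeled as examination times relevance: p(click) = sigmoid(x_E . theta_E) * sigmoid(x_C . theta_C). This is NOT the authors' implementation (see their repository for that); the independent Gaussian posteriors with fixed variances and the plain SGD likelihood update are simplifying assumptions for illustration, and the class name `ECThompsonSampler` is hypothetical.

```python
import numpy as np

def sigmoid(z):
    return 1.0 / (1.0 + np.exp(-z))

class ECThompsonSampler:
    """Illustrative Thompson sampling for an examination-click bandit.

    Assumes p(click | arm) = sigmoid(x_C @ theta_C) * sigmoid(x_E @ theta_E):
    a click requires both examination and a positive relevance judgment.
    Posteriors over theta_C and theta_E are approximated by independent
    Gaussians with fixed variances (a simplification, not the paper's update).
    """

    def __init__(self, d_c, d_e, prior_var=1.0, lr=0.1):
        self.mu_c = np.zeros(d_c)
        self.mu_e = np.zeros(d_e)
        self.var_c = np.full(d_c, prior_var)
        self.var_e = np.full(d_e, prior_var)
        self.lr = lr

    def select(self, arms):
        """arms: list of (x_C, x_E) pairs for the candidate set A_t."""
        # Sample one parameter draw per round, then act greedily on it.
        th_c = np.random.normal(self.mu_c, np.sqrt(self.var_c))
        th_e = np.random.normal(self.mu_e, np.sqrt(self.var_e))
        scores = [sigmoid(xc @ th_c) * sigmoid(xe @ th_e) for xc, xe in arms]
        return int(np.argmax(scores))

    def update(self, x_c, x_e, clicked):
        """One SGD step on the log-likelihood of the composite click model."""
        p_c = sigmoid(x_c @ self.mu_c)
        p_e = sigmoid(x_e @ self.mu_e)
        p = p_c * p_e
        if clicked:
            # d log(p_c * p_e) / d mu = (1 - p_.) * x_.
            grad_c = (1.0 - p_c) * x_c
            grad_e = (1.0 - p_e) * x_e
        else:
            # d log(1 - p_c * p_e) / d mu, where a non-click is ambiguous:
            # it may mean "not examined" or "examined but not relevant".
            grad_c = -p_e * p_c * (1.0 - p_c) / (1.0 - p) * x_c
            grad_e = -p_c * p_e * (1.0 - p_e) / (1.0 - p) * x_e
        self.mu_c += self.lr * grad_c
        self.mu_e += self.lr * grad_e
```

The non-click branch is where implicit feedback differs from a plain logistic bandit: the gradient is damped by the other factor's probability, reflecting that an unclicked arm may simply not have been examined.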
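The Experiment Setup row (d_C = d_E = 5, |A| = 100, and a random candidate set A_t with |A_t| = 10 per round) can be sketched as follows. The arm contexts are drawn from a standard normal purely for illustration; the paper instead uses manually crafted features from MOOC click logs, and `candidate_set` is a hypothetical helper name.

```python
import numpy as np

rng = np.random.default_rng(0)
d_c, d_e = 5, 5  # context dimensions from the paper; d = d_c + d_e = 10

# Full arm pool A with |A| = 100; each arm has a unique (x_C, x_E) pair.
# Standard-normal contexts stand in for the paper's crafted MOOC features.
arms = [(rng.normal(size=d_c), rng.normal(size=d_e)) for _ in range(100)]

def candidate_set(k=10):
    """Sample A_t uniformly from A without replacement, |A_t| = k."""
    idx = rng.choice(len(arms), size=k, replace=False)
    return [arms[i] for i in idx]
```

At each round the learner would call `candidate_set()` and choose one of the 10 returned arms, matching the evaluation protocol described in the row above.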