Cascading Bandits: Optimizing Recommendation Frequency in Delayed Feedback Environments

Authors: Dairui Wang, Junyu Cao, Yan Zhang, Wei Qi

NeurIPS 2023

Reproducibility Variable | Result | LLM Response
Research Type | Experimental | "Numerical experiment on the AmEx dataset confirms the effectiveness of our algorithms. In this section, we evaluate the performance of both the non-contextual and contextual algorithms based on the features of the real AmEx User Click dataset."
Researcher Affiliation | Academia | Dairui Wang, Tsinghua University, wdr23@mails.tsinghua.edu.cn; Junyu Cao, The University of Texas at Austin, junyu.cao@mccombs.utexas.edu; Yan Zhang, McGill University, yan.zhang13@mail.mcgill.ca; Wei Qi*, Tsinghua University, qiw@tsinghua.edu.cn
Pseudocode | Yes | Algorithm 1: Determine the optimal sequence S with frequency control. Algorithm 2: An online learning algorithm for cascading bandits with delayed feedback. Algorithm 3: An online algorithm for contextual cascading bandits with delayed feedback.
Open Source Code | No | The paper does not provide a statement about releasing the source code or a link to a code repository for the described methodology.
Open Datasets | Yes | "In this section, we evaluate the performance of both the non-contextual and contextual algorithms based on the features of the real AmEx User Click dataset, which records over 463,000 recommendations of AmEx from July 2 to July 7, 2017." (Dataset: https://www.kaggle.com/code/muditagrawal/amex-user-click-prediction)
Dataset Splits | No | The paper does not provide specific details on how the dataset was split into training, validation, and test sets, beyond mentioning its use for experiments.
Hardware Specification | No | The paper does not provide any specific details about the hardware used for running the experiments (e.g., GPU models, CPU types, memory).
Software Dependencies | No | The paper does not provide specific version numbers for any software dependencies or libraries used in the experiments.
Experiment Setup | Yes | Experiment I (non-contextual setting): There are N = 25 available messages. The attraction probability v is drawn from the uniform distribution on [0, 0.5], and the return R from the uniform distribution on [1, 3]. The maximum length of the message list is M = 10. Based on estimations from the short lists in the AmEx dataset, we set q(m) = 1 / (1 + exp(0.03m)). We set the re-targeting window D = 200 in all settings. The response time τ is uniformly distributed on [0, 3] for each user.
Experiment II (contextual setting): There are N = 25 available messages. The maximum length of the message list is M = 20. User features are uniformly drawn from [0, 5] × [0, 5] × [0, 5], and message features from [−6, 0] × [0, 1] × [0, 2] × [−5, 0]. The coefficient related to abandonment behavior, αm (four-dimensional including the intercept), is uniformly drawn from {1.04} × [−0.064m, 0] × [−0.08m, 0] × [−0.16m, 0] for m = 1, …, 20, where αm,1 = 1.04 is the intercept. An alternative coefficient αm is uniformly drawn from {1.04} × [−0.004m, 0] × [−0.064m, 0] × [−0.08m, 0] for m = 1, …, 20. The coefficient related to message attraction is β = (0.05, 0.2, 0.1, 0.3, 0.4), where β1 is the intercept. Users' response time is uniformly distributed on [0, 10].
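The parameter generation described in the Experiment Setup row can be sketched in NumPy. This is a minimal reconstruction under stated assumptions, not the authors' code: the seed, variable names, and the reading of "1.04" as a fixed intercept with the remaining coordinates drawn from negative intervals are my own interpretation of the quoted text.

```python
import numpy as np

rng = np.random.default_rng(0)  # seed is an assumption; the paper does not state one

# --- Experiment I: non-contextual setting ---
N, M = 25, 10                       # number of messages, max list length
v = rng.uniform(0.0, 0.5, size=N)   # attraction probabilities ~ U[0, 0.5]
R = rng.uniform(1.0, 3.0, size=N)   # returns ~ U[1, 3]
D = 200                             # re-targeting window
tau = rng.uniform(0.0, 3.0)         # one user's response time ~ U[0, 3]

def q(m):
    """Position-dependent probability q(m) = 1 / (1 + exp(0.03 m))."""
    return 1.0 / (1.0 + np.exp(0.03 * m))

# --- Experiment II: contextual setting ---
user_feat = rng.uniform(0.0, 5.0, size=3)          # from [0,5] x [0,5] x [0,5]
msg_low = np.array([-6.0, 0.0, 0.0, -5.0])         # from [-6,0] x [0,1] x [0,2] x [-5,0]
msg_high = np.array([0.0, 1.0, 2.0, 0.0])
msg_feat = rng.uniform(msg_low, msg_high)

beta = np.array([0.05, 0.2, 0.1, 0.3, 0.4])        # attraction coefficients, beta_1 = intercept

# Abandonment coefficients alpha_m for m = 1..20: intercept fixed at 1.04,
# remaining three coordinates drawn uniformly from [-c*m, 0].
alphas = np.stack([
    np.concatenate(([1.04], rng.uniform([-0.064 * m, -0.08 * m, -0.16 * m], 0.0)))
    for m in range(1, 21)
])
```

This only sets up the simulation inputs; the cascading-bandit algorithms themselves (Algorithms 1-3) would consume `v`, `R`, `q`, and the feature/coefficient arrays.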