Cascading Bandits: Optimizing Recommendation Frequency in Delayed Feedback Environments
Authors: Dairui Wang, Junyu Cao, Yan Zhang, Wei Qi
NeurIPS 2023
| Reproducibility Variable | Result | LLM Response |
|---|---|---|
| Research Type | Experimental | Numerical experiments on the AmEx dataset confirm the effectiveness of our algorithms. In this section, we evaluate the performance of both the non-contextual and contextual algorithms based on the features of the real AmEx User Click dataset. |
| Researcher Affiliation | Academia | Dairui Wang, Tsinghua University, wdr23@mails.tsinghua.edu.cn; Junyu Cao, The University of Texas at Austin, junyu.cao@mccombs.utexas.edu; Yan Zhang, McGill University, yan.zhang13@mail.mcgill.ca; Wei Qi*, Tsinghua University, qiw@tsinghua.edu.cn |
| Pseudocode | Yes | Algorithm 1: Determine the optimal sequence S with frequency control. Algorithm 2: An online learning algorithm for cascading bandits with delayed feedback Algorithm 3: An online algorithm for contextual cascading bandits with delayed feedback |
| Open Source Code | No | The paper does not provide a statement about releasing the source code or a link to a code repository for the described methodology. |
| Open Datasets | Yes | In this section, we evaluate the performance of both the non-contextual and contextual algorithms based on the features of the real AmEx User Click dataset, which records over 463,000 recommendations of AmEx from July 2 to July 7, 2017. Dataset: https://www.kaggle.com/code/muditagrawal/amex-user-click-prediction |
| Dataset Splits | No | The paper does not provide specific details on how the dataset was split into training, validation, and test sets, beyond mentioning its use for experiments. |
| Hardware Specification | No | The paper does not provide any specific details about the hardware used for running the experiments (e.g., GPU models, CPU types, memory). |
| Software Dependencies | No | The paper does not provide specific version numbers for any software dependencies or libraries used in the experiments. |
| Experiment Setup | Yes | Experiment I: Non-contextual setting. There are N = 25 available messages. The attraction probability v is drawn uniformly from [0, 0.5], and the return R uniformly from [1, 3]. The maximum length of the message list M is set to 10. Based on estimations from the short lists in the AmEx dataset, we set q(m) = 1 / (1 + exp(0.03m)). We set the re-targeting window D = 200 in all settings. The response time τ is drawn uniformly from [0, 3] for each user. Experiment II: Contextual setting. There are N = 25 available messages. The maximum length of the message list M is set to 20. User features are drawn uniformly from [0, 5] × [0, 5] × [0, 5], and message features uniformly from [−6, 0] × [0, 1] × [0, 2] × [−5, 0]. The coefficient related to the abandonment behavior, denoted αm, is drawn uniformly from the four-dimensional range (including the intercept) {1.04} × [−0.064m, 0] × [−0.08m, 0] × [−0.16m, 0] for m = 1, …, 20, where αm,1 is the intercept. An alternative coefficient αm is drawn uniformly from {1.04} × [−0.004m, 0] × [−0.064m, 0] × [−0.08m, 0] for m = 1, …, 20. The coefficient related to message attraction is β = (0.05, 0.2, 0.1, 0.3, 0.4), where β1 is the intercept. Users' response times are uniformly distributed on [0, 10]. |
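The non-contextual setup (Experiment I) can be sketched as a short parameter generator. This is a minimal illustration, not the authors' code: the seed and the number of simulated users are arbitrary assumptions, and the names `v`, `R`, `q`, and `tau` simply mirror the paper's notation.

```python
import numpy as np

rng = np.random.default_rng(0)  # seed chosen arbitrarily for reproducibility

N = 25    # number of available messages
M = 10    # maximum length of the message list
D = 200   # re-targeting window

# Per-message attraction probabilities v ~ U[0, 0.5] and returns R ~ U[1, 3]
v = rng.uniform(0.0, 0.5, size=N)
R = rng.uniform(1.0, 3.0, size=N)

def q(m):
    """Abandonment-related probability at list position m: q(m) = 1 / (1 + exp(0.03 m))."""
    return 1.0 / (1.0 + np.exp(0.03 * m))

# Response times tau ~ U[0, 3], one per simulated user (1000 users is an assumption)
tau = rng.uniform(0.0, 3.0, size=1000)
```

Note that q(m) is decreasing in m, so longer lists carry a higher abandonment risk at each additional position.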