Incrementality Bidding via Reinforcement Learning under Mixed and Delayed Rewards
Authors: Ashwinkumar Badanidiyuru Varadaraja, Zhe Feng, Tianxi Li, Haifeng Xu
NeurIPS 2022
| Reproducibility Variable | Result | LLM Response |
|---|---|---|
| Research Type | Theoretical | We formulate the offline version of this problem as a specially structured episodic Markov Decision Process (MDP) and then, for its online learning counterpart, propose a novel reinforcement learning (RL) algorithm with regret at most $\widetilde{O}(H^2\sqrt{T})$... Technically, our main result is the design of an RL algorithm for incrementality bidding that provably has regret at most $\widetilde{O}(H^2\sqrt{T})$... One key technical novelty of this paper is a new parameter estimation method, Pairwise Moment-Matching (PAMM) algorithm, that can estimate reward parameters under mixed and delayed conversion feedback. ... The estimators are provably consistent with a nearly-optimal convergence rate. |
| Researcher Affiliation | Collaboration | Ashwinkumar Badanidiyuru (Google, ashwinkumarbv@google.com); Zhe Feng (Google Research, zhef@google.com); Tianxi Li (University of Virginia, tianxili@virginia.edu); Haifeng Xu (University of Chicago, haifengxu@uchicago.edu) |
| Pseudocode | Yes | Algorithm 1 Pairwise Moment-Matching (PAMM) Algorithm |
| Open Source Code | No | Under '3. If you ran experiments...', the answer to '(a) Did you include the code, data, and instructions needed to reproduce the main experimental results (either in the supplemental material or as a URL)?' is [N/A]. |
| Open Datasets | No | The paper is theoretical and does not describe experiments using datasets, nor does it provide access information for any dataset. |
| Dataset Splits | No | The paper is theoretical and does not describe experiments, thus no dataset split information for training, validation, or testing is provided. |
| Hardware Specification | No | The paper is theoretical and does not describe any experiments that would require specific hardware, thus no hardware specifications are mentioned. |
| Software Dependencies | No | The paper is theoretical and does not describe experiments or implementations that would require specific software dependencies with version numbers. |
| Experiment Setup | No | The paper is theoretical and does not describe an experimental setup, hyperparameters, or system-level training settings. |