Incrementality Bidding via Reinforcement Learning under Mixed and Delayed Rewards

Authors: Ashwinkumar Badanidiyuru Varadaraja, Zhe Feng, Tianxi Li, Haifeng Xu

NeurIPS 2022

| Variable | Result | LLM Response |
|---|---|---|
| Research Type | Theoretical | We formulate the offline version of this problem as a specially structured episodic Markov Decision Process (MDP) and then, for its online learning counterpart, propose a novel reinforcement learning (RL) algorithm with regret at most Õ(H²√T)... Technically, our main result is the design of an RL algorithm for incrementality bidding that provably has regret at most Õ(H²√T)... One key technical novelty of this paper is a new parameter estimation method, the Pairwise Moment-Matching (PAMM) algorithm, that can estimate reward parameters under mixed and delayed conversion feedback. ... The estimators are provably consistent with a nearly optimal convergence rate. |
| Researcher Affiliation | Collaboration | Ashwinkumar Badanidiyuru (Google, ashwinkumarbv@google.com); Zhe Feng (Google Research, zhef@google.com); Tianxi Li (University of Virginia, tianxili@virginia.edu); Haifeng Xu (University of Chicago, haifengxu@uchicago.edu) |
| Pseudocode | Yes | Algorithm 1: Pairwise Moment-Matching (PAMM) Algorithm |
| Open Source Code | No | Under "3. If you ran experiments...", the answer to "(a) Did you include the code, data, and instructions needed to reproduce the main experimental results (either in the supplemental material or as a URL)?" is [N/A]. |
| Open Datasets | No | The paper is theoretical: it describes no experiments on datasets and provides no access information for any dataset. |
| Dataset Splits | No | The paper is theoretical and describes no experiments, so no training/validation/test split information is provided. |
| Hardware Specification | No | The paper is theoretical and describes no experiments that would require specific hardware; no hardware specifications are mentioned. |
| Software Dependencies | No | The paper is theoretical and describes no experiments or implementations that would require versioned software dependencies. |
| Experiment Setup | No | The paper is theoretical and describes no experimental setup, hyperparameters, or system-level training settings. |
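The table highlights that the paper's key technical novelty is the Pairwise Moment-Matching (PAMM) estimator for reward parameters under mixed and delayed conversion feedback. As a loose, self-contained illustration of the general moment-matching idea only — not the paper's PAMM algorithm — the sketch below assumes a hypothetical model in which each impression converts with unknown probability p and, if it converts, the conversion is observed after a Geometric(q) delay. Matching a pair of empirical moments (conversion observed within 1 step and within 2 steps) yields closed-form estimates of p and q:

```python
import random

random.seed(0)

# Hypothetical model (for illustration only): each impression converts
# with probability p_true; a conversion is observed after a delay
# D ~ Geometric(q_true) on {1, 2, ...}.
p_true, q_true, n = 0.4, 0.6, 200_000

within1 = within2 = 0
for _ in range(n):
    if random.random() < p_true:          # impression converts
        delay = 1
        while random.random() >= q_true:  # draw Geometric(q_true) delay
            delay += 1
        within1 += delay <= 1
        within2 += delay <= 2

# Empirical moments: fraction of impressions whose conversion is
# observed within k steps, i.e. P(convert and D <= k).
m1, m2 = within1 / n, within2 / n

# Moment equations for this model:
#   m1 = p * q
#   m2 = p * (1 - (1 - q)^2) = p * q * (2 - q)
# Dividing the pair gives m2 / m1 = 2 - q, hence:
q_hat = 2 - m2 / m1
p_hat = m1 / q_hat
print(f"p_hat={p_hat:.3f}  q_hat={q_hat:.3f}")  # close to 0.4 and 0.6
```

The "pairwise" flavor shows up in dividing one moment equation by another so the conversion probability cancels, leaving the delay parameter alone; the actual PAMM algorithm and its consistency guarantees are developed in the paper itself.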