reproducibilityindex.ai

Incrementality Bidding via Reinforcement Learning under Mixed and Delayed Rewards

Authors: Ashwinkumar Badanidiyuru Varadaraja, Zhe Feng, Tianxi Li, Haifeng Xu

NeurIPS 2022

Reproducibility Variable	Result	LLM Response
Research Type	Theoretical	We formulate the offline version of this problem as a specially structured episodic Markov Decision Process (MDP) and then, for its online learning counterpart, propose a novel reinforcement learning (RL) algorithm with regret at most $\widetilde{O}(H^2\sqrt{T})$... Technically, our main result is the design of an RL algorithm for incrementality bidding that provably has regret at most $\widetilde{O}(H^2\sqrt{T})$... One key technical novelty of this paper is a new parameter estimation method, Pairwise Moment-Matching (PAMM) algorithm, that can estimate reward parameters under mixed and delayed conversion feedback. ... The estimators are provably consistent with a nearly-optimal convergence rate.
Researcher Affiliation	Collaboration	Ashwinkumar Badanidiyuru (Google, ashwinkumarbv@google.com); Zhe Feng (Google Research, zhef@google.com); Tianxi Li (University of Virginia, tianxili@virginia.edu); Haifeng Xu (University of Chicago, haifengxu@uchicago.edu)
Pseudocode	Yes	Algorithm 1 Pairwise Moment-Matching (PAMM) Algorithm
Open Source Code	No	Under '3. If you ran experiments...', the answer to '(a) Did you include the code, data, and instructions needed to reproduce the main experimental results (either in the supplemental material or as a URL)?' is [N/A].
Open Datasets	No	The paper is theoretical and does not describe experiments using datasets, nor does it provide access information for any dataset.
Dataset Splits	No	The paper is theoretical and does not describe experiments, thus no dataset split information for training, validation, or testing is provided.
Hardware Specification	No	The paper is theoretical and does not describe any experiments that would require specific hardware, thus no hardware specifications are mentioned.
Software Dependencies	No	The paper is theoretical and does not describe experiments or implementations that would require specific software dependencies with version numbers.
Experiment Setup	No	The paper is theoretical and does not describe an experimental setup, hyperparameters, or system-level training settings.
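To help read the regret bound quoted in the Research Type row, the block below is a minimal sketch that assumes the standard episodic-MDP definition of regret. It takes H as the number of rounds per episode, T as the number of episodes, π_t as the policy played in episode t, and V^* as the optimal value from the initial state s_1. These symbols are illustrative assumptions, not notation confirmed by this page, and the paper's exact definition may differ. Under that assumed definition, the bound implies that the average regret per episode goes to zero as T grows:

```latex
% Assumed standard episodic regret; V^*, \pi_t, s_1 are illustrative symbols.
\mathrm{Regret}(T) = \sum_{t=1}^{T} \left( V_1^{*}(s_1) - V_1^{\pi_t}(s_1) \right)
  \le \widetilde{O}\!\left(H^2 \sqrt{T}\right)
\quad\Longrightarrow\quad
\frac{\mathrm{Regret}(T)}{T} \le \widetilde{O}\!\left(\frac{H^2}{\sqrt{T}}\right) \to 0
\quad \text{as } T \to \infty.
```

Put simply, a regret of order $\sqrt{T}$ is sublinear in the number of episodes. As T increases, the learner's average performance per episode approaches that of the optimal policy.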