Stateful Posted Pricing with Vanishing Regret via Dynamic Deterministic Markov Decision Processes

Authors: Yuval Emek, Ron Lavi, Rad Niazadeh, Yangguang Shi

NeurIPS 2020

Reproducibility Variable | Result | LLM Response
Research Type | Theoretical | We then prove that if the Markov decision process is guaranteed to admit an oracle that can simulate any given policy from any initial state with bounded loss (a condition that is satisfied in the DRACC problem), then the online learning problem can be solved with vanishing regret. Our proof technique is based on a reduction to online learning with switching cost, in which an online decision maker incurs an extra cost every time she switches from one arm to another. (A toy sketch of this switching-cost setting appears after the table.)
Researcher Affiliation | Academia | Yuval Emek, Technion Israel Institute of Technology, Haifa, Israel, yemek@technion.ac.il; Ron Lavi, Technion Israel Institute of Technology, Haifa, Israel, ronlavi@ie.technion.ac.il; Rad Niazadeh, University of Chicago Booth School of Business, Chicago, IL, United States, rad.niazadeh@chicagobooth.edu; Yangguang Shi, Technion Israel Institute of Technology, Haifa, Israel, shiyangguang@campus.technion.ac.il
Pseudocode | Yes | ALGORITHM 1: Online Dd-MDP algorithm C&S
Open Source Code | No | The paper does not provide concrete access to source code for the methodology described.
Open Datasets | No | The paper is theoretical and does not describe experiments using a dataset.
Dataset Splits | No | The paper is theoretical and does not describe experiments using a dataset.
Hardware Specification | No | The paper is theoretical and does not report on experiments requiring specific hardware specifications.
Software Dependencies | No | The paper is theoretical and does not report on experiments requiring specific software dependencies.
Experiment Setup | No | The paper is theoretical and does not report on experiments, so no experimental setup details are provided.
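
The reduction mentioned in the Research Type row is to online learning with switching cost. As a rough, self-contained illustration of that setting only (not the paper's C&S algorithm; the arm losses, learning rate eta, and switching cost below are hypothetical placeholders), a minimal Python sketch of a multiplicative-weights learner that additionally pays a fixed cost whenever it changes arms could look like this:

import random

def simulate_switching_cost(losses, eta=0.1, switch_cost=1.0, seed=0):
    # Toy illustration of online learning with switching cost (NOT the paper's
    # C&S algorithm): a multiplicative-weights learner that, on top of the
    # per-round loss of its chosen arm, pays `switch_cost` whenever it picks a
    # different arm than in the previous round.
    # `losses[t][i]` is a hypothetical loss of arm i in round t, assumed in [0, 1].
    rng = random.Random(seed)
    n_arms = len(losses[0])
    weights = [1.0] * n_arms
    prev_arm = None
    total_cost = 0.0
    for round_losses in losses:
        # Sample an arm with probability proportional to its current weight.
        threshold = rng.random() * sum(weights)
        acc, arm = 0.0, n_arms - 1
        for i, w in enumerate(weights):
            acc += w
            if threshold <= acc:
                arm = i
                break
        # Pay the arm's loss, plus the extra cost if the arm changed.
        total_cost += round_losses[arm]
        if prev_arm is not None and arm != prev_arm:
            total_cost += switch_cost
        prev_arm = arm
        # Multiplicative-weights update on the full loss vector.
        weights = [w * (1.0 - eta * l) for w, l in zip(weights, round_losses)]
    return total_cost

# Example with three arms and three rounds of made-up losses.
print(simulate_switching_cost([[0.2, 0.9, 0.5], [0.1, 0.8, 0.6], [0.3, 0.7, 0.4]]))

The extra term is the point of the setting: a learner that changes arms too often accumulates switching cost linearly, which is, intuitively, why a reduction of this kind needs learners that switch rarely while still achieving vanishing regret.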