Stateful Posted Pricing with Vanishing Regret via Dynamic Deterministic Markov Decision Processes
Authors: Yuval Emek, Ron Lavi, Rad Niazadeh, Yangguang Shi
NeurIPS 2020
| Reproducibility Variable | Result | LLM Response |
|---|---|---|
| Research Type | Theoretical | We then prove that if the Markov decision process is guaranteed to admit an oracle that can simulate any given policy from any initial state with bounded loss, a condition that is satisfied in the DRACC problem, then the online learning problem can be solved with vanishing regret. Our proof technique is based on a reduction to online learning with switching cost, in which an online decision maker incurs an extra cost every time she switches from one arm to another. |
| Researcher Affiliation | Academia | Yuval Emek, Technion Israel Institute of Technology, Haifa, Israel, yemek@technion.ac.il; Ron Lavi, Technion Israel Institute of Technology, Haifa, Israel, ronlavi@ie.technion.ac.il; Rad Niazadeh, University of Chicago Booth School of Business, Chicago, IL, United States, rad.niazadeh@chicagobooth.edu; Yangguang Shi, Technion Israel Institute of Technology, Haifa, Israel, shiyangguang@campus.technion.ac.il |
| Pseudocode | Yes | ALGORITHM 1: Online Dd-MDP algorithm C&S |
| Open Source Code | No | The paper does not provide concrete access to source code for the methodology described. |
| Open Datasets | No | The paper is theoretical and does not describe experiments using a dataset. |
| Dataset Splits | No | The paper is theoretical and does not describe experiments using a dataset. |
| Hardware Specification | No | The paper is theoretical and does not report on experiments requiring specific hardware specifications. |
| Software Dependencies | No | The paper is theoretical and does not report on experiments requiring specific software dependencies. |
| Experiment Setup | No | The paper is theoretical and does not report on experiments, thus no experimental setup details are provided. |
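
The Research Type entry above mentions a reduction to online learning with switching cost, where the learner pays extra each time she changes arms. The sketch below is not the paper's Algorithm 1 (C&S). It is a generic, full-information illustration of one standard way to limit switches: run exponential weights (Hedge) but resample the arm only at block boundaries. The function name `blocked_hedge` and the parameters `block_len`, `eta`, and `switch_cost` are assumptions introduced here for illustration.

```python
import math
import random


def blocked_hedge(losses, block_len, eta, switch_cost=1.0, seed=0):
    """Hedge that may change arms only at block boundaries (illustrative sketch).

    losses: list of per-round loss vectors, entries assumed to lie in [0, 1].
    Returns (total_cost, num_switches, played_arms).
    """
    rng = random.Random(seed)
    num_arms = len(losses[0])
    cum_loss = [0.0] * num_arms  # cumulative loss of each arm so far
    arm = None
    total = 0.0
    switches = 0
    played = []
    for t, loss in enumerate(losses):
        if t % block_len == 0:
            # Resample from the exponential-weights distribution.
            low = min(cum_loss)  # shift exponents for numerical stability
            weights = [math.exp(-eta * (c - low)) for c in cum_loss]
            new_arm = rng.choices(range(num_arms), weights=weights)[0]
            if arm is not None and new_arm != arm:
                switches += 1
                total += switch_cost  # pay the extra cost for switching
            arm = new_arm
        total += loss[arm]
        played.append(arm)
        # Full information: every arm's loss is observed each round.
        for i in range(num_arms):
            cum_loss[i] += loss[i]
    return total, switches, played
```

With `T` rounds and blocks of length `block_len`, this sketch switches at most `ceil(T / block_len) - 1` times, so longer blocks trade slower adaptation for a smaller switching bill. The paper's setting additionally involves state and an oracle for simulating policies, which this sketch does not model.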