Pessimism meets VCG: Learning Dynamic Mechanism Design via Offline Reinforcement Learning
Authors: Boxiang Lyu, Zhaoran Wang, Mladen Kolar, Zhuoran Yang
ICML 2022
| Reproducibility Variable | Result | LLM Response |
|---|---|---|
| Research Type | Theoretical | Our algorithm is based on the pessimism principle and only requires a mild assumption on the coverage of the offline data set. To the best of our knowledge, our work provides the first offline RL algorithm for dynamic mechanism design without assuming uniform coverage. Our Contributions. We propose the first offline reinforcement learning algorithm that can learn a dynamic mechanism from any given data set. Additionally, our algorithm does not make any assumption about data coverage and only assumes that the underlying action-value functions are approximately realizable and the function class is approximately complete (see Assumptions 2.3 and 2.4 for detailed discussions)... |
| Researcher Affiliation | Academia | (1) Booth School of Business, University of Chicago, Chicago, IL, USA; (2) Department of Industrial Engineering and Management Sciences, Northwestern University, Evanston, IL, USA; (3) Department of Operations Research and Financial Engineering, Princeton University, Princeton, NJ, USA. |
| Pseudocode | Yes | Algorithm 1: Policy Evaluation (Section 3.1); Algorithm 2: Soft Policy Iteration for Episodic MDPs (Section 3.1); Algorithm 3: Offline VCG Learn (Appendix C). |
| Open Source Code | No | The paper does not contain any statement about making source code publicly available or a link to a code repository. |
| Open Datasets | No | The paper refers to an "a priori collected data set" (Abstract, Introduction) and a "precollected data set that contains K trajectories" (Section 2.2), but it does not name, link, or cite any public dataset used for empirical evaluation. The paper is theoretical and does not run experiments on any dataset. |
| Dataset Splits | No | The paper is purely theoretical and does not describe experimental dataset splits for training, validation, or testing. |
| Hardware Specification | No | The paper is theoretical and does not describe any computational experiments or the hardware used to run them. |
| Software Dependencies | No | The paper is theoretical and focuses on algorithm design and theoretical guarantees. It does not describe specific software implementations with version numbers. |
| Experiment Setup | No | The paper is theoretical and does not describe any empirical experimental setup details like hyperparameters or training settings. |