reproducibilityindex.ai

Pessimism meets VCG: Learning Dynamic Mechanism Design via Offline Reinforcement Learning

Authors: Boxiang Lyu, Zhaoran Wang, Mladen Kolar, Zhuoran Yang

ICML 2022

Reproducibility Variable	Result	LLM Response
Research Type	Theoretical	Our algorithm is based on the pessimism principle and only requires a mild assumption on the coverage of the offline data set. To the best of our knowledge, our work provides the first offline RL algorithm for dynamic mechanism design without assuming uniform coverage. Our Contributions. We propose the first offline reinforcement learning algorithm that can learn a dynamic mechanism from any given data set. Additionally, our algorithm does not make any assumption about data coverage and only assumes that the underlying action-value functions are approximately realizable and the function class is approximately complete (see Assumptions 2.3 and 2.4 for detailed discussions)...
Researcher Affiliation	Academia	1Booth School of Business, University of Chicago, Chicago, IL, USA 2Department of Industrial Engineering and Management Sciences, Northwestern University, Evanston, IL, USA 3Department of Operations Research and Financial Engineering, Princeton University, Princeton, NJ, USA.
Pseudocode	Yes	Algorithm 1 Policy Evaluation (Section 3.1), Algorithm 2 Soft Policy Iteration for Episodic MDPs (Section 3.1), Algorithm 3 Offline VCG Learn (Appendix C).
Open Source Code	No	The paper does not contain any statement about making source code publicly available or a link to a code repository.
Open Datasets	No	The paper mentions working with an "a priori collected data set" (Abstract, Introduction) and a "precollected data set that contains K trajectories" (Section 2.2). However, it does not identify any public or open dataset by name, link, or citation as the basis for an empirical evaluation. The paper is theoretical and does not conduct experiments on a specific dataset.
Dataset Splits	No	The paper is purely theoretical and does not describe experimental dataset splits for training, validation, or testing.
Hardware Specification	No	The paper is theoretical and does not describe any computational experiments or the hardware used to run them.
Software Dependencies	No	The paper is theoretical and focuses on algorithm design and theoretical guarantees. It does not describe specific software implementations with version numbers.
Experiment Setup	No	The paper is theoretical and does not describe any empirical experimental setup details like hyperparameters or training settings.