Learning to Mitigate AI Collusion on Economic Platforms

Authors: Gianluca Brero, Eric Mibuari, Nicolas Lepore, David C. Parkes

NeurIPS 2022 | Conference PDF | Archive PDF | Plain Text | LLM Run Details

Reproducibility Variable | Result | LLM Response
Research Type | Experimental | In this section, we evaluate our learning approach via three main experiments. We first consider performance in terms of consumer surplus, benchmarking our RL interventions against the ones introduced by Johnson et al. (2021). We demonstrate the ability to learn optimal leader strategies in the Stackelberg game with the followers across all the seeds we tested, significantly outperforming existing interventions.
Researcher Affiliation | Academia | Gianluca Brero (Data Science Initiative, Brown University, gianluca_brero@brown.edu); Eric Mibuari (School of Engineering and Applied Sciences, Harvard University, mibuari@g.harvard.edu); Nicolas Lepore (School of Engineering and Applied Sciences, Harvard University, nlepore33@gmail.com); David C. Parkes (School of Engineering and Applied Sciences, Harvard University, parkes@g.harvard.edu)
Pseudocode | No | The paper does not contain any pseudocode or algorithm blocks; the methodology is described in narrative text.
Open Source Code | No | No public code repository is linked; the paper states only: 'We will include it in the supplemental material.'
Open Datasets | No | The paper describes a simulated platform economy for its experiments, rather than using an external publicly available dataset: 'As in Calvano et al. (2020a) and Johnson et al. (2021), we consider settings with two pricing agents with cost c = 1, quality indexes a_1 = a_2 = 2, and a_0 = 0, and we set parameter µ = 0.25 to control horizontal differentiation.' (A demand-model sketch follows the table.)
Dataset Splits | No | The paper describes simulation steps ('50k equilibrium steps and 30 reward steps', 'train our policies for 50 million steps in total') but does not refer to traditional training, validation, or test dataset splits, as its experiments are based on a simulated environment.
Hardware Specification | Yes | This coarsened price grid allows us to train a platform policy through Stackelberg POMDP for 50 million steps in 18 hours using a single core on an Intel(R) Xeon(R) Platinum 8268 CPU @ 2.90GHz machine.
Software Dependencies | Yes | To train the platform policy, we start from the A2C algorithm provided by Stable Baselines3 (Raffin et al., 2021, MIT License). (A training-call sketch follows the table.)
Experiment Setup | Yes | The seller Q-learning algorithms are also trained using discount factor δ = 0.95, exploration rate ε_t = e^(-βt) with β = 1e-5, and learning rate α = 0.15. We set up the Stackelberg POMDP environment using 50k equilibrium steps and 30 reward steps. In these initial experiments, we train our policies for 50 million steps in total. (A Q-learning seller sketch follows the table.)
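
The demand model itself is not reproduced in the excerpts above. The sketch below assumes the standard multinomial logit demand of Calvano et al. (2020a) with the quoted parameters (c = 1, a_1 = a_2 = 2, a_0 = 0, µ = 0.25); the `weights` argument is a hypothetical hook for platform interventions that re-weight seller prominence and is not part of the quoted setup.

```python
import math

# Parameters quoted in the paper (following Calvano et al. 2020a / Johnson et al. 2021)
C = 1.0          # seller marginal cost
A = [2.0, 2.0]   # quality indexes a_1 = a_2 = 2
A0 = 0.0         # outside-good quality index a_0 = 0
MU = 0.25        # horizontal differentiation parameter

def logit_demand(prices, weights=None):
    """Demand shares under multinomial logit demand (assumed form).

    `weights` is a hypothetical hook for platform interventions that
    change seller prominence; weights of 1.0 recover the baseline demand.
    """
    if weights is None:
        weights = [1.0] * len(prices)
    utils = [w * math.exp((a - p) / MU) for p, a, w in zip(prices, A, weights)]
    denom = sum(utils) + math.exp(A0 / MU)
    return [u / denom for u in utils]

def profits(prices, weights=None):
    """Per-seller profit: (price - cost) * demand share."""
    return [(p - C) * q for p, q in zip(prices, logit_demand(prices, weights))]

# Example: both sellers price at 1.5
print(profits([1.5, 1.5]))
```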
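
The seller side of the experiment setup is a tabular Q-learning pricer with the quoted hyperparameters (δ = 0.95, ε_t = e^(-βt) with β = 1e-5, α = 0.15). A minimal sketch follows; the grid size and the use of last-period joint prices as state are assumptions in the spirit of Calvano et al. (2020a), since the paper's exact 'coarsened price grid' is not given in the excerpts.

```python
import numpy as np

# Hyperparameters quoted in the experiment setup
DELTA = 0.95   # discount factor
BETA = 1e-5    # exploration decay: eps_t = exp(-BETA * t)
ALPHA = 0.15   # learning rate

class QLearningSeller:
    """Tabular Q-learning seller on a discretized price grid (sketch)."""

    def __init__(self, n_prices=15, seed=0):
        self.n_prices = n_prices
        self.n_states = n_prices ** 2        # assumed state: last joint price pair
        self.rng = np.random.default_rng(seed)
        self.q = np.zeros((self.n_states, n_prices))
        self.t = 0

    def act(self, state):
        """Epsilon-greedy price choice with exponentially decaying exploration."""
        eps = np.exp(-BETA * self.t)
        self.t += 1
        if self.rng.random() < eps:
            return int(self.rng.integers(self.n_prices))
        return int(np.argmax(self.q[state]))

    def update(self, state, action, reward, next_state):
        """One-step Q-learning update toward the discounted bootstrap target."""
        target = reward + DELTA * np.max(self.q[next_state])
        self.q[state, action] += ALPHA * (target - self.q[state, action])
```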
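
For the platform policy, the paper names A2C from Stable Baselines3 (Raffin et al., 2021). The sketch below shows the corresponding training call; `PlaceholderStackelbergEnv` is a stand-in defined here only so the snippet runs, since the paper's Stackelberg POMDP environment (Q-learning sellers run for 50k equilibrium steps, then 30 reward steps) is not publicly available, and all A2C hyperparameters besides the step budget are SB3 defaults.

```python
import gymnasium as gym
import numpy as np
from stable_baselines3 import A2C

class PlaceholderStackelbergEnv(gym.Env):
    """Stand-in for the paper's (unreleased) Stackelberg POMDP environment.

    The real environment runs the seller Q-learners toward equilibrium for
    50k steps and then collects 30 reward steps per platform episode; the
    dynamics below are a trivial placeholder so the training call runs.
    """

    def __init__(self, equilibrium_steps=50_000, reward_steps=30):
        self.equilibrium_steps = equilibrium_steps
        self.reward_steps = reward_steps
        self.observation_space = gym.spaces.Box(-1.0, 1.0, shape=(4,), dtype=np.float32)
        self.action_space = gym.spaces.Discrete(3)
        self._step = 0

    def reset(self, *, seed=None, options=None):
        super().reset(seed=seed)
        self._step = 0
        return self.observation_space.sample(), {}

    def step(self, action):
        self._step += 1
        obs = self.observation_space.sample()
        reward = 0.0  # the real env returns consumer surplus from the reward steps
        terminated = self._step >= self.reward_steps
        return obs, reward, terminated, False, {}

env = PlaceholderStackelbergEnv()
model = A2C("MlpPolicy", env, verbose=0)
model.learn(total_timesteps=10_000)  # the paper trains for 50 million steps
model.save("platform_policy_a2c")
```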