Learning to Mitigate AI Collusion on Economic Platforms
Authors: Gianluca Brero, Eric Mibuari, Nicolas Lepore, David C. Parkes
NeurIPS 2022
| Reproducibility Variable | Result | LLM Response |
|---|---|---|
| Research Type | Experimental | In this section, we evaluate our learning approach via three main experiments. We first consider performance in terms of consumer surplus, benchmarking our RL interventions against the ones introduced by Johnson et al. (2021). We demonstrate the ability to learn optimal leader strategies in the Stackelberg game with the followers across all the seeds we tested, significantly outperforming existing interventions. |
| Researcher Affiliation | Academia | Gianluca Brero, Data Science Initiative, Brown University (gianluca_brero@brown.edu); Eric Mibuari, School of Engineering and Applied Sciences, Harvard University (mibuari@g.harvard.edu); Nicolas Lepore, School of Engineering and Applied Sciences, Harvard University (nlepore33@gmail.com); David C. Parkes, School of Engineering and Applied Sciences, Harvard University (parkes@g.harvard.edu) |
| Pseudocode | No | The paper does not contain any pseudocode or algorithm blocks; the methodology is described in narrative text. |
| Open Source Code | No | No public code repository is linked; the paper states only: 'We will include it in the supplemental material.' |
| Open Datasets | No | The paper describes a simulated platform economy for its experiments, rather than using an external publicly available dataset. 'As in Calvano et al. (2020a) and Johnson et al. (2021), we consider settings with two pricing agents with cost c = 1, quality indexes a_1 = a_2 = 2, and a_0 = 0, and we set parameter µ = 0.25 to control horizontal differentiation.' (A hedged sketch of this logit demand model appears after the table.) |
| Dataset Splits | No | The paper describes simulation steps ('50k equilibrium steps and 30 reward steps', 'train our policies for 50 million steps in total') but does not refer to traditional training, validation, or test dataset splits, as its experiments are based on a simulated environment. |
| Hardware Specification | Yes | This coarsened price grid allows us to train a platform policy through Stackelberg POMDP for 50 million steps in 18 hours using a single core on an Intel(R) Xeon(R) Platinum 8268 CPU @ 2.90GHz machine. |
| Software Dependencies | Yes | To train the platform policy, we start from the A2C algorithm provided by Stable Baselines3 (Raffin et al., 2021, MIT License). (A hedged A2C training sketch appears after the table.) |
| Experiment Setup | Yes | The seller Q-learning algorithms are also trained using discount factor δ = 0.95, exploration rate ε_t = e^(−βt) with β = 1e−5, and learning rate α = 0.15. We set up the Stackelberg POMDP environment using 50k equilibrium steps and 30 reward steps. In these initial experiments, we train our policies for 50 million steps in total. (A hedged sketch of the seller Q-learning update appears after the table.) |
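
The simulated economy quoted under Open Datasets is the standard logit-demand duopoly of Calvano et al. (2020a). The paper does not include code for it, so the snippet below is a minimal sketch of that demand model under the quoted parameters (c = 1, a_1 = a_2 = 2, a_0 = 0, µ = 0.25); the function names and the example prices (near the competitive and collusive benchmarks typically reported for this parameterization) are illustrative and are not the authors' implementation.

```python
import numpy as np

# Parameters quoted from the paper (following Calvano et al., 2020a).
A = np.array([2.0, 2.0])  # seller quality indexes a_1 = a_2 = 2
A0 = 0.0                  # outside-good quality index a_0 = 0
MU = 0.25                 # horizontal-differentiation parameter mu
C = 1.0                   # marginal cost c = 1

def demand(prices: np.ndarray) -> np.ndarray:
    """Logit demand shares: q_i = exp((a_i - p_i)/mu) / (sum_j exp((a_j - p_j)/mu) + exp(a_0/mu))."""
    weights = np.exp((A - prices) / MU)
    return weights / (weights.sum() + np.exp(A0 / MU))

def profits(prices: np.ndarray) -> np.ndarray:
    """Per-seller profit (p_i - c) * q_i."""
    return (prices - C) * demand(prices)

# Example: symmetric prices near the competitive (~1.47) and fully
# collusive (~1.93) levels usually cited for this parameterization.
print(profits(np.array([1.47, 1.47])))
print(profits(np.array([1.93, 1.93])))
```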
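The Experiment Setup row quotes the seller hyperparameters (δ = 0.95, ε_t = e^(−βt) with β = 1e−5, α = 0.15). Below is a minimal sketch of the tabular Q-learning update and ε-greedy rule those numbers parameterize, assuming a discrete price grid as in Calvano et al. (2020a); the state encoding and function names are ours, not the authors'.

```python
import numpy as np

# Seller hyperparameters quoted in the Experiment Setup row.
DELTA = 0.95   # discount factor delta
ALPHA = 0.15   # learning rate alpha
BETA = 1e-5    # exploration decay, epsilon_t = exp(-beta * t)

def epsilon(t: int) -> float:
    """Time-decaying exploration rate epsilon_t = e^(-beta * t)."""
    return float(np.exp(-BETA * t))

def choose_price_index(Q: np.ndarray, state: int, t: int,
                       rng: np.random.Generator) -> int:
    """Epsilon-greedy choice over the discrete price grid."""
    if rng.random() < epsilon(t):
        return int(rng.integers(Q.shape[1]))
    return int(Q[state].argmax())

def q_update(Q: np.ndarray, state: int, action: int,
             reward: float, next_state: int) -> None:
    """One tabular Q-learning update, performed in place."""
    target = reward + DELTA * Q[next_state].max()
    Q[state, action] = (1 - ALPHA) * Q[state, action] + ALPHA * target
```

Here `Q` would be initialized as `np.zeros((n_states, n_prices))`, with states encoding the recent joint price history as in the cited prior work.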
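The Software Dependencies row reports that the platform policy is trained with the A2C implementation from Stable Baselines3. The sketch below shows what such a training call looks like; `PlatformEnv` is a toy stand-in for the paper's Stackelberg POMDP environment (not publicly released), so its observation/action spaces and reward are placeholders, and the step count is reduced from the quoted 50 million so the example runs quickly.

```python
import gymnasium as gym
import numpy as np
from gymnasium import spaces
from stable_baselines3 import A2C


class PlatformEnv(gym.Env):
    """Placeholder stand-in for the paper's Stackelberg POMDP platform environment."""

    def __init__(self):
        super().__init__()
        # Toy spaces; the real environment observes the sellers' learning state
        # and acts through platform interventions on the buy box / price rules.
        self.observation_space = spaces.Box(-1.0, 1.0, shape=(4,), dtype=np.float32)
        self.action_space = spaces.Discrete(5)
        self._t = 0

    def reset(self, *, seed=None, options=None):
        super().reset(seed=seed)
        self._t = 0
        return self.observation_space.sample(), {}

    def step(self, action):
        self._t += 1
        obs = self.observation_space.sample()
        reward = 0.0                # the real reward reflects consumer surplus
        terminated = self._t >= 30  # cf. the quoted "30 reward steps"
        return obs, reward, terminated, False, {}


# The paper starts from Stable Baselines3's A2C and trains for 50 million steps
# in total; the timestep budget here is shortened for illustration.
model = A2C("MlpPolicy", PlatformEnv(), verbose=1)
model.learn(total_timesteps=10_000)
```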