Reinforcement Learning of Sequential Price Mechanisms

Authors: Gianluca Brero, Alon Eden, Matthias Gerstgrasser, David Parkes, Duncan Rheingans-Yoo

AAAI 2021

Reproducibility Variable | Result | LLM Response
Research Type | Experimental | 'In this section, we test the ability of standard RL algorithms to learn optimal SPMs across a wide range of settings. We report our results for the proximal policy optimization (PPO) algorithm (Schulman et al. 2017), a policy gradient algorithm where the learning objective is modified to prevent large gradient steps, and as implemented in Open AI Stable Baselines.' (The PPO objective and a hedged training sketch appear after this table.)
Researcher Affiliation | Academia | John A. Paulson School of Engineering and Applied Sciences, Harvard University; {gbrero, aloneden, matthias, parkes}@g.harvard.edu, d.rheingansyoo@gmail.com
Pseudocode | No | The paper does not contain any clearly labeled pseudocode or algorithm blocks.
Open Source Code | No | The paper states 'We use the Open AI Stable Baselines version v2.10.0 (https://github.com/hill-a/stable-baselines)', but this refers to a third-party library the authors used, not to released implementation code for their own method.
Open Datasets | No | The paper generates data by sampling from specified distributions (e.g., 'sampled from a possibly correlated value distribution D', 'distributed uniformly on the set {1, 3}', 'draw v_i independently from unif(z - 1/2, z + 1/2)') rather than using a named, publicly available dataset. (A hedged sampling sketch appears after this table.)
Dataset Splits | No | The paper states, 'At periodic intervals during training, we evaluate the objective of the current policy using a fresh set of samples,' which describes evaluation during training, but it does not give specific split information (e.g., exact percentages or sample counts) for training, validation, or testing.
Hardware Specification | No | The paper does not provide any specific hardware details such as exact GPU/CPU models, processor types, or cloud instance specifications used for running its experiments.
Software Dependencies | Yes | 'We use the Open AI Stable Baselines version v2.10.0 (https://github.com/hill-a/stable-baselines).'
Experiment Setup | No | The paper mentions using a 'standard 2-layer multilayer perceptron (MLP) network' and running experiments with 6 seeds, but it does not provide specific hyperparameter values such as learning rate, batch size, or optimizer settings needed to reproduce the experimental setup. (The setup sketch after this table uses placeholder values only.)
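
The Research Type row quotes the paper's one-line description of PPO as a policy-gradient method whose learning objective is modified to prevent large gradient steps. For reference, the modification in question is the clipped surrogate objective of Schulman et al. (2017):

L^{\mathrm{CLIP}}(\theta) = \mathbb{E}_t\!\left[ \min\!\left( r_t(\theta)\,\hat{A}_t,\ \mathrm{clip}\!\left(r_t(\theta),\, 1-\epsilon,\, 1+\epsilon\right)\hat{A}_t \right) \right],
\qquad
r_t(\theta) = \frac{\pi_\theta(a_t \mid s_t)}{\pi_{\theta_{\mathrm{old}}}(a_t \mid s_t)}

Here \hat{A}_t is an advantage estimate and \epsilon is the clipping parameter; clipping the probability ratio r_t(\theta) near 1 is what limits the size of each policy update.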
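
The Open Datasets row quotes two of the value distributions from which the paper samples its data. As an illustration only, the NumPy snippet below sketches such sampling; the anchor distribution for z and the sample sizes are assumptions of this sketch, not details given in the quoted text.

import numpy as np

rng = np.random.default_rng(0)  # fixed seed, only so the sketch is repeatable

def sample_discrete_values(n_agents, n_samples):
    # Quoted setting: values 'distributed uniformly on the set {1, 3}'.
    return rng.choice([1.0, 3.0], size=(n_samples, n_agents))

def sample_correlated_values(n_agents, n_samples):
    # Quoted setting: 'draw v_i independently from unif(z - 1/2, z + 1/2)'.
    # The distribution of the shared anchor z is an assumption of this sketch.
    z = rng.uniform(1.0, 2.0, size=(n_samples, 1))
    return rng.uniform(z - 0.5, z + 0.5, size=(n_samples, n_agents))

if __name__ == "__main__":
    print(sample_discrete_values(n_agents=2, n_samples=3))
    print(sample_correlated_values(n_agents=2, n_samples=3))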
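
The Software Dependencies and Experiment Setup rows establish only that training used PPO from Stable Baselines v2.10.0 (a library targeting TensorFlow 1.x) with a standard 2-layer MLP policy, multiple seeds, and periodic evaluation on fresh samples. The sketch below shows how such a run could be wired up with that library's PPO2 implementation and EvalCallback; the ToySequentialPricingEnv class, the network widths, the seed, the timestep budget, and the evaluation frequency are all hypothetical placeholders, not the authors' environment or hyperparameters.

import gym
import numpy as np
from gym import spaces
from stable_baselines import PPO2
from stable_baselines.common.callbacks import EvalCallback
from stable_baselines.common.policies import MlpPolicy
from stable_baselines.common.vec_env import DummyVecEnv


class ToySequentialPricingEnv(gym.Env):
    """Hypothetical toy sequential price mechanism: one buyer visited per step.

    This is an illustrative stand-in, not the paper's environment. Buyer
    values are i.i.d. uniform on [0, 1]; the action is a posted price; a
    buyer purchases if its value meets the price; the reward is the revenue.
    """

    def __init__(self, n_agents=5, n_items=2):
        super().__init__()
        self.n_agents = n_agents
        self.n_items = n_items
        # Action: a posted price in [0, 1].
        self.action_space = spaces.Box(0.0, 1.0, shape=(1,), dtype=np.float32)
        # Observation: (fraction of buyers visited, fraction of items sold).
        self.observation_space = spaces.Box(0.0, 1.0, shape=(2,), dtype=np.float32)

    def reset(self):
        self.values = np.random.uniform(0.0, 1.0, size=self.n_agents)
        self.t = 0
        self.sold = 0
        return self._obs()

    def step(self, action):
        price = float(np.clip(action[0], 0.0, 1.0))
        reward = 0.0
        if self.sold < self.n_items and self.values[self.t] >= price:
            reward = price  # revenue objective in this toy version
            self.sold += 1
        self.t += 1
        done = self.t >= self.n_agents or self.sold >= self.n_items
        return self._obs(), reward, done, {}

    def _obs(self):
        return np.array([self.t / self.n_agents, self.sold / self.n_items],
                        dtype=np.float32)


if __name__ == "__main__":
    train_env = DummyVecEnv([ToySequentialPricingEnv])
    # A fresh environment (hence fresh value samples) for periodic evaluation.
    eval_env = DummyVecEnv([ToySequentialPricingEnv])
    eval_callback = EvalCallback(eval_env, n_eval_episodes=100, eval_freq=5000)

    # All numeric settings below are placeholders, not values from the paper.
    model = PPO2(
        MlpPolicy,
        train_env,
        policy_kwargs=dict(net_arch=[64, 64]),  # 2-layer MLP; widths assumed
        seed=0,                                 # one of several seeds in practice
        verbose=1,
    )
    model.learn(total_timesteps=100000, callback=eval_callback)

Running several such calls with different seed values and averaging the evaluation returns would mirror the multi-seed protocol the paper describes, but the placeholder hyperparameters above would still need to be chosen and tuned independently.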