Reinforcement Learning of Sequential Price Mechanisms

Authors: Gianluca Brero, Alon Eden, Matthias Gerstgrasser, David Parkes, Duncan Rheingans-Yoo

AAAI 2021

Reproducibility Variable | Result | LLM Response
Research Type | Experimental | 'In this section, we test the ability of standard RL algorithms to learn optimal SPMs across a wide range of settings. We report our results for the proximal policy optimization (PPO) algorithm (Schulman et al. 2017), a policy gradient algorithm where the learning objective is modified to prevent large gradient steps, and as implemented in Open AI Stable Baselines.' (The PPO objective and a hedged training sketch appear after this table.)
Researcher Affiliation | Academia | John A. Paulson School of Engineering and Applied Sciences, Harvard University; {gbrero, aloneden, matthias, parkes}@g.harvard.edu, d.rheingansyoo@gmail.com
Pseudocode | No | The paper does not contain any clearly labeled pseudocode or algorithm blocks.
Open Source Code | No | The paper states 'We use the Open AI Stable Baselines version v2.10.0 (https://github.com/hill-a/stable-baselines)', but this refers to a third-party library the authors used, not to released implementation code for their own method.
Open Datasets | No | The paper generates data by sampling from specified distributions (e.g., 'sampled from a possibly correlated value distribution D', 'distributed uniformly on the set {1, 3}', 'draw v_i independently from unif(z - 1/2, z + 1/2)') rather than using a named, publicly available dataset. (A hedged sampling sketch appears after this table.)
Dataset Splits | No | The paper states, 'At periodic intervals during training, we evaluate the objective of the current policy using a fresh set of samples,' which describes evaluation during training, but it does not give specific split information (e.g., exact percentages or sample counts) for training, validation, or testing.
Hardware Specification | No | The paper does not provide any specific hardware details such as exact GPU/CPU models, processor types, or cloud instance specifications used for running its experiments.
Software Dependencies | Yes | 'We use the Open AI Stable Baselines version v2.10.0 (https://github.com/hill-a/stable-baselines).'
Experiment Setup | No | The paper mentions using a 'standard 2-layer multilayer perceptron (MLP) network' and running experiments with 6 seeds, but it does not provide specific hyperparameter values such as learning rate, batch size, or optimizer settings needed to reproduce the experimental setup. (The setup sketch after this table uses placeholder values only.)
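
The Research Type row quotes the paper's one-line description of PPO as a policy-gradient method whose learning objective is modified to prevent large gradient steps. For reference, the modification in question is the clipped surrogate objective of Schulman et al. (2017):

L^{\mathrm{CLIP}}(\theta) = \mathbb{E}_t\!\left[ \min\!\left( r_t(\theta)\,\hat{A}_t,\ \mathrm{clip}\!\left(r_t(\theta),\, 1-\epsilon,\, 1+\epsilon\right)\hat{A}_t \right) \right],
\qquad
r_t(\theta) = \frac{\pi_\theta(a_t \mid s_t)}{\pi_{\theta_{\mathrm{old}}}(a_t \mid s_t)}

Here \hat{A}_t is an advantage estimate and \epsilon is the clipping parameter; clipping the probability ratio r_t(\theta) near 1 is what limits the size of each policy update.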
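
The Open Datasets row quotes two of the value distributions from which the paper samples its data. As an illustration only, the NumPy snippet below sketches such sampling; the anchor distribution for z and the sample sizes are assumptions of this sketch, not details given in the quoted text.

import numpy as np

rng = np.random.default_rng(0)  # fixed seed, only so the sketch is repeatable

def sample_discrete_values(n_agents, n_samples):
    # Quoted setting: values 'distributed uniformly on the set {1, 3}'.
    return rng.choice([1.0, 3.0], size=(n_samples, n_agents))

def sample_correlated_values(n_agents, n_samples):
    # Quoted setting: 'draw v_i independently from unif(z - 1/2, z + 1/2)'.
    # The distribution of the shared anchor z is an assumption of this sketch.
    z = rng.uniform(1.0, 2.0, size=(n_samples, 1))
    return rng.uniform(z - 0.5, z + 0.5, size=(n_samples, n_agents))

if __name__ == "__main__":
    print(sample_discrete_values(n_agents=2, n_samples=3))
    print(sample_correlated_values(n_agents=2, n_samples=3))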
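
The Software Dependencies and Experiment Setup rows establish only that training used PPO from Stable Baselines v2.10.0 (a library targeting TensorFlow 1.x) with a standard 2-layer MLP policy, multiple seeds, and periodic evaluation on fresh samples. The sketch below shows how such a run could be wired up with that library's PPO2 implementation and EvalCallback; the ToySequentialPricingEnv class, the network widths, the seed, the timestep budget, and the evaluation frequency are all hypothetical placeholders, not the authors' environment or hyperparameters.

import gym
import numpy as np
from gym import spaces
from stable_baselines import PPO2
from stable_baselines.common.callbacks import EvalCallback
from stable_baselines.common.policies import MlpPolicy
from stable_baselines.common.vec_env import DummyVecEnv


class ToySequentialPricingEnv(gym.Env):
    """Hypothetical toy sequential price mechanism: one buyer visited per step.

    This is an illustrative stand-in, not the paper's environment. Buyer
    values are i.i.d. uniform on [0, 1]; the action is a posted price; a
    buyer purchases if its value meets the price; the reward is the revenue.
    """

    def __init__(self, n_agents=5, n_items=2):
        super().__init__()
        self.n_agents = n_agents
        self.n_items = n_items
        # Action: a posted price in [0, 1].
        self.action_space = spaces.Box(0.0, 1.0, shape=(1,), dtype=np.float32)
        # Observation: (fraction of buyers visited, fraction of items sold).
        self.observation_space = spaces.Box(0.0, 1.0, shape=(2,), dtype=np.float32)

    def reset(self):
        self.values = np.random.uniform(0.0, 1.0, size=self.n_agents)
        self.t = 0
        self.sold = 0
        return self._obs()

    def step(self, action):
        price = float(np.clip(action[0], 0.0, 1.0))
        reward = 0.0
        if self.sold < self.n_items and self.values[self.t] >= price:
            reward = price  # revenue objective in this toy version
            self.sold += 1
        self.t += 1
        done = self.t >= self.n_agents or self.sold >= self.n_items
        return self._obs(), reward, done, {}

    def _obs(self):
        return np.array([self.t / self.n_agents, self.sold / self.n_items],
                        dtype=np.float32)


if __name__ == "__main__":
    train_env = DummyVecEnv([ToySequentialPricingEnv])
    # A fresh environment (hence fresh value samples) for periodic evaluation.
    eval_env = DummyVecEnv([ToySequentialPricingEnv])
    eval_callback = EvalCallback(eval_env, n_eval_episodes=100, eval_freq=5000)

    # All numeric settings below are placeholders, not values from the paper.
    model = PPO2(
        MlpPolicy,
        train_env,
        policy_kwargs=dict(net_arch=[64, 64]),  # 2-layer MLP; widths assumed
        seed=0,                                 # one of several seeds in practice
        verbose=1,
    )
    model.learn(total_timesteps=100000, callback=eval_callback)

Running several such calls with different seed values and averaging the evaluation returns would mirror the multi-seed protocol the paper describes, but the placeholder hyperparameters above would still need to be chosen and tuned independently.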