Reinforcement Learning of Sequential Price Mechanisms
Authors: Gianluca Brero, Alon Eden, Matthias Gerstgrasser, David Parkes, Duncan Rheingans-Yoo
AAAI 2021, pp. 5219-5227 | Conference PDF | Archive PDF | Plain Text | LLM Run Details
| Reproducibility Variable | Result | LLM Response |
|---|---|---|
| Research Type | Experimental | In this section, we test the ability of standard RL algorithms to learn optimal SPMs across a wide range of settings. We report our results for the proximal policy optimization (PPO) algorithm (Schulman et al. 2017), a policy gradient algorithm where the learning objective is modified to prevent large gradient steps, and as implemented in Open AI Stable Baselines. |
| Researcher Affiliation | Academia | John A. Paulson School of Engineering and Applied Sciences, Harvard University {gbrero, aloneden, matthias, parkes}@g.harvard.edu, d.rheingansyoo@gmail.com |
| Pseudocode | No | The paper does not contain any clearly labeled pseudocode or algorithm blocks. |
| Open Source Code | No | The paper states 'We use the Open AI Stable Baselines version v2.10.0 (https://github.com/hill-a/stable-baselines)' but this refers to a third-party library used, not the authors' own implementation code for their methodology. |
| Open Datasets | No | The paper describes how data is generated through sampling from specified distributions (e.g., 'sampled from a possibly correlated value distribution D', 'distributed uniformly on the set {1, 3}', 'draw vi independently from unif(z - 1/2, z + 1/2)') rather than using a publicly available dataset with a specific name or access information (a sketch of such sampling follows the table). |
| Dataset Splits | No | The paper states, 'At periodic intervals during training, we evaluate the objective of the current policy using a fresh set of samples,' which indicates evaluation during training, but it does not provide specific dataset split information (e.g., exact percentages or sample counts) for training, validation, or testing. |
| Hardware Specification | No | The paper does not provide any specific hardware details such as exact GPU/CPU models, processor types, or cloud instance specifications used for running its experiments. |
| Software Dependencies | Yes | We use the Open AI Stable Baselines version v2.10.0 (https://github.com/hill-a/stable-baselines). |
| Experiment Setup | No | The paper mentions using a 'standard 2-layer multilayer perceptron (MLP) network' and running experiments with '6 seeds', but it does not provide specific hyperparameter values such as learning rate, batch size, or optimizer settings needed to reproduce the experimental setup (a hedged training sketch follows the table). |
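
The value distributions quoted in the Open Datasets row are described only in prose. Below is a minimal sketch of how such synthetic valuations might be generated with NumPy; the distribution of the common parameter `z` is not specified in the quoted text, so drawing it uniformly from [1, 2] is an illustrative assumption, not the paper's specification.

```python
import numpy as np

rng = np.random.default_rng(0)

def sample_discrete_values(n_agents):
    """Independent valuations drawn uniformly from the set {1, 3},
    matching the 'distributed uniformly on the set {1, 3}' description."""
    return rng.choice([1, 3], size=n_agents)

def sample_correlated_values(n_agents):
    """Correlated valuations: draw a common parameter z, then each
    v_i ~ unif(z - 1/2, z + 1/2). The distribution of z is not given in
    the quoted text; uniform on [1, 2] is an illustrative assumption."""
    z = rng.uniform(1.0, 2.0)
    return rng.uniform(z - 0.5, z + 0.5, size=n_agents)
```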
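
The Software Dependencies and Experiment Setup rows pin the library (Stable Baselines v2.10.0, which is TensorFlow 1.x based) and the reported architecture (a 2-layer MLP policy, 6 seeds), but not a runnable configuration. The sketch below shows what such a setup could look like under those assumptions; `ToySPMEnv`, its price grid, the buyer count, and the timestep budget are illustrative stand-ins, not the authors' environment or hyperparameters.

```python
import gym
import numpy as np
from gym import spaces
from stable_baselines import PPO2
from stable_baselines.common.policies import MlpPolicy

class ToySPMEnv(gym.Env):
    """Hypothetical sequential posted-price environment (not the authors' code).
    Each episode visits n buyers in a fixed order; the agent posts a price from
    a small discrete grid and collects it as revenue if the buyer's value
    (uniform on {1, 3}) meets the price."""

    def __init__(self, n_buyers=5):
        super().__init__()
        self.n_buyers = n_buyers
        self.prices = np.array([0.5, 1.0, 1.5, 2.0, 2.5, 3.0])
        self.action_space = spaces.Discrete(len(self.prices))
        # Observation: one-hot index of the buyer currently being served.
        self.observation_space = spaces.Box(0.0, 1.0, shape=(n_buyers,), dtype=np.float32)

    def _obs(self):
        obs = np.zeros(self.n_buyers, dtype=np.float32)
        if self.t < self.n_buyers:
            obs[self.t] = 1.0
        return obs

    def reset(self):
        self.t = 0
        self.values = np.random.choice([1.0, 3.0], size=self.n_buyers)
        return self._obs()

    def step(self, action):
        price = self.prices[action]
        reward = float(price) if self.values[self.t] >= price else 0.0
        self.t += 1
        done = self.t >= self.n_buyers
        return self._obs(), reward, done, {}

# Train one PPO agent per seed; MlpPolicy defaults to a 2-layer MLP, matching
# the architecture mentioned in the Experiment Setup row. The timestep budget
# is a guess, since the paper does not report one.
for seed in range(6):
    model = PPO2(MlpPolicy, ToySPMEnv(), seed=seed, verbose=0)
    model.learn(total_timesteps=50_000)
```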