Performative Reinforcement Learning

Authors: Debmalya Mandal, Stelios Triantafyllou, Goran Radanovic

ICML 2023 | Conference PDF | Archive PDF | Plain Text | LLM Run Details

| Reproducibility Variable | Result | LLM Response |
|---|---|---|
| Research Type | Experimental | Finally, through extensive experiments on a grid-world environment, we demonstrate the dependence of convergence on various parameters, e.g. regularization, smoothness, and the number of samples. In this section, we experimentally evaluate the performance of various repeated retraining methods, and determine the effects of various parameters on convergence. |
| Researcher Affiliation | Academia | Debmalya Mandal (1), Stelios Triantafyllou (1), Goran Radanovic (1); (1) Max Planck Institute for Software Systems, Saarbruecken, Germany. |
| Pseudocode | Yes | Algorithm 1: Alternating Optimization for the Empirical Lagrangian |
| Open Source Code | Yes | Code source: https://github.com/gradanovic/icml2023-performative-rl-paper-code |
| Open Datasets | Yes | All experiments are conducted on a grid-world environment proposed by (Triantafyllou et al., 2021). |
| Dataset Splits | No | The paper does not explicitly provide training/test/validation dataset splits, such as percentages, absolute counts, or references to predefined splits for the grid-world environment used. |
| Hardware Specification | Yes | All experiments were conducted on a computer cluster with machines equipped with 2 Intel Xeon E5-2667 v2 CPUs at 3.3 GHz (16 cores) and 50 GB RAM. |
| Software Dependencies | No | The paper describes the computational environment but does not provide specific software dependencies with version numbers (e.g., names of libraries or frameworks with their respective versions). |
| Experiment Setup | Yes | All plots were generated with γ = 0.9 and 1000 iterations. The parameters are λ (regularization), β (smoothness), η (step-size), and m (number of trajectories). Figure 2 captions show specific values like "RPO λ = 1", "RPO β = 10", "RGA λ = 1, β = 5". |
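To make the Research Type and Experiment Setup rows above concrete, here is a minimal, illustrative sketch of a repeated-retraining loop for performative RL. It is not the authors' implementation (their code is in the repository linked above): the random tabular MDP standing in for the grid-world, the `environment_response` function modelling the performative shift, the sample-based model estimate, and the regularized softmax policy-gradient update are all assumptions made for illustration. Only γ = 0.9, the 1000 retraining iterations, and the roles of λ (regularization), β (smoothness), η (step size), and m (samples per round) come from the paper's experiment description; the concrete values of λ, β, η, and m below are arbitrary examples.

```python
# Illustrative sketch of repeated retraining under performative dynamics.
# NOT the authors' code: the environment response, model estimation, and
# update rule are placeholder assumptions for a toy tabular MDP.
import numpy as np

rng = np.random.default_rng(0)
n_states, n_actions = 5, 2
gamma, n_iters = 0.9, 1000                # values reported in the paper
lam, beta, eta, m = 1.0, 5.0, 0.1, 50     # example values for lambda, beta, eta, m

base_P = rng.dirichlet(np.ones(n_states), size=(n_states, n_actions))
R = rng.uniform(size=(n_states, n_actions))
theta = np.zeros((n_states, n_actions))   # tabular softmax policy parameters


def softmax_policy(theta):
    z = np.exp(theta - theta.max(axis=1, keepdims=True))
    return z / z.sum(axis=1, keepdims=True)


def environment_response(pi):
    # Placeholder performative effect: transitions drift toward the deployed
    # policy's induced next-state distribution; larger beta = weaker response.
    drift = np.einsum("sa,sab->sb", pi, base_P)
    P = base_P + (1.0 / beta) * drift[:, None, :]
    return P / P.sum(axis=-1, keepdims=True)


def estimate_model(P, n_samples):
    # Estimate transitions from n_samples draws per (s, a) pair, standing in
    # for the m trajectories collected after each deployment.
    counts = np.stack([
        [rng.multinomial(n_samples, P[s, a]) for a in range(n_actions)]
        for s in range(n_states)
    ])
    return counts / n_samples


def q_values(pi, P):
    # Policy evaluation on the estimated model by iterating the Bellman equation.
    Q = np.zeros((n_states, n_actions))
    for _ in range(200):
        V = (pi * Q).sum(axis=1)
        Q = R + gamma * P @ V
    return Q


for t in range(n_iters):
    pi = softmax_policy(theta)
    P_shifted = environment_response(pi)   # environment reacts to the deployed policy
    P_hat = estimate_model(P_shifted, m)   # data collected under the shifted dynamics
    Q = q_values(pi, P_hat)
    A = Q - (pi * Q).sum(axis=1, keepdims=True)
    # Softmax policy-gradient step with uniform state weighting (a simplification
    # of the discounted state-occupancy weighting) and an L2 penalty scaled by lambda.
    theta += eta * (pi * A - lam * theta)
```

Tracking a quantity such as the change in `theta` between successive rounds over the 1000 iterations gives a simple convergence diagnostic, which is the kind of behaviour the paper studies as λ, β, and m vary.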