Performative Reinforcement Learning
Authors: Debmalya Mandal, Stelios Triantafyllou, Goran Radanovic
ICML 2023
| Reproducibility Variable | Result | LLM Response |
|---|---|---|
| Research Type | Experimental | Finally, through extensive experiments on a grid-world environment, we demonstrate the dependence of convergence on various parameters e.g. regularization, smoothness, and the number of samples. In this section, we experimentally evaluate the performance of various repeated retraining methods, and determine the effects of various parameters on convergence. |
| Researcher Affiliation | Academia | Debmalya Mandal, Stelios Triantafyllou, Goran Radanovic; Max Planck Institute for Software Systems, Saarbruecken, Germany. |
| Pseudocode | Yes | Algorithm 1 Alternating Optimization for the Empirical Lagrangian |
| Open Source Code | Yes | Code source: https://github.com/gradanovic/icml2023-performative-rl-paper-code |
| Open Datasets | Yes | All experiments are conducted on a grid-world environment proposed by (Triantafyllou et al., 2021). |
| Dataset Splits | No | The paper does not provide explicit training/validation/test splits (percentages, absolute counts, or references to predefined splits) for the grid-world environment it uses. |
| Hardware Specification | Yes | All experiments were conducted on a computer cluster with machines equipped with 2 Intel Xeon E5-2667 v2 CPUs with 3.3GHz (16 cores) and 50 GB RAM. |
| Software Dependencies | No | The paper describes the computational environment but does not list specific software dependencies with version numbers (e.g., library or framework versions). |
| Experiment Setup | Yes | All plots were generated with γ = 0.9 and 1000 iterations. The parameters are λ (regularization), β (smoothness), η (step-size), and m (number of trajectories). Figure 2 captions show specific values like 'RPO λ = 1', 'RPO β = 10', 'RGA λ = 1, β = 5'. |
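
To make the Experiment Setup row concrete, here is a minimal sketch of a repeated-retraining loop using the parameter names from the table (γ as `gamma`, λ as `lam`, β as `beta`). The tiny random MDP, the performative response through the occupancy measure, and the entropy-regularized solver are illustrative assumptions only; they are not the paper's grid-world environment or its exact retraining procedure, and for brevity the sketch uses the exact model rather than m sampled trajectories.

```python
import numpy as np

rng = np.random.default_rng(0)

# Tiny random MDP standing in for the grid-world; sizes and the performative
# response below are illustrative assumptions, not the paper's environment.
S, A, gamma = 5, 3, 0.9
P0 = rng.dirichlet(np.ones(S), size=(S, A))      # base transitions P0[s, a, s']
r0 = rng.uniform(size=(S, A))                    # base rewards r0[s, a]
mu0 = np.ones(S) / S                             # initial state distribution

def occupancy(P, pi):
    """Normalized discounted state-action occupancy measure d_pi under kernel P."""
    P_pi = np.einsum("sap,sa->sp", P, pi)        # state-to-state kernel under pi
    rho = np.linalg.solve(np.eye(S) - gamma * P_pi.T, (1 - gamma) * mu0)
    return pi * rho[:, None]                     # d[s, a] = rho(s) * pi(a | s)

def solve_regularized(P, r, lam, iters=200):
    """Entropy-regularized (soft) value iteration; a stand-in regularized
    solver, not necessarily the regularization used in the paper."""
    V = np.zeros(S)
    for _ in range(iters):
        Q = r + gamma * P @ V                    # soft Bellman backup, Q[s, a]
        V = lam * np.log(np.exp(Q / lam).sum(axis=1))
    Q = r + gamma * P @ V
    logits = Q / lam
    pi = np.exp(logits - logits.max(axis=1, keepdims=True))
    return pi / pi.sum(axis=1, keepdims=True)

# Repeated retraining: deploying a policy shifts the rewards through its
# occupancy measure (an assumed performative response, smoothness ~ 1/beta).
lam, beta, iterations = 1.0, 10.0, 1000
pi = np.ones((S, A)) / A
for t in range(iterations):
    r_t = r0 - occupancy(P0, pi) / beta          # environment responds to pi
    pi_next = solve_regularized(P0, r_t, lam)    # retrain on the shifted MDP
    gap = np.abs(pi_next - pi).max()             # convergence diagnostic
    pi = pi_next
print("policy change at final retraining round:", gap)
```

The `gap` diagnostic is just one simple way to check whether repeated retraining settles down; the paper's figures instead report convergence as a function of λ, β, η, and m.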
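
The Pseudocode row cites Algorithm 1, "Alternating Optimization for the Empirical Lagrangian". Below is a generic sketch of one such alternating scheme on a λ-regularized occupancy-measure Lagrangian: a closed-form primal update of the occupancy measure `d`, followed by a dual gradient step on `h` with step size η. The specific Lagrangian, the MDP, and the update rules are assumptions for illustration and may differ from the authors' Algorithm 1; in the empirical version one would presumably plug in estimates of P and r built from the m sampled trajectories.

```python
import numpy as np

rng = np.random.default_rng(1)

# Small MDP stand-in; the Lagrangian below is an assumed form of the
# lambda-regularized occupancy-measure LP, not necessarily the paper's.
S, A, gamma = 5, 3, 0.9
P = rng.dirichlet(np.ones(S), size=(S, A))       # P[s, a, s']
r = rng.uniform(size=(S, A))
mu0 = np.ones(S) / S
lam, eta, rounds = 1.0, 0.05, 20000

# Saddle point of
#   L(d, h) = <d, r> - lam/2 ||d||^2
#             + sum_s h(s) [ (1-gamma) mu0(s) + gamma (P^T d)(s) - sum_a d(s, a) ]
# over d >= 0 (occupancy measure) and dual variables h.
h = np.zeros(S)
for _ in range(rounds):
    # Primal step: maximizing the quadratic Lagrangian over d >= 0 in closed form.
    adv = r + gamma * P @ h - h[:, None]         # r(s,a) + gamma E[h(s')] - h(s)
    d = np.maximum(adv / lam, 0.0)
    # Dual step: gradient descent on h along the Bellman flow-constraint violation.
    flow_in = (1 - gamma) * mu0 + gamma * np.einsum("sap,sa->p", P, d)
    grad_h = flow_in - d.sum(axis=1)
    h -= eta * grad_h

print("max flow-constraint violation:", np.abs(grad_h).max())
```

The quadratic regularizer is what makes the primal maximization over d ≥ 0 available in closed form, which is the appeal of alternating primal/dual updates for this kind of saddle-point problem.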