Efficient Performance Bounds for Primal-Dual Reinforcement Learning from Demonstrations
Authors: Angeliki Kamoutsi, Goran Banjac, John Lygeros
ICML 2021 | Conference PDF | Archive PDF | Plain Text | LLM Run Details
| Reproducibility Variable | Result | LLM Response |
|---|---|---|
| Research Type | Experimental | We further present an equivalent no-regret online-learning interpretation. In Appendix E we provide preliminary empirical results on a simple tabular MDP in order to illustrate our formulations and theoretical results. |
| Researcher Affiliation | Academia | 1Automatic Control Laboratory, ETH Zurich, Switzerland. Correspondence to: Angeliki Kamoutsi <kamoutsa@ethz.ch>. |
| Pseudocode | Yes | Algorithm 1 Stochastic Primal-Dual LfD |
| Open Source Code | No | The paper does not contain an explicit statement about providing open-source code for the described methodology, nor does it provide a link to a code repository. |
| Open Datasets | No | The paper mentions using 'a finite set of expert demonstrations' and conducting 'preliminary empirical results on a simple tabular MDP' but does not provide specific access information (link, DOI, formal citation) for any publicly available dataset. |
| Dataset Splits | No | The paper does not specify exact percentages or sample counts for training, validation, or test splits, nor does it reference predefined splits with citations. |
| Hardware Specification | No | The paper does not provide any specific details about the hardware (e.g., CPU/GPU models, memory) used for running its experiments. |
| Software Dependencies | No | The paper does not provide specific software dependencies, libraries, or solver names with their version numbers needed for replication. |
| Experiment Setup | Yes | Input: number of iterations N, step-size η, radius β. Set θ_{1,i} = 1/n_µ for i ∈ [n_µ], w_{1,i} = 1/n_c for i ∈ [n_c], λ = 0, and learning rate η = (1−γ)β/√(N·n_µ) (see the sketch after this table). |
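
The setup row quotes only the initialization step of Algorithm 1. Below is a minimal Python sketch of that step, assuming the garbled learning-rate expression reads η = (1−γ)β/√(N·n_µ); the function name and the example values are hypothetical, and the sampling and primal-dual update loop of the algorithm are not reproduced here.

```python
import numpy as np

def init_primal_dual_lfd(n_mu, n_c, gamma, beta, N):
    """Hedged sketch of the initialization quoted from Algorithm 1
    (Stochastic Primal-Dual LfD); the function name is hypothetical.

    theta : uniform weights over the n_mu components of the primal iterate
    w     : uniform weights over the n_c components of the cost iterate
    lam   : scalar dual variable, initialized to 0
    eta   : learning rate, read as (1 - gamma) * beta / sqrt(N * n_mu)
            (an assumption; the extracted expression was garbled)
    """
    theta = np.full(n_mu, 1.0 / n_mu)  # theta_{1,i} = 1/n_mu, i in [n_mu]
    w = np.full(n_c, 1.0 / n_c)        # w_{1,i} = 1/n_c, i in [n_c]
    lam = 0.0                          # lambda = 0
    eta = (1.0 - gamma) * beta / np.sqrt(N * n_mu)
    return theta, w, lam, eta

# Illustrative values only (not taken from the paper):
theta, w, lam, eta = init_primal_dual_lfd(n_mu=50, n_c=10, gamma=0.9, beta=1.0, N=1000)
```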