Optimistic Policy Optimization with Bandit Feedback

Authors: Lior Shani, Yonathan Efroni, Aviv Rosenberg, Shie Mannor

ICML 2020

Reproducibility Variable | Result | LLM Response
Research Type | Theoretical | "For this setting, we propose an optimistic policy optimization algorithm for which we establish Õ(√(S²AH⁴K)) regret for stochastic rewards. Furthermore, we prove Õ(√(S²AH⁴)·K^(2/3)) regret for adversarial rewards."
Researcher Affiliation | Academia | Technion - Israel Institute of Technology, Haifa, Israel; Tel Aviv University, Tel Aviv, Israel.
Pseudocode | Yes | Algorithm 1: POMD with Known Model; Algorithm 2: Optimistic POMD for Stochastic MDPs; Algorithm 3: Optimistic POMD for Adversarial MDPs.
Open Source Code | No | The paper provides no statement or link indicating that source code for the described methodology is publicly available.
Open Datasets | No | The paper is theoretical, focusing on algorithm design and proofs rather than empirical experiments, so no public datasets are used.
Dataset Splits | No | The paper involves no empirical experiments or datasets, so there are no dataset splits to report.
Hardware Specification | No | The paper reports no empirical experiments, so no hardware specifications are mentioned.
Software Dependencies | No | The paper mentions no specific software dependencies or version numbers.
Experiment Setup | No | The paper provides no experimental setup details such as hyperparameters or training configurations.
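The POMD algorithms listed in the Pseudocode row share a common core: a mirror-descent (multiplicative-weights) policy update of the form π_{k+1}(a|s) ∝ π_k(a|s)·exp(η·Q_k(s,a)), applied with the Q-estimates available in each setting. Since the authors release no code, the following is only an illustrative sketch of that update step; the function name, array shapes, and toy Q-values are assumptions, not the paper's implementation.

```python
import numpy as np

def pomd_update(policy, q_values, eta):
    """One mirror-descent policy update per state (illustrative sketch).

    policy:   (S, A) array, each row a probability distribution over actions
    q_values: (S, A) array of Q estimates (optimistic in the bandit setting)
    eta:      step-size parameter
    """
    # Multiplicative-weights form of KL-regularized mirror descent:
    # reweight each action by exp(eta * Q), then renormalize per state.
    unnormalized = policy * np.exp(eta * q_values)
    return unnormalized / unnormalized.sum(axis=1, keepdims=True)

# Toy example (hypothetical numbers): 2 states, 3 actions, uniform start
policy = np.full((2, 3), 1.0 / 3.0)
q = np.array([[1.0, 0.0, 0.0],
              [0.0, 0.0, 2.0]])
new_policy = pomd_update(policy, q, eta=0.5)
print(new_policy.sum(axis=1))  # each row remains a valid distribution
```

The update shifts probability mass toward actions with higher estimated Q-values while the KL regularization keeps each step close to the previous policy, which is what drives the regret analysis.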