reproducibilityindex.ai

Minimax Regret Bounds for Reinforcement Learning

Authors: Mohammad Gheshlaghi Azar, Ian Osband, Rémi Munos

ICML 2017

Reproducibility Variable	Result	LLM Response
Research Type	Theoretical	We show that an optimistic modification to value iteration achieves a regret bound of $\tilde{O}(\sqrt{HSAT} + H^2S^2A + H\sqrt{T})$ where H is the time horizon, S the number of states, A the number of actions and T the number of timesteps. This result improves over the best previous known bound $\tilde{O}(HS\sqrt{AT})$ achieved by the UCRL2 algorithm of Jaksch et al. (2010). The key significance of our new results is that when $T \geq H^3S^3A$ and $SA \geq H$, it leads to a regret of $\tilde{O}(\sqrt{HSAT})$ that matches the established lower bound of $\Omega(\sqrt{HSAT})$ up to a logarithmic factor. Our analysis contains two key insights.
Researcher Affiliation	Industry	DeepMind, London, UK. Correspondence to: Mohammad Gheshlaghi Azar <mazar@google.com>.
Pseudocode	Yes	Algorithm 1 UCBVI, Algorithm 2 UCB-Q-values, Algorithm 3 bonus_1, Algorithm 4 bonus_2
Open Source Code	No	The paper does not provide any concrete access to source code (e.g., specific repository link, explicit code release statement, or code in supplementary materials) for the methodology described.
Open Datasets	No	The paper is theoretical and focuses on regret bounds for reinforcement learning in finite-horizon MDPs. It reports no training experiments and does not use any specific, publicly available datasets.
Dataset Splits	No	The paper is theoretical and does not conduct empirical experiments; therefore, it does not specify any dataset splits (training, validation, or test) for reproduction.
Hardware Specification	No	The paper is theoretical and does not describe any specific hardware used for experiments.
Software Dependencies	No	The paper does not provide specific ancillary software details with version numbers.
Experiment Setup	No	The paper is theoretical and does not describe any experimental setup details such as hyperparameter values or training configurations.
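The Research Type row quotes the regime in which UCBVI's bound collapses to its leading term. Ignoring constants and log factors, H²S²A ≤ √(HSAT) is equivalent to T ≥ H³S³A, and H√T ≤ √(HSAT) is equivalent to SA ≥ H, so under both conditions the whole bound is at most a constant times √(HSAT). The earlier UCRL2-style bound HS√(AT) is larger than √(HSAT) by a factor of √(HS). A minimal sketch of that arithmetic follows; the function names are hypothetical and the code only compares the orders of the terms, not the paper's exact constants.

```python
import math


def ucbvi_terms(H: int, S: int, A: int, T: int) -> dict:
    """The three terms of the UCBVI bound, with constants and log factors dropped."""
    return {
        "sqrt(HSAT)": math.sqrt(H * S * A * T),
        "H^2 S^2 A": float(H**2 * S**2 * A),
        "H sqrt(T)": H * math.sqrt(T),
    }


def ucrl2_bound(H: int, S: int, A: int, T: int) -> float:
    """The earlier UCRL2-style bound H S sqrt(A T), constants and logs dropped."""
    return H * S * math.sqrt(A * T)


def in_optimal_regime(H: int, S: int, A: int, T: int) -> bool:
    """The two conditions quoted in the abstract: T >= H^3 S^3 A and S A >= H."""
    return T >= H**3 * S**3 * A and S * A >= H


if __name__ == "__main__":
    H, S, A, T = 10, 5, 4, 10**6  # here H^3 S^3 A = 500,000 and S A = 20
    terms = ucbvi_terms(H, S, A, T)
    print(in_optimal_regime(H, S, A, T))  # True
    print(max(terms, key=terms.get))  # sqrt(HSAT)
    # Ratio of the old bound to the leading term: sqrt(H S) = sqrt(50), about 7.07.
    print(ucrl2_bound(H, S, A, T) / terms["sqrt(HSAT)"])
```

In the example, the two lower-order terms both equal 10,000 while √(HSAT) is about 14,142, so the leading term dominates as the conditions predict.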
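The Pseudocode row lists UCBVI and its helpers (UCB-Q-values, bonus_1, bonus_2), which the page does not reproduce. The sketch below illustrates only the general idea of optimistic value iteration in a tabular finite-horizon MDP: plan by backward induction on empirical transitions plus an exploration bonus, act greedily for one episode, update visit counts, and repeat. It is not the authors' code. The function names, the known reward table, the fixed start state, the clipping of Q at H - h, the log term log(5SAT/δ), and the Hoeffding-style bonus c·H·√(L/N) are all assumptions; the constants differ from the paper's bonus_1, and the Bernstein-style bonus_2 is not implemented.

```python
import math
import random


def optimistic_q(H, S, A, reward, n_sa, n_sas, bonus_scale, log_term):
    """Optimistic backward induction (hypothetical helper, not the paper's UCB-Q-values).

    reward[s][a] in [0, 1] is assumed known; transitions come from visit counts.
    Returns Q[h][s][a] for steps h = 0..H-1.
    """
    v_next = [0.0] * S  # value after the last step is zero
    Q = [None] * H
    for h in reversed(range(H)):
        cap = float(H - h)  # at most H - h reward can still be collected
        q_h = [[cap] * A for _ in range(S)]  # unvisited pairs stay at the cap
        for s in range(S):
            for a in range(A):
                n = n_sa[s][a]
                if n == 0:
                    continue
                # Empirical expectation of the next-step optimistic value.
                expected = sum(n_sas[s][a][s2] * v_next[s2] for s2 in range(S)) / n
                # Assumed Hoeffding-style bonus; constants differ from bonus_1.
                bonus = bonus_scale * H * math.sqrt(log_term / n)
                q_h[s][a] = min(cap, reward[s][a] + expected + bonus)
        Q[h] = q_h
        v_next = [max(row) for row in q_h]
    return Q


def run_ucbvi_sketch(P, reward, H, K, bonus_scale=1.0, delta=0.1, seed=0):
    """Plan optimistically, act greedily for one episode, update counts; repeat K times."""
    S, A = len(reward), len(reward[0])
    T = H * K
    log_term = math.log(5 * S * A * T / delta)  # assumed form of the log factor
    rng = random.Random(seed)
    n_sa = [[0] * A for _ in range(S)]
    n_sas = [[[0] * S for _ in range(A)] for _ in range(S)]
    returns = []
    Q = None
    for _ in range(K):
        Q = optimistic_q(H, S, A, reward, n_sa, n_sas, bonus_scale, log_term)
        s, total = 0, 0.0  # assumption: every episode starts in state 0
        for h in range(H):
            q_row = Q[h][s]
            a = max(range(A), key=q_row.__getitem__)  # greedy w.r.t. optimistic Q
            s2 = rng.choices(range(S), weights=P[s][a])[0]  # sample the true dynamics
            n_sa[s][a] += 1
            n_sas[s][a][s2] += 1
            total += reward[s][a]
            s = s2
        returns.append(total)
    return returns, Q


if __name__ == "__main__":
    # Toy 2-state, 2-action MDP: action 1 tends to move to state 1, where it pays 1.
    P = [[[0.9, 0.1], [0.1, 0.9]], [[0.9, 0.1], [0.1, 0.9]]]
    reward = [[0.0, 0.0], [0.0, 1.0]]
    returns, _ = run_ucbvi_sketch(P, reward, H=5, K=200)
    print(sum(returns[-20:]) / 20)  # average return over the last 20 episodes
```

Clipping each Q value at H - h keeps the optimistic estimates within the range of achievable returns; this is one common design choice for such sketches and is not taken from the page.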