Reinforcement Learning in Linear MDPs: Constant Regret and Representation Selection
Authors: Matteo Papini, Andrea Tirinzoni, Aldo Pacchiano, Marcello Restelli, Alessandro Lazaric, Matteo Pirotta
NeurIPS 2021
| Reproducibility Variable | Result | LLM Response |
|---|---|---|
| Research Type | Theoretical | We first derive a necessary condition on the representation, called universally spanning optimal features (UNISOFT), to achieve constant regret in any MDP with linear reward function. This result encompasses the well-known settings of low-rank MDPs and, more generally, zero inherent Bellman error (also known as the Bellman closure assumption). We then demonstrate that this condition is also sufficient for these classes of problems by deriving a constant regret bound for two optimistic algorithms (LSVI-UCB and ELEANOR). Finally, we propose an algorithm for representation selection and we prove that it achieves constant regret when one of the given representations, or a suitable combination of them, satisfies the UNISOFT condition. |
| Researcher Affiliation | Collaboration | Matteo Papini (Universitat Pompeu Fabra, matteo.papini@upf.edu); Andrea Tirinzoni (INRIA Lille, andrea.tirinzoni@inria.fr); Aldo Pacchiano (Microsoft Research, apacchiano@microsoft.com); Marcello Restelli (Politecnico di Milano, marcello.restelli@polimi.it); Alessandro Lazaric (Facebook AI Research, lazaric@fb.com); Matteo Pirotta (Facebook AI Research, pirotta@fb.com) |
| Pseudocode | Yes | Algorithm 1: LSVI-LEADER |
| Open Source Code | No | Did you include the code, data, and instructions needed to reproduce the main experimental results (either in the supplemental material or as a URL)? [No] We perform simple numerical simulations to support our theoretical findings. The provided pseudocode and experimental protocol is sufficient to reproduce our results. |
| Open Datasets | No | The paper is theoretical and does not use or release any datasets. It states 'We perform simple numerical simulations to support our theoretical findings' but does not name a specific dataset for these simulations or explain how to access one. |
| Dataset Splits | No | The paper is theoretical and does not conduct experiments on datasets that would require training, validation, or test splits; the numerical simulations it mentions are not described in terms of any data splits. |
| Hardware Specification | No | The paper is theoretical and focuses on mathematical proofs and algorithm design. It does not describe any specific hardware used for running experiments or simulations. |
| Software Dependencies | No | The paper focuses on theoretical contributions and algorithm design. It does not provide specific version numbers for any software libraries, frameworks, or environments that would be needed to replicate experiments. |
| Experiment Setup | No | The paper is theoretical and does not detail an experimental setup with specific hyperparameters, training schedules, or system-level settings. |
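For context on the UNISOFT condition named in the Research Type row above: it is a spanning requirement on the representation, asking that the features of optimal state-action pairs span the same subspace as the features of all reachable state-action pairs. Below is a rough numerical check of that requirement, assuming a finite enumeration of state-action pairs and that the rows of `opt_feats` are a subset of the rows of `all_feats`; the function name and tolerance are illustrative, not from the paper.

```python
import numpy as np

def satisfies_unisoft(all_feats, opt_feats, tol=1e-8):
    """Numerical check of the UNISOFT spanning condition.

    all_feats: (N, d) features phi(s, a) over all reachable (s, a) pairs
    opt_feats: (M, d) features phi(s, a*) at optimal actions only
    Returns True if the optimal features span the full feature subspace.
    """
    rank_all = np.linalg.matrix_rank(all_feats, tol=tol)
    rank_opt = np.linalg.matrix_rank(opt_feats, tol=tol)
    # Optimal features are a subset of all features, so their span is
    # contained in the full span; equal rank means the spans coincide.
    return rank_opt == rank_all

# Toy check: 2-dimensional features over a few state-action pairs.
all_feats = np.array([[1.0, 0.0], [0.0, 1.0], [1.0, 1.0], [2.0, 0.0]])
opt_feats = np.array([[1.0, 0.0], [0.0, 1.0]])  # features at optimal actions
print(satisfies_unisoft(all_feats, opt_feats))   # True: both span R^2
```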
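To connect the pseudocode entry above to the algorithms the paper analyzes, here is a minimal NumPy sketch of the core LSVI-UCB backup (a ridge-regression Q-estimate plus an elliptical confidence bonus), the optimistic routine on which LSVI-LEADER builds. The function names `lsvi_ucb_backup` and `optimistic_q` and the constants `lam` and `beta` are illustrative placeholders, not taken from the paper.

```python
import numpy as np

def lsvi_ucb_backup(Phi, targets, lam=1.0):
    """Regularized least-squares fit of Q-weights at one step h.

    Phi:     (n, d) features phi(s_h, a_h) from past episodes
    targets: (n,)   regression targets r_h + max_a Q_{h+1}(s_{h+1}, a)
    Returns the weight vector w and the inverse Gram matrix Lambda^{-1}.
    """
    d = Phi.shape[1]
    Lambda = Phi.T @ Phi + lam * np.eye(d)   # regularized Gram matrix
    Lambda_inv = np.linalg.inv(Lambda)
    w = Lambda_inv @ Phi.T @ targets         # ridge-regression solution
    return w, Lambda_inv

def optimistic_q(phi_sa, w, Lambda_inv, beta, H):
    """Optimistic Q-value: linear estimate plus an elliptical UCB bonus,
    clipped at the horizon H as in standard LSVI-UCB analyses."""
    bonus = beta * np.sqrt(phi_sa @ Lambda_inv @ phi_sa)
    return min(phi_sa @ w + bonus, H)

# Toy usage: d = 2 features, 3 past transitions at step h.
Phi = np.array([[1.0, 0.0], [0.0, 1.0], [1.0, 1.0]])
targets = np.array([0.5, 0.3, 1.0])          # r + max_a Q_{h+1}
w, Lambda_inv = lsvi_ucb_backup(Phi, targets)
print(optimistic_q(np.array([1.0, 0.5]), w, Lambda_inv, beta=0.1, H=5))
```

In the full algorithm this backup runs backwards over steps h = H, ..., 1 at the start of each episode, and the resulting optimistic Q-functions drive greedy action selection; the sketch shows only the single-step regression and bonus computation.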