Test-Time Regret Minimization in Meta Reinforcement Learning
Authors: Mirco Mutti, Aviv Tamar
ICML 2024
| Reproducibility Variable | Result | LLM Response |
|---|---|---|
| Research Type | Theoretical | In our first contribution, we demonstrate that the latter rate is nearly optimal by developing a novel lower bound for test-time regret minimization under separation, showing that a linear dependence on M is unavoidable. Then, we present a family of stronger yet reasonable assumptions beyond separation, which we call strong identifiability, enabling algorithms that achieve fast log(H) rates and sublinear dependence on M simultaneously. Our paper provides a new understanding of the statistical barriers of test-time regret minimization and when fast rates can be achieved. ... Overview of the theoretical results. ... In this paper, we provided a formal study on the statistical barriers of test-time regret minimization under strong structural assumptions, shedding light on when meta RL can be expected to provide significant benefits over standard RL. |
| Researcher Affiliation | Academia | 1Technion Israel Institute of Technology, Haifa, Israel. Correspondence to: Mirco Mutti <mirco.m@technion.ac.il>. |
| Pseudocode | Yes | Algorithm 1 Identify-then-Commit (Chen et al., 2022) ... Algorithm 2 Sampling Routine ... Algorithm 3 Double-Identify-then-Commit ... Algorithm 4 Tree-Identify-then-Commit ... Algorithm 5 Explore-Identify-then-Commit ... Algorithm 6 Revealing Policies Sampling |
| Open Source Code | No | The paper does not mention providing or linking to any open-source code for the methodology it describes. |
| Open Datasets | No | The paper is theoretical, focusing on mathematical analysis and algorithm design for Markov Decision Processes (MDPs) and bandits. It constructs problem instances for theoretical proofs (e.g., hard instance in Figure 1) but does not use or provide concrete access information for any publicly available or open datasets. |
| Dataset Splits | No | The paper is theoretical and does not describe empirical experiments with dataset splits. It provides theoretical analysis and proofs for its proposed algorithms and lower bounds, which do not involve training/validation/test splits of data. |
| Hardware Specification | No | The paper is theoretical in nature, focusing on mathematical proofs and algorithm design. It does not describe any computational experiments or specify any hardware used for such purposes. |
| Software Dependencies | No | The paper is theoretical and describes algorithms and proofs without mentioning specific software implementations or dependencies with version numbers. |
| Experiment Setup | No | The paper is theoretical and focuses on algorithm design and mathematical analysis. It does not describe an empirical experimental setup with concrete hyperparameters or system-level training settings in the main text. |
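The paper itself ships no code, but the identify-then-commit idea behind the pseudocode listed above is easy to illustrate. The sketch below is NOT the paper's Algorithm 1; it is a minimal, self-contained toy for a multi-armed bandit test task: probe each arm to estimate its mean reward, match the estimates against the M candidate tasks known from meta-training, then commit to the identified task's optimal arm for the rest of the horizon. All names (`identify_then_commit`, `candidates`, `pull`, `n_probe`) are assumptions made for this illustration.

```python
def identify_then_commit(candidates, pull, horizon, n_probe=20):
    """Toy identify-then-commit loop (illustrative only, not the
    paper's Algorithm 1).

    candidates: list of M mean-reward vectors, one per candidate task.
    pull: callable(arm) -> reward, samples the unknown test task.
    horizon: total number of interaction rounds H.
    n_probe: pulls per arm spent on identification (assumed parameter).
    """
    n_arms = len(candidates[0])

    # Identification phase: pull each arm n_probe times, average rewards.
    estimates = [0.0] * n_arms
    t = 0
    for arm in range(n_arms):
        total = 0.0
        for _ in range(n_probe):
            total += pull(arm)
            t += 1
        estimates[arm] = total / n_probe

    # Pick the candidate task whose mean vector is closest (in squared
    # distance) to the empirical estimates.
    def dist(mu):
        return sum((mu[a] - estimates[a]) ** 2 for a in range(n_arms))

    task = min(range(len(candidates)), key=lambda m: dist(candidates[m]))

    # Commit phase: play the identified task's optimal arm until horizon.
    best_arm = max(range(n_arms), key=lambda a: candidates[task][a])
    total_reward = 0.0
    while t < horizon:
        total_reward += pull(best_arm)
        t += 1
    return task, best_arm, total_reward
```

The toy makes the paper's trade-off concrete: the identification phase costs regret that grows with the number of arms and probes, which is why the authors study when stronger assumptions (strong identifiability) let the identification cost scale sublinearly with M.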