Efficient Model-Free Exploration in Low-Rank MDPs

Authors: Zak Mhammedi, Adam Block, Dylan J. Foster, Alexander Rakhlin

NeurIPS 2023 | Conference PDF | Archive PDF | Plain Text | LLM Run Details

| Reproducibility Variable | Result | LLM Response |
| --- | --- | --- |
| Research Type | Theoretical | "While a detailed experimental evaluation is outside of the scope of this paper, we are optimistic about the empirical performance of the algorithm in light of the encouraging results based on the same objective in Zhang et al. [49]" |
| Researcher Affiliation | Collaboration | Zakaria Mhammedi (MIT, mhammedi@mit.edu); Adam Block (MIT, ablock@mit.edu); Dylan J. Foster (Microsoft Research, dylanfoster@microsoft.com); Alexander Rakhlin (MIT, rakhlin@mit.edu) |
| Pseudocode | Yes | Algorithm 1 (SpanRL: Volumetric Exploration and Representation Learning via Barycentric Spanner); Algorithm 2 (RobustSpanner: Barycentric Spanner via Approximate Linear Optimization); Algorithm 3 (PSDP: Policy Search by Dynamic Programming); Algorithm 4 (EstVec: Estimate E_π[F(x_h, a_h)] for a policy π and function F); Algorithm 5 (RepLearn: Representation Learning for Low-Rank MDPs); Algorithm 6 (RepLearn: Representation Learning for Low-Rank MDPs); Algorithm 7 (EstVec: Estimate E_π[F(x_h, a_h)] for a given policy π and function F). See the illustrative sketch after the table. |
| Open Source Code | No | The paper does not provide an explicit statement about releasing source code or a link to a code repository for its methodology. |
| Open Datasets | No | The paper is theoretical and focuses on algorithm design and theoretical guarantees for Low-Rank MDPs. It does not mention using specific datasets for empirical training or provide access information for any dataset. |
| Dataset Splits | No | The paper is theoretical and does not conduct experiments on specific datasets, so it does not mention training, validation, or test splits. |
| Hardware Specification | No | The paper is theoretical and focuses on algorithm design and analysis. It does not mention any specific hardware used for experiments. |
| Software Dependencies | No | The paper is theoretical and focuses on algorithm design and analysis. It mentions computational methods such as "standard gradient-based optimization techniques" but does not specify any software names with version numbers. |
| Experiment Setup | No | The paper specifies theoretical parameters such as ε, c, and n for its algorithms and complexity analysis, but these are not concrete experimental setup details such as learning rates, batch sizes, or optimizer settings for an empirical evaluation. |
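
As a rough, non-authoritative illustration of the barycentric-spanner primitive named in the Pseudocode row, the Python sketch below implements the classic swapping procedure for a finite set of feature vectors (in the spirit of Awerbuch and Kleinberg). It is not the paper's RobustSpanner, which is stated for policies and only assumes access to an approximate linear-optimization oracle; the function name `barycentric_spanner` and the parameters `C` and `max_passes` are assumptions made here for illustration only.

```python
import numpy as np


def barycentric_spanner(X, C=2.0, max_passes=100):
    """Approximate C-barycentric spanner of a finite vector set (illustrative sketch).

    X: array of shape (n, d) whose rows span R^d.
    Returns indices of d rows of X such that every row of X can be written as a
    linear combination of the selected rows with coefficients in [-C, C].
    """
    X = np.asarray(X, dtype=float)
    n, d = X.shape
    B = np.eye(d)          # current candidate matrix (columns = selected vectors)
    chosen = [-1] * d      # -1 marks an initial standard-basis placeholder

    def abs_det_with(col, v):
        # |det| of B after replacing column `col` by vector v.
        M = B.copy()
        M[:, col] = v
        return abs(np.linalg.det(M))

    # Phase 1: greedily replace each basis column by the |det|-maximizing row of X.
    for j in range(d):
        best = max(range(n), key=lambda i: abs_det_with(j, X[i]))
        B[:, j] = X[best]
        chosen[j] = best

    # Phase 2: keep swapping while some row increases |det| by more than a factor C.
    for _ in range(max_passes):
        swapped = False
        for j in range(d):
            cur = abs(np.linalg.det(B))
            best = max(range(n), key=lambda i: abs_det_with(j, X[i]))
            if abs_det_with(j, X[best]) > C * cur:
                B[:, j] = X[best]
                chosen[j] = best
                swapped = True
        if not swapped:
            break

    return chosen, B


if __name__ == "__main__":
    # Toy usage: pick 2 "spanning" directions out of 5 planar feature vectors.
    feats = np.array([[1.0, 0.1], [0.2, 1.0], [0.9, 0.9], [2.0, 0.1], [0.1, 2.0]])
    idx, _ = barycentric_spanner(feats, C=2.0)
    print("spanner indices:", idx)
```

Roughly speaking, executing the policies associated with the selected spanner elements is how SpanRL's exploration step achieves coverage in the learned feature space; the sketch above shows only the linear-algebraic core of that idea over an explicitly enumerated set, not the oracle-based version analyzed in the paper.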