Reinforcement Learning Under Latent Dynamics: Toward Statistical and Algorithmic Modularity

Authors: Philip Amortila, Dylan J Foster, Nan Jiang, Akshay Krishnamurthy, Zak Mhammedi

NeurIPS 2024

Reproducibility Variable | Result | LLM Response
Research Type | Theoretical | This paper addresses the question of reinforcement learning under general latent dynamics from a statistical and algorithmic perspective. On the statistical side, our main negative result shows that most well-studied settings for reinforcement learning with function approximation become intractable when composed with rich observations; we complement this with a positive result, identifying latent pushforward coverability as a general condition that enables statistical tractability. Algorithmically, we develop provably efficient observable-to-latent reductions
Researcher Affiliation | Collaboration | Philip Amortila (philipa4@illinois.edu), Dylan J. Foster (dylanfoster@microsoft.com), Nan Jiang (nanjiang@illinois.edu), Akshay Krishnamurthy (akshaykr@microsoft.com), Zakaria Mhammedi (mhammedi@google.com)
Pseudocode | Yes | Algorithm 1 O2L: Observable-to-Latent Reduction, Algorithm 2 GOLF [JLM21], Algorithm 3 Derandomized Exponential Weights (EXPWEIGHTS.DR), Algorithm 4 Optimistic Self-Predictive Latent Model Estimation (SELFPREDICT.OPT)
Open Source Code | No | The paper is a theoretical work focusing on statistical and algorithmic modularity. It does not contain any statements about releasing code, nor does it provide links to code repositories.
Open Datasets | No | The paper is theoretical and does not conduct experiments that would require a dataset. It discusses abstract 'MDP classes' rather than specific datasets.
Dataset Splits | No | The paper is theoretical and does not involve experimental data splits for training, validation, or testing.
Hardware Specification | No | This is a theoretical paper and does not describe any specific hardware used for experiments.
Software Dependencies | No | This is a theoretical paper and does not list any software dependencies with specific version numbers relevant to experimental replication.
Experiment Setup | No | The paper is theoretical and does not describe an experimental setup including specific hyperparameter values or training configurations.