Reinforcement Learning Under Latent Dynamics: Toward Statistical and Algorithmic Modularity
Authors: Philip Amortila, Dylan J Foster, Nan Jiang, Akshay Krishnamurthy, Zak Mhammedi
NeurIPS 2024
| Reproducibility Variable | Result | LLM Response |
|---|---|---|
| Research Type | Theoretical | This paper addresses the question of reinforcement learning under general latent dynamics from a statistical and algorithmic perspective. On the statistical side, our main negative result shows that most well-studied settings for reinforcement learning with function approximation become intractable when composed with rich observations; we complement this with a positive result, identifying latent pushforward coverability as a general condition that enables statistical tractability. Algorithmically, we develop provably efficient observable-to-latent reductions |
| Researcher Affiliation | Collaboration | Philip Amortila (philipa4@illinois.edu); Dylan J. Foster (dylanfoster@microsoft.com); Nan Jiang (nanjiang@illinois.edu); Akshay Krishnamurthy (akshaykr@microsoft.com); Zakaria Mhammedi (mhammedi@google.com) |
| Pseudocode | Yes | Algorithm 1: O2L (Observable-to-Latent Reduction); Algorithm 2: GOLF [JLM21]; Algorithm 3: Derandomized Exponential Weights (EXPWEIGHTS.DR); Algorithm 4: Optimistic Self-Predictive Latent Model Estimation (SELFPREDICT.OPT) |
| Open Source Code | No | The paper is a theoretical work focusing on statistical and algorithmic modularity. It does not contain any statements about releasing code, nor does it provide links to code repositories. |
| Open Datasets | No | The paper is theoretical and does not conduct experiments that would require a dataset. It discusses abstract 'MDP classes' rather than specific datasets. |
| Dataset Splits | No | The paper is theoretical and does not involve experimental data splits for training, validation, or testing. |
| Hardware Specification | No | This is a theoretical paper and does not describe any specific hardware used for experiments. |
| Software Dependencies | No | This is a theoretical paper and does not list any software dependencies with specific version numbers relevant to experimental replication. |
| Experiment Setup | No | The paper is theoretical and does not describe an experimental setup including specific hyperparameter values or training configurations. |