Notice: The reproducibility variables underlying each score are classified using an automated LLM-based pipeline, validated against a manually labeled dataset. LLM-based classification introduces uncertainty and potential bias; scores should be interpreted as estimates. Full accuracy metrics and methodology are described in Coakley et al.: K. L. Coakley, T. Snelleman, H. Hoos, and O. E. Gundersen, "The embrace of open science: An analysis of a decade of AI research and 56 800 conference papers," under review, 2026.
Reinforcement Learning Under Latent Dynamics: Toward Statistical and Algorithmic Modularity
Authors: Philip Amortila, Dylan J Foster, Nan Jiang, Akshay Krishnamurthy, Zak Mhammedi
NeurIPS 2024
| Reproducibility Variable | Result | LLM Response |
|---|---|---|
| Research Type | Theoretical | This paper addresses the question of reinforcement learning under general latent dynamics from a statistical and algorithmic perspective. On the statistical side, our main negative result shows that most well-studied settings for reinforcement learning with function approximation become intractable when composed with rich observations; we complement this with a positive result, identifying latent pushforward coverability as a general condition that enables statistical tractability. Algorithmically, we develop provably efficient observable-to-latent reductions |
| Researcher Affiliation | Collaboration | Philip Amortila EMAIL Dylan J. Foster EMAIL Nan Jiang EMAIL Akshay Krishnamurthy EMAIL Zakaria Mhammedi EMAIL |
| Pseudocode | Yes | Algorithm 1 O2L: Observable-to-Latent Reduction, Algorithm 2 GOLF [JLM21], Algorithm 3 Derandomized Exponential Weights (EXPWEIGHTS.DR), Algorithm 4 Optimistic Self-Predictive Latent Model Estimation (SELFPREDICT.OPT) |
| Open Source Code | No | The paper is a theoretical work focusing on statistical and algorithmic modularity. It does not contain any statements about releasing code, nor does it provide links to code repositories. |
| Open Datasets | No | The paper is theoretical and does not conduct experiments that would require a dataset. It discusses abstract 'MDP classes' rather than specific datasets. |
| Dataset Splits | No | The paper is theoretical and does not involve experimental data splits for training, validation, or testing. |
| Hardware Specification | No | This is a theoretical paper and does not describe any specific hardware used for experiments. |
| Software Dependencies | No | This is a theoretical paper and does not list any software dependencies with specific version numbers relevant to experimental replication. |
| Experiment Setup | No | The paper is theoretical and does not describe an experimental setup including specific hyperparameter values or training configurations. |