Rich-Observation Reinforcement Learning with Continuous Latent Dynamics
Authors: Yuda Song, Lili Wu, Dylan J Foster, Akshay Krishnamurthy
ICML 2024 | Conference PDF | Archive PDF | Plain Text | LLM Run Details
| Reproducibility Variable | Result | LLM Response |
|---|---|---|
| Research Type | Experimental | Our objective is amenable to practical implementation, and empirically, it compares favorably to prior schemes in a standard evaluation protocol. We further provide several insights into the statistical complexity of the Rich CLD framework, in particular proving that certain notions of Lipschitzness that admit sample-efficient learning in the absence of rich observations are insufficient in the rich-observation setting. |
| Researcher Affiliation | Collaboration | ¹Carnegie Mellon University, ²Microsoft Research. |
| Pseudocode | Yes | Algorithm 1 BCRL.C: Bellman Consistent Representation Learning with Continuous Latent Dynamics; Algorithm 2 CRIEE: Continuous Representation Learning with Interleaved Explore-Exploit; Algorithm 3 Opt DP: Optimistic Dynamic Programming; Algorithm 4 Iter-BCRL.C |
| Open Source Code | No | The paper does not provide a direct link or explicit statement about the availability of its source code within the provided PDF content. |
| Open Datasets | Yes | We consider a maze environment (Koul et al., 2023) and a locomotion benchmark (Lu et al., 2023), both with visual (rich) observations. ... The datasets that we use can be downloaded from the linked data sources: cheetah run medium and walker walk medium. |
| Dataset Splits | No | The paper describes data collection for the maze environment and refers to using 'offline data' from D4RL for locomotion, but it does not explicitly provide percentages, sample counts, or predefined training/validation/test splits for these datasets. |
| Hardware Specification | No | The paper does not specify any particular hardware components such as GPU models, CPU types, or memory specifications used for running the experiments. |
| Software Dependencies | No | The paper mentions using deep neural networks but does not provide specific version numbers for any software libraries, frameworks (e.g., PyTorch, TensorFlow), or programming languages used. |
| Experiment Setup | Yes | We use deep neural networks to parameterize the decoders ϕ ∈ Φ, the discriminators f ∈ F, and the prediction heads g ∈ Lip; architecture details are given in Appendix C. ... See Appendix C for hyperparameter settings and additional details. (Appendix C includes Table 1: Hyperparameters for Maze and Table 2: Hyperparameters for Locomotion Environments, listing specific values like batch size, latent dimension, and network architecture details.) |
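
The Experiment Setup row quotes the paper's description of parameterizing the decoders ϕ ∈ Φ, the discriminators f ∈ F, and the prediction heads g ∈ Lip with deep neural networks. The snippet below is a minimal, hypothetical PyTorch sketch of what such a parameterization could look like; the dimensions, hidden sizes, and MLP architecture are placeholder assumptions and are not taken from the paper's Appendix C.

```python
# Illustrative sketch only (not the authors' implementation): small PyTorch MLPs
# standing in for the three function classes named in the Experiment Setup row.
# obs_dim, latent_dim, action_dim, and the hidden width are hypothetical values.
import torch
import torch.nn as nn

obs_dim, latent_dim, action_dim = 64, 8, 2  # placeholder dimensions

def mlp(in_dim: int, out_dim: int, hidden: int = 256) -> nn.Sequential:
    """Two-hidden-layer MLP used for every component in this sketch."""
    return nn.Sequential(
        nn.Linear(in_dim, hidden), nn.ReLU(),
        nn.Linear(hidden, hidden), nn.ReLU(),
        nn.Linear(hidden, out_dim),
    )

# Decoder ϕ ∈ Φ: maps a rich observation to a continuous latent state.
decoder = mlp(obs_dim, latent_dim)

# Discriminator f ∈ F: scores a (latent, action, next latent) triple.
discriminator = mlp(2 * latent_dim + action_dim, 1)

# Prediction head g ∈ Lip: predicts the next latent state from (latent, action).
prediction_head = mlp(latent_dim + action_dim, latent_dim)

# Forward pass on a dummy batch, just to show how the pieces compose.
batch = 32
obs, next_obs = torch.randn(batch, obs_dim), torch.randn(batch, obs_dim)
action = torch.randn(batch, action_dim)

z, z_next = decoder(obs), decoder(next_obs)
score = discriminator(torch.cat([z, action, z_next], dim=-1))
z_pred = prediction_head(torch.cat([z, action], dim=-1))
print(score.shape, z_pred.shape)  # torch.Size([32, 1]) torch.Size([32, 8])
```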
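
The Open Datasets row points to downloadable offline datasets (cheetah run medium and walker walk medium). The snippet below is a hypothetical sketch for inspecting a locally downloaded copy; the file name, archive format, and field layout are assumptions, since the excerpt does not specify how the files are packaged.

```python
# Illustrative sketch only: list the arrays in a downloaded dataset file,
# assuming it is a NumPy .npz archive. The path and any field names printed
# are hypothetical; the actual download format may differ.
import numpy as np

path = "cheetah_run_medium.npz"  # hypothetical local file name
with np.load(path) as data:
    for key in data.files:
        print(key, data[key].shape, data[key].dtype)
```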