Rich-Observation Reinforcement Learning with Continuous Latent Dynamics
Authors: Yuda Song, Lili Wu, Dylan J Foster, Akshay Krishnamurthy
ICML 2024 | Conference PDF | Archive PDF | Plain Text | LLM Run Details
| Reproducibility Variable | Result | LLM Response |
|---|---|---|
| Research Type | Experimental | Our objective is amenable to practical implementation, and empirically, it compares favorably to prior schemes in a standard evaluation protocol. We further provide several insights into the statistical complexity of the Rich CLD framework, in particular proving that certain notions of Lipschitzness that admit sample-efficient learning in the absence of rich observations are insufficient in the rich-observation setting. |
| Researcher Affiliation | Collaboration | ¹Carnegie Mellon University, ²Microsoft Research. |
| Pseudocode | Yes | Algorithm 1 BCRL.C: Bellman Consistent Representation Learning with Continuous Latent Dynamics; Algorithm 2 CRIEE: Continuous Representation Learning with Interleaved Explore-Exploit; Algorithm 3 Opt DP: Optimistic Dynamic Programming; Algorithm 4 Iter-BCRL.C |
| Open Source Code | No | The paper does not provide a direct link or explicit statement about the availability of its source code within the provided PDF content. |
| Open Datasets | Yes | We consider a maze environment (Koul et al., 2023) and a locomotion benchmark (Lu et al., 2023), both with visual (rich) observations. ... The datasets that we use can be downloaded from the linked data sources: cheetah run medium and walker walk medium. |
| Dataset Splits | No | The paper describes data collection for the maze environment and refers to using 'offline data' from D4RL for locomotion, but it does not explicitly provide percentages, sample counts, or predefined training/validation/test splits for these datasets. |
| Hardware Specification | No | The paper does not specify any particular hardware components such as GPU models, CPU types, or memory specifications used for running the experiments. |
| Software Dependencies | No | The paper mentions using deep neural networks but does not provide specific version numbers for any software libraries, frameworks (e.g., PyTorch, TensorFlow), or programming languages used. |
| Experiment Setup | Yes | We use deep neural networks to parameterize the decoders ϕ ∈ Φ, the discriminators f ∈ F, and the prediction heads g ∈ Lip; architecture details are given in Appendix C. ... See Appendix C for hyperparameter settings and additional details. (Appendix C includes Table 1: Hyperparameters for Maze and Table 2: Hyperparameters for Locomotion Environments, listing specific values like batch size, latent dimension, and network architecture details.) |
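
The Experiment Setup row quotes the paper's description of parameterizing the decoders ϕ ∈ Φ, the discriminators f ∈ F, and the prediction heads g ∈ Lip with deep neural networks. The snippet below is a minimal, hypothetical PyTorch sketch of what such a parameterization could look like; the dimensions, hidden sizes, and MLP architecture are placeholder assumptions and are not taken from the paper's Appendix C.

```python
# Illustrative sketch only (not the authors' implementation): small PyTorch MLPs
# standing in for the three function classes named in the Experiment Setup row.
# obs_dim, latent_dim, action_dim, and the hidden width are hypothetical values.
import torch
import torch.nn as nn

obs_dim, latent_dim, action_dim = 64, 8, 2  # placeholder dimensions

def mlp(in_dim: int, out_dim: int, hidden: int = 256) -> nn.Sequential:
    """Two-hidden-layer MLP used for every component in this sketch."""
    return nn.Sequential(
        nn.Linear(in_dim, hidden), nn.ReLU(),
        nn.Linear(hidden, hidden), nn.ReLU(),
        nn.Linear(hidden, out_dim),
    )

# Decoder ϕ ∈ Φ: maps a rich observation to a continuous latent state.
decoder = mlp(obs_dim, latent_dim)

# Discriminator f ∈ F: scores a (latent, action, next latent) triple.
discriminator = mlp(2 * latent_dim + action_dim, 1)

# Prediction head g ∈ Lip: predicts the next latent state from (latent, action).
prediction_head = mlp(latent_dim + action_dim, latent_dim)

# Forward pass on a dummy batch, just to show how the pieces compose.
batch = 32
obs, next_obs = torch.randn(batch, obs_dim), torch.randn(batch, obs_dim)
action = torch.randn(batch, action_dim)

z, z_next = decoder(obs), decoder(next_obs)
score = discriminator(torch.cat([z, action, z_next], dim=-1))
z_pred = prediction_head(torch.cat([z, action], dim=-1))
print(score.shape, z_pred.shape)  # torch.Size([32, 1]) torch.Size([32, 8])
```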
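
The Open Datasets row points to downloadable offline datasets (cheetah run medium and walker walk medium). The snippet below is a hypothetical sketch for inspecting a locally downloaded copy; the file name, archive format, and field layout are assumptions, since the excerpt does not specify how the files are packaged.

```python
# Illustrative sketch only: list the arrays in a downloaded dataset file,
# assuming it is a NumPy .npz archive. The path and any field names printed
# are hypothetical; the actual download format may differ.
import numpy as np

path = "cheetah_run_medium.npz"  # hypothetical local file name
with np.load(path) as data:
    for key in data.files:
        print(key, data[key].shape, data[key].dtype)
```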