Notice: The reproducibility variables underlying each score are classified using an automated LLM-based pipeline, validated against a manually labeled dataset. LLM-based classification introduces uncertainty and potential bias; scores should be interpreted as estimates. Full accuracy metrics and methodology are described in Coakley et al. (K. L. Coakley, T. Snelleman, H. Hoos, and O. E. Gundersen, "The embrace of open science: An analysis of a decade of AI research and 56 800 conference papers," under review, 2026).
Simplifying Model-based RL: Learning Representations, Latent-space Models, and Policies with One Objective
Authors: Raj Ghugare, Homanga Bharadhwaj, Benjamin Eysenbach, Sergey Levine, Russ Salakhutdinov
ICLR 2023
| Reproducibility Variable | Result | LLM Response |
|---|---|---|
| Research Type | Experimental | We demonstrate that the resulting algorithm matches or improves the sample-efficiency of the best prior model-based and model-free RL methods. |
| Researcher Affiliation | Academia | VNIT Nagpur, Carnegie Mellon University, UC Berkeley |
| Pseudocode | Yes | Algorithm 1: The ALM objective can be optimized with any RL algorithm. We present an implementation based on DDPG (Lillicrap et al., 2015). |
| Open Source Code | Yes | Project website with code: https://alignedlatentmodels.github.io/ |
| Open Datasets | Yes | We start by comparing ALM with the baselines on the locomotion benchmark proposed by Wang et al. (2019). |
| Dataset Splits | No | The paper mentions 'validation' in the context of learned Q-functions ('validation' in Appendix A.6 refers to a Q-function update), but not as explicit dataset splits for the environments used. |
| Hardware Specification | No | The paper acknowledges assistance in 'setting up the compute necessary for running the experiments' but does not provide specific details on the hardware used (e.g., CPU, GPU models, memory). |
| Software Dependencies | No | The paper mentions various algorithms and network components (e.g., DDPG, SAC-SVG, layer normalization, ReLU/ELU activations) but does not list specific software dependencies with version numbers (e.g., PyTorch, TensorFlow, Python versions). |
| Experiment Setup | Yes | Table 3: A default set of hyper-parameters used in our experiments. Discount (γ): 0.99; Warmup steps: 5000; Soft update rate (τ): 0.005; Weighted target parameter (λ): 0.95; Replay buffer: 10^6 for humanoid, 10^5 otherwise; Batch size: 512; Learning rate: 1e-4; Max grad norm: 100.0; Latent dimension: 50; Coefficient of classifier rewards: 0.1; Exploration stddev. clip: 0.3; Exploration stddev. schedule: linear(1.0, 0.1, 100000) |
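
For readers who want to reuse the Table 3 defaults quoted in the Experiment Setup row, the values can be collected into a small configuration object. The following is a minimal sketch, not the authors' code: the class name `ALMDefaults`, every field name, and the helper functions are hypothetical. It also assumes that `linear(1.0, 0.1, 100000)` means "interpolate linearly from 1.0 to 0.1 over the first 100000 steps, then hold at 0.1", and that the humanoid task can be detected by a substring match on the environment name; the scorecard confirms neither.

```python
from dataclasses import dataclass


@dataclass(frozen=True)
class ALMDefaults:
    """Values transcribed from Table 3; the field names are hypothetical."""
    discount: float = 0.99                # γ
    warmup_steps: int = 5000
    soft_update_rate: float = 0.005       # τ
    weighted_target_lambda: float = 0.95  # λ
    batch_size: int = 512
    learning_rate: float = 1e-4
    max_grad_norm: float = 100.0
    latent_dim: int = 50
    classifier_reward_coef: float = 0.1
    expl_stddev_clip: float = 0.3
    expl_stddev_init: float = 1.0         # first argument of linear(...)
    expl_stddev_final: float = 0.1        # second argument of linear(...)
    expl_stddev_duration: int = 100_000   # third argument of linear(...)


def replay_buffer_size(env_name: str) -> int:
    """10^6 for humanoid, 10^5 otherwise (substring match is an assumption)."""
    return 10**6 if "humanoid" in env_name.lower() else 10**5


def linear_schedule(init: float, final: float, duration: int, step: int) -> float:
    """Assumed reading of linear(init, final, duration): interpolate, then hold."""
    frac = min(max(step / duration, 0.0), 1.0)
    return (1.0 - frac) * init + frac * final


if __name__ == "__main__":
    cfg = ALMDefaults()
    for step in (0, 50_000, 100_000, 200_000):
        std = linear_schedule(
            cfg.expl_stddev_init, cfg.expl_stddev_final,
            cfg.expl_stddev_duration, step,
        )
        print(step, round(std, 3))
```

Under this reading, the exploration standard deviation starts at 1.0 and stays at 0.1 for every step past 100000. The released code on the project website is the place to check the schedule actually used.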