Simplifying Model-based RL: Learning Representations, Latent-space Models, and Policies with One Objective
Authors: Raj Ghugare, Homanga Bharadhwaj, Benjamin Eysenbach, Sergey Levine, Russ Salakhutdinov
ICLR 2023 | Conference PDF | Archive PDF | Plain Text | LLM Run Details
| Reproducibility Variable | Result | LLM Response |
|---|---|---|
| Research Type | Experimental | We demonstrate that the resulting algorithm matches or improves upon the sample-efficiency of the best prior model-based and model-free RL methods. |
| Researcher Affiliation | Academia | VNIT Nagpur; Carnegie Mellon University; UC Berkeley |
| Pseudocode | Yes | Algorithm 1: The ALM objective can be optimized with any RL algorithm. We present an implementation based on DDPG (Lillicrap et al., 2015). (A hedged sketch of such a DDPG-style update appears below the table.) |
| Open Source Code | Yes | Project website with code: https://alignedlatentmodels.github.io/ |
| Open Datasets | Yes | We start by comparing ALM with the baselines on the locomotion benchmark proposed by Wang et al. (2019). |
| Dataset Splits | No | The paper uses 'validation' only in the context of learned Q-functions (in Appendix A.6 it refers to a Q-function update), not to describe explicit train/validation/test splits for the environments used. |
| Hardware Specification | No | The paper acknowledges assistance in 'setting up the compute necessary for running the experiments' but does not provide specific details on the hardware used (e.g., CPU, GPU models, memory). |
| Software Dependencies | No | The paper mentions various algorithms and network components (e.g., DDPG, SAC-SVG, layer normalization, ReLU/ELU activations) but does not list specific software dependencies with version numbers (e.g., PyTorch, TensorFlow, Python versions). |
| Experiment Setup | Yes | Table 3 (default hyper-parameters): Discount (γ) = 0.99; Warmup steps = 5000; Soft update rate (τ) = 0.005; Weighted target parameter (λ) = 0.95; Replay buffer = 10^6 for Humanoid, 10^5 otherwise; Batch size = 512; Learning rate = 1e-4; Max grad norm = 100.0; Latent dimension = 50; Coefficient of classifier rewards = 0.1; Exploration stddev. clip = 0.3; Exploration stddev. schedule = linear(1.0, 0.1, 100000). (Transcribed into a config sketch below the table.) |
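
The paper states only that ALM's objective is optimized with a DDPG-based actor-critic (Algorithm 1). The snippet below is a minimal, hypothetical sketch of such a DDPG-style update loop in PyTorch, not the authors' released code: the network sizes, the `ddpg_update` helper, and the dummy batch are illustrative assumptions, while the discount, soft-update rate, learning rate, grad-norm clip, and batch size are taken from Table 3.

```python
# Minimal sketch of a DDPG-style update loop, assuming PyTorch.
# Network sizes, names, and the dummy batch are hypothetical; only the
# discount, soft-update rate, learning rate, grad-norm clip, and batch
# size come from Table 3 of the paper.
import copy

import torch
import torch.nn as nn

obs_dim, act_dim = 17, 6                                   # hypothetical env dims
gamma, tau, lr, max_grad_norm = 0.99, 0.005, 1e-4, 100.0   # Table 3 values

actor = nn.Sequential(nn.Linear(obs_dim, 256), nn.ELU(),
                      nn.Linear(256, act_dim), nn.Tanh())
critic = nn.Sequential(nn.Linear(obs_dim + act_dim, 256), nn.ELU(),
                       nn.Linear(256, 1))
actor_tgt, critic_tgt = copy.deepcopy(actor), copy.deepcopy(critic)
actor_opt = torch.optim.Adam(actor.parameters(), lr=lr)
critic_opt = torch.optim.Adam(critic.parameters(), lr=lr)


def ddpg_update(batch):
    """One DDPG step: TD critic regression, deterministic actor ascent,
    then Polyak averaging of the target networks."""
    obs, act, rew, next_obs, done = batch

    # Critic: fit Q(s, a) to r + gamma * (1 - done) * Q_tgt(s', pi_tgt(s')).
    with torch.no_grad():
        next_q = critic_tgt(torch.cat([next_obs, actor_tgt(next_obs)], -1))
        target = rew + gamma * (1.0 - done) * next_q
    q = critic(torch.cat([obs, act], -1))
    critic_loss = nn.functional.mse_loss(q, target)
    critic_opt.zero_grad()
    critic_loss.backward()
    nn.utils.clip_grad_norm_(critic.parameters(), max_grad_norm)
    critic_opt.step()

    # Actor: maximize Q(s, pi(s)) by minimizing its negation.
    actor_loss = -critic(torch.cat([obs, actor(obs)], -1)).mean()
    actor_opt.zero_grad()
    actor_loss.backward()
    actor_opt.step()

    # Soft target update: tgt <- (1 - tau) * tgt + tau * online.
    for net, tgt in ((actor, actor_tgt), (critic, critic_tgt)):
        for p, p_tgt in zip(net.parameters(), tgt.parameters()):
            p_tgt.data.lerp_(p.data, tau)


# Exercise the update once with a random batch (batch size from Table 3).
B = 512
batch = (torch.randn(B, obs_dim), torch.rand(B, act_dim) * 2 - 1,
         torch.randn(B, 1), torch.randn(B, obs_dim), torch.zeros(B, 1))
ddpg_update(batch)
```

Note that ALM additionally trains an encoder, latent-space model, and classifier under its single objective; this sketch covers only the generic DDPG optimizer loop the paper says it is built on.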
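
For reference, the Table 3 defaults transcribe directly into a config dict. The dict keys and the `linear_schedule` helper below are my own naming (the paper writes only linear(1.0, 0.1, 100000)); the values themselves are from Table 3.

```python
# Table 3 defaults transcribed into a config dict; key names and the
# linear_schedule helper are my own, not the authors' code.
ALM_DEFAULTS = {
    "discount": 0.99,             # gamma
    "warmup_steps": 5000,
    "soft_update_rate": 0.005,    # tau
    "weighted_target_lambda": 0.95,
    "replay_buffer_size": 10**5,  # Table 3: 10**6 for Humanoid, 10**5 otherwise
    "batch_size": 512,
    "learning_rate": 1e-4,
    "max_grad_norm": 100.0,
    "latent_dim": 50,
    "classifier_reward_coef": 0.1,
    "explore_stddev_clip": 0.3,
}


def linear_schedule(start: float, end: float, duration: int, step: int) -> float:
    """Anneal linearly from `start` to `end` over `duration` steps, then hold;
    matches the shape of Table 3's linear(1.0, 0.1, 100000) entry."""
    frac = min(max(step / duration, 0.0), 1.0)
    return start + frac * (end - start)


# Exploration stddev halfway through the schedule:
assert abs(linear_schedule(1.0, 0.1, 100_000, 50_000) - 0.55) < 1e-9
```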