Task-Agnostic Dynamics Priors for Deep Reinforcement Learning

Authors: Yilun Du, Karthik Narasimhan

ICML 2019

Reproducibility Variable | Result | LLM Response
Research Type | Experimental | "We perform two empirical studies to evaluate our hypothesis. First, we evaluate various frame prediction models, including our proposed SpatialNet, in terms of their capacity to predict future states and model physical interactions (Sections 4.1 and 4.2). Then, we investigate the use of these dynamics predictors for policy learning in different environments (Section 4.3)."
Researcher Affiliation | Collaboration | "¹Massachusetts Institute of Technology (work partially done at OpenAI), ²Princeton University."
Pseudocode | No | The paper describes the SpatialNet architecture in Section 3.2 and presents a diagram in Figure 2, but does not provide formal pseudocode or algorithm blocks. (An illustrative frame-predictor skeleton appears below the table.)
Open Source Code | No | No statement about making the source code publicly available or providing a link to a code repository was found.
Open Datasets | Yes | "Finally, we also evaluate on a stochastic variant of the popular ALE framework consisting of Atari games (Machado et al., 2017a)." (See the sticky-actions sketch below the table.)
Dataset Splits | Yes | "We generate 5000 different trajectories in total (4500 for training a dynamics predictor and 500 for testing), with each trajectory having a length of 125 steps." (See the trajectory-split sketch below the table.)
Hardware Specification | No | No specific hardware details (e.g., GPU/CPU models, memory specifications) used for running the experiments were provided in the paper.
Software Dependencies | No | The paper mentions Pymunk and Bullet (Coumans, 2010) as tools used, but does not provide specific version numbers for these or any other software dependencies such as libraries or frameworks.
Experiment Setup | Yes | "We use the Adam optimizer (Kingma and Ba, 2015) in our experiments with a learning rate of 10^-4. ... We use the Adam optimizer with learning rate 10^-4 to train model predictions and the same set of hyper-parameters for training all policy agents as those used in (Schulman et al., 2017)." (See the optimizer-setup sketch below the table.)
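
Since the paper provides no pseudocode, the skeleton below is purely illustrative: it shows the generic interface of a next-frame predictor of the kind the Research Type and Pseudocode rows refer to. It is not the authors' SpatialNet; the layer sizes and the 84x84 input resolution are assumptions made for demonstration.

```python
# Illustrative next-frame predictor skeleton. This is NOT the paper's
# SpatialNet (no pseudocode is provided there); layer sizes and the
# 84x84 input resolution are assumptions for demonstration only.
import torch
import torch.nn as nn

class FramePredictor(nn.Module):
    def __init__(self, channels=3, hidden=64):
        super().__init__()
        self.net = nn.Sequential(
            nn.Conv2d(channels, hidden, kernel_size=3, padding=1),
            nn.ReLU(),
            nn.Conv2d(hidden, channels, kernel_size=3, padding=1),
        )

    def forward(self, frame):
        # Map the current frame (B, C, H, W) to a predicted next frame.
        return self.net(frame)

next_frame = FramePredictor()(torch.zeros(1, 3, 84, 84))
```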
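
The Open Datasets row cites the stochastic ALE variant of Machado et al. (2017), which is conventionally implemented with sticky actions. Below is a minimal sketch using the ale-py interface; the game choice and local ROM path are hypothetical, not from the paper.

```python
# Minimal sketch of the stochastic ALE setup (sticky actions) described
# in Machado et al. (2017). The game and ROM path are hypothetical.
from ale_py import ALEInterface

ale = ALEInterface()
# With probability 0.25 the emulator repeats the previous action instead
# of the requested one; this is the stochasticity protocol that
# Machado et al. recommend for evaluation.
ale.setFloat("repeat_action_probability", 0.25)
ale.loadROM("pong.bin")  # hypothetical path to a local ROM file

action = ale.getMinimalActionSet()[0]
reward = ale.act(action)  # the previous action may be repeated instead
```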
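
For the Dataset Splits row, the sketch below reproduces the reported 5000-trajectory, 4500/500 train/test split with 125-step rollouts. Pymunk is used because the Software Dependencies row notes the paper relies on it, but the single bouncing-ball scene, gravity, and elasticity values are assumptions; the paper's actual simulation scenes are not specified here.

```python
# Minimal sketch of the 4500/500 trajectory split described in the paper.
# The Pymunk scene (a single bouncing ball) is an illustrative assumption.
import numpy as np
import pymunk

def generate_trajectory(steps=125, dt=1.0 / 60.0):
    """Simulate one trajectory and return the ball's (x, y) positions."""
    space = pymunk.Space()
    space.gravity = (0.0, -900.0)  # assumed scene parameters

    body = pymunk.Body(mass=1.0, moment=10.0)
    body.position = tuple(np.random.uniform(50, 200, size=2))
    shape = pymunk.Circle(body, radius=10.0)
    shape.elasticity = 0.9
    space.add(body, shape)

    # Static floor so the ball bounces instead of falling forever.
    floor = pymunk.Segment(space.static_body, (0, 0), (400, 0), 1.0)
    floor.elasticity = 0.9
    space.add(floor)

    positions = []
    for _ in range(steps):
        space.step(dt)
        positions.append(tuple(body.position))
    return positions

# 5000 trajectories in total: 4500 for training, 500 for testing.
trajectories = [generate_trajectory() for _ in range(5000)]
train, test = trajectories[:4500], trajectories[4500:]
```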
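
Finally, for the Experiment Setup row: a minimal sketch of the reported optimization configuration, assuming PyTorch. Only the choice of Adam and the 10^-4 learning rate come from the paper; the placeholder model and the PPO values (copied from the Atari defaults in Schulman et al., 2017) are assumptions a reproduction should confirm against the text.

```python
# Minimal sketch of the reported optimization setup: Adam with lr = 1e-4.
# The model is a placeholder; only the optimizer choice and learning rate
# come from the paper.
import torch
import torch.nn as nn

model = nn.Sequential(nn.Conv2d(3, 32, kernel_size=3, padding=1), nn.ReLU())
optimizer = torch.optim.Adam(model.parameters(), lr=1e-4)

# PPO hyper-parameters "as those used in (Schulman et al., 2017)".
# The values below are the Atari defaults from that paper, listed here
# as an assumption rather than as values stated in this paper.
ppo_config = {
    "clip_range": 0.1,
    "gamma": 0.99,
    "gae_lambda": 0.95,
    "epochs_per_update": 3,
    "minibatches": 4,
    "horizon": 128,
}
```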