Task-Agnostic Dynamics Priors for Deep Reinforcement Learning
Authors: Yilun Du, Karthik Narasimhan
ICML 2019
| Reproducibility Variable | Result | LLM Response |
|---|---|---|
| Research Type | Experimental | We perform two empirical studies to evaluate our hypothesis. First, we evaluate various frame prediction models, including our proposed Spatial Net, in terms of their capacity to predict future states and model physical interactions (Sections 4.1 and 4.2). Then, we investigate the use of these dynamics predictors for policy learning in different environments (Section 4.3). |
| Researcher Affiliation | Collaboration | 1 Massachusetts Institute of Technology (work partially done at OpenAI), 2 Princeton University. |
| Pseudocode | No | The paper describes the architecture of Spatial Net in Section 3.2 and presents a diagram in Figure 2, but does not provide formal pseudocode or algorithm blocks. |
| Open Source Code | No | No statement about making the source code publicly available or providing a link to a code repository was found. |
| Open Datasets | Yes | Finally, we also evaluate on a stochastic variant of the popular ALE framework consisting of Atari games (Machado et al., 2017a). |
| Dataset Splits | Yes | We generate 5000 different trajectories in total: 4500 for training a dynamics predictor and 500 for testing, with each trajectory having a length of 125 steps. (A sketch of this split follows the table.) |
| Hardware Specification | No | No specific hardware details (e.g., GPU/CPU models, memory specifications) used for running the experiments were provided in the paper. |
| Software Dependencies | No | The paper mentions 'Pymunk' and 'Bullet' (Coumans, 2010) as tools used, but does not provide specific version numbers for these or any other software dependencies like libraries or frameworks. |
| Experiment Setup | Yes | We use the Adam optimizer (Kingma and Ba, 2015) in our experiments with a learning rate of 10^-4. ... We use the Adam optimizer with learning rate 10^-4 to train model predictions and the same set of hyper-parameters for training all policy agents as those used in (Schulman et al., 2017). (A sketch of this setup follows the table.) |
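
The Dataset Splits row reports 5000 generated trajectories, split 4500/500 between training the dynamics predictor and testing. A minimal sketch of such a split, assuming trajectories are held in an indexable list; the paper's generation code is not released, so the placeholder trajectories and the fixed seed below are assumptions:

```python
import random

# Stand-in for the paper's 5000 simulated trajectories (125 steps each);
# the actual generation pipeline is not released, so these are placeholders.
trajectories = [f"trajectory_{i}" for i in range(5000)]

rng = random.Random(0)  # assumed seed; the paper does not specify one
rng.shuffle(trajectories)

train_set = trajectories[:4500]  # 4500 trajectories for training the dynamics predictor
test_set = trajectories[4500:]   # 500 held-out trajectories for testing
assert len(train_set) == 4500 and len(test_set) == 500
```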
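The Experiment Setup row cites the Adam optimizer with a learning rate of 10^-4 for both dynamics prediction and policy training, with PPO hyper-parameters deferred to Schulman et al. (2017). A minimal PyTorch sketch of that optimizer configuration, using a hypothetical `SpatialNet` module as a stand-in for the paper's architecture (the real model is described in the paper's Section 3.2, and the MSE loss here is an assumption, not a quoted detail):

```python
import torch
import torch.nn as nn

# Hypothetical stand-in for the paper's Spatial Net frame-prediction model.
class SpatialNet(nn.Module):
    def __init__(self, channels: int = 3):
        super().__init__()
        self.net = nn.Sequential(
            nn.Conv2d(channels, 64, kernel_size=3, padding=1),
            nn.ReLU(),
            nn.Conv2d(64, channels, kernel_size=3, padding=1),
        )

    def forward(self, frames: torch.Tensor) -> torch.Tensor:
        return self.net(frames)

model = SpatialNet()
# Adam with learning rate 1e-4, as stated in the paper's experiment setup.
optimizer = torch.optim.Adam(model.parameters(), lr=1e-4)

# One illustrative frame-prediction step on random tensors (shapes are assumptions).
frames, next_frames = torch.rand(8, 3, 84, 84), torch.rand(8, 3, 84, 84)
loss = nn.functional.mse_loss(model(frames), next_frames)
optimizer.zero_grad()
loss.backward()
optimizer.step()
```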