Estimating Q(s,s’) with Deep Deterministic Dynamics Gradients
Authors: Ashley Edwards, Himanshu Sahni, Rosanne Liu, Jane Hung, Ankit Jain, Rui Wang, Adrien Ecoffet, Thomas Miconi, Charles Isbell, Jason Yosinski
ICML 2020
| Reproducibility Variable | Result | LLM Response |
|---|---|---|
| Research Type | Experimental | We evaluate in both tabular problems and MuJoCo tasks (Todorov et al., 2012). Figure 7: Experiments for training TD3, DDPG, D3G, and D3G− in MuJoCo tasks. Every 5000 timesteps, we evaluated the learned policy and averaged the return over 10 trials. The experiments were averaged over 10 seeds with 95% confidence intervals. |
| Researcher Affiliation | Collaboration | ¹Uber AI Labs, ²Georgia Institute of Technology, Atlanta, GA, USA, ³ML Collective. |
| Pseudocode | Yes | Algorithm 1 D3G algorithm. Algorithm 2 Cycle. |
| Open Source Code | Yes | Code and videos are available at http://sites.google.com/view/qss-paper. |
| Open Datasets | Yes | We evaluate in both tabular problems and MuJoCo tasks (Todorov et al., 2012). We next evaluate D3G in more complicated MuJoCo tasks from OpenAI Gym (Brockman et al., 2016). |
| Dataset Splits | No | The paper discusses evaluating learned policies and averaging returns over trials and seeds, but it does not provide specific training, validation, and test dataset splits (e.g., percentages or sample counts). |
| Hardware Specification | No | The paper does not provide specific hardware details (e.g., GPU/CPU models, processor types, or memory) used for running its experiments. |
| Software Dependencies | No | The paper mentions software environments like MuJoCo and OpenAI Gym, but it does not provide specific version numbers for these or any other software dependencies. |
| Experiment Setup | No | The paper states, 'We include full training details of hyperparameters and architectures in the appendix.' This indicates that the specific experimental setup details are not provided in the main text. |
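The evaluation protocol quoted in the Research Type row (returns averaged over 10 trials per evaluation point, 10 seeds, 95% confidence intervals) can be sketched as below. This is a minimal illustration, not the authors' aggregation code; the function name and the per-seed return values are hypothetical, and a normal approximation (z = 1.96) is assumed for the interval.

```python
import math
import statistics


def mean_with_ci(returns_per_seed, z=1.96):
    """Mean return across seeds with a normal-approximation 95% CI.

    `returns_per_seed` holds one average return per seed, e.g. each entry
    is already the mean over 10 evaluation trials, as described in the
    paper's evaluation protocol. (Hypothetical helper, not from the paper.)
    """
    n = len(returns_per_seed)
    mean = statistics.fmean(returns_per_seed)
    sem = statistics.stdev(returns_per_seed) / math.sqrt(n)
    return mean, mean - z * sem, mean + z * sem


# Made-up per-seed mean returns from 10 seeds, for illustration only.
seed_means = [3120.0, 2980.5, 3305.2, 3150.7, 2890.3,
              3210.9, 3075.4, 3002.8, 3188.6, 3099.1]
mean, lo, hi = mean_with_ci(seed_means)
```

Plotting `mean` with the `(lo, hi)` band at each 5000-timestep evaluation point reproduces the style of learning curve reported in Figure 7.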