Estimating Q(s,s’) with Deep Deterministic Dynamics Gradients

Authors: Ashley Edwards, Himanshu Sahni, Rosanne Liu, Jane Hung, Ankit Jain, Rui Wang, Adrien Ecoffet, Thomas Miconi, Charles Isbell, Jason Yosinski

ICML 2020 | Conference PDF | Archive PDF | Plain Text | LLM Run Details

Reproducibility Variable | Result | LLM Response
Research Type | Experimental | We evaluate in both tabular problems and MuJoCo tasks (Todorov et al., 2012). Figure 7: Experiments for training TD3, DDPG, D3G, and D3G− in MuJoCo tasks. Every 5000 timesteps, we evaluated the learned policy and averaged the return over 10 trials. The experiments were averaged over 10 seeds with 95% confidence intervals. (A sketch of this evaluation protocol appears after the table.)
Researcher Affiliation | Collaboration | 1Uber AI Labs, 2Georgia Institute of Technology, Atlanta, GA, USA, 3ML Collective.
Pseudocode | Yes | Algorithm 1: D3G algorithm. Algorithm 2: Cycle.
Open Source Code | Yes | Code and videos are available at http://sites.google.com/view/qss-paper.
Open Datasets | Yes | We evaluate in both tabular problems and MuJoCo tasks (Todorov et al., 2012). We next evaluate D3G in more complicated MuJoCo tasks from OpenAI Gym (Brockman et al., 2016).
Dataset Splits | No | The paper discusses evaluating learned policies and averaging returns over trials and seeds, but it does not provide specific training, validation, and test dataset splits (e.g., percentages or sample counts).
Hardware Specification | No | The paper does not provide specific hardware details (e.g., GPU/CPU models, processor types, or memory) used for running its experiments.
Software Dependencies | No | The paper mentions software environments such as MuJoCo and OpenAI Gym, but it does not provide specific version numbers for these or any other software dependencies.
Experiment Setup | No | The paper states, "We include full training details of hyperparameters and architectures in the appendix," indicating that the specific experimental setup details are not provided in the main text.
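
The Research Type row quotes the paper's evaluation protocol: the learned policy is evaluated every 5000 timesteps, returns are averaged over 10 trials, and results are aggregated over 10 seeds with 95% confidence intervals. The following is a minimal sketch of that protocol, not the authors' code; the environment name, the callable policy stand-in, the old Gym API, and the normal-approximation interval are all assumptions made for illustration.

    # Minimal sketch (not the authors' code) of the quoted evaluation protocol:
    # evaluate a learned policy for 10 episodes and report the mean return with
    # a 95% confidence interval across 10 seeds.
    # Assumptions: old Gym API (v2 MuJoCo tasks), a callable `policy`, and a
    # normal-approximation interval; none of these are specified by the report.
    import gym
    import numpy as np

    def evaluate_policy(policy, env_name="HalfCheetah-v2", episodes=10, seed=0):
        """Average undiscounted return of `policy` over `episodes` rollouts."""
        env = gym.make(env_name)
        env.seed(seed + 100)  # keep evaluation rollouts distinct from training
        returns = []
        for _ in range(episodes):
            state, done, total = env.reset(), False, 0.0
            while not done:
                state, reward, done, _ = env.step(policy(np.asarray(state)))
                total += reward
            returns.append(total)
        return float(np.mean(returns))

    def mean_and_ci95(per_seed_returns):
        """Mean and normal-approximation 95% CI half-width across seeds."""
        x = np.asarray(per_seed_returns, dtype=np.float64)
        return x.mean(), 1.96 * x.std(ddof=1) / np.sqrt(len(x))

    # Usage: evaluate one trained policy per seed, then aggregate.
    # per_seed = [evaluate_policy(policies[s], seed=s) for s in range(10)]
    # mean, ci = mean_and_ci95(per_seed)
    # print(f"return = {mean:.1f} +/- {ci:.1f} (95% CI)")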