Estimating Q(s,s’) with Deep Deterministic Dynamics Gradients

Authors: Ashley Edwards, Himanshu Sahni, Rosanne Liu, Jane Hung, Ankit Jain, Rui Wang, Adrien Ecoffet, Thomas Miconi, Charles Isbell, Jason Yosinski

ICML 2020 | Conference PDF | Archive PDF | Plain Text | LLM Run Details

Reproducibility Variable | Result | LLM Response
Research Type | Experimental | We evaluate in both tabular problems and MuJoCo tasks (Todorov et al., 2012). Figure 7: Experiments for training TD3, DDPG, D3G, and D3G− in MuJoCo tasks. Every 5000 timesteps, we evaluated the learned policy and averaged the return over 10 trials. The experiments were averaged over 10 seeds with 95% confidence intervals. (A sketch of this evaluation protocol appears after the table.)
Researcher Affiliation | Collaboration | 1Uber AI Labs, 2Georgia Institute of Technology, Atlanta, GA, USA, 3ML Collective.
Pseudocode | Yes | Algorithm 1: D3G algorithm. Algorithm 2: Cycle.
Open Source Code | Yes | Code and videos are available at http://sites.google.com/view/qss-paper.
Open Datasets | Yes | We evaluate in both tabular problems and MuJoCo tasks (Todorov et al., 2012). We next evaluate D3G in more complicated MuJoCo tasks from OpenAI Gym (Brockman et al., 2016).
Dataset Splits | No | The paper discusses evaluating learned policies and averaging returns over trials and seeds, but it does not provide specific training, validation, and test dataset splits (e.g., percentages or sample counts).
Hardware Specification | No | The paper does not provide specific hardware details (e.g., GPU/CPU models, processor types, or memory) used for running its experiments.
Software Dependencies | No | The paper mentions software environments such as MuJoCo and OpenAI Gym, but it does not provide specific version numbers for these or any other software dependencies.
Experiment Setup | No | The paper states, "We include full training details of hyperparameters and architectures in the appendix," indicating that the specific experimental setup details are not provided in the main text.
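
The Research Type row quotes the paper's evaluation protocol: the learned policy is evaluated every 5000 timesteps, returns are averaged over 10 trials, and results are aggregated over 10 seeds with 95% confidence intervals. The following is a minimal sketch of that protocol, not the authors' code; the environment name, the callable policy stand-in, the old Gym API, and the normal-approximation interval are all assumptions made for illustration.

    # Minimal sketch (not the authors' code) of the quoted evaluation protocol:
    # evaluate a learned policy for 10 episodes and report the mean return with
    # a 95% confidence interval across 10 seeds.
    # Assumptions: old Gym API (v2 MuJoCo tasks), a callable `policy`, and a
    # normal-approximation interval; none of these are specified by the report.
    import gym
    import numpy as np

    def evaluate_policy(policy, env_name="HalfCheetah-v2", episodes=10, seed=0):
        """Average undiscounted return of `policy` over `episodes` rollouts."""
        env = gym.make(env_name)
        env.seed(seed + 100)  # keep evaluation rollouts distinct from training
        returns = []
        for _ in range(episodes):
            state, done, total = env.reset(), False, 0.0
            while not done:
                state, reward, done, _ = env.step(policy(np.asarray(state)))
                total += reward
            returns.append(total)
        return float(np.mean(returns))

    def mean_and_ci95(per_seed_returns):
        """Mean and normal-approximation 95% CI half-width across seeds."""
        x = np.asarray(per_seed_returns, dtype=np.float64)
        return x.mean(), 1.96 * x.std(ddof=1) / np.sqrt(len(x))

    # Usage: evaluate one trained policy per seed, then aggregate.
    # per_seed = [evaluate_policy(policies[s], seed=s) for s in range(10)]
    # mean, ci = mean_and_ci95(per_seed)
    # print(f"return = {mean:.1f} +/- {ci:.1f} (95% CI)")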