Policy Search by Target Distribution Learning for Continuous Control

Authors: Chuheng Zhang, Yuanqi Li, Jian Li

AAAI 2020, pp. 6770-6777

Reproducibility Variable | Result | LLM Response
Research Type | Experimental | "Our experiments show that TDL algorithms perform comparably to (or better than) state-of-the-art algorithms for most continuous control tasks in the MuJoCo environment while being more stable in training."
Researcher Affiliation | Academia | Chuheng Zhang, IIIS, Tsinghua University (zhangchuheng123@live.com); Yuanqi Li, IIIS, Tsinghua University (timezerolyq@gmail.com); Jian Li, IIIS, Tsinghua University (lapordge@gmail.com)
Pseudocode | Yes | Algorithm 1: Target learning
Open Source Code | No | The paper does not provide an explicit statement or link to its open-source code.
Open Datasets | Yes | "We implemented TDL-direct, TDL-ES and TDL-ESr for the continuous control tasks provided by OpenAI Gym (Brockman et al. 2016) using MuJoCo simulator (Todorov, Erez, and Tassa 2012)." A minimal environment-setup sketch follows the table.
Dataset Splits | No | The paper mentions "minibatch updates" and "sample reuse" but does not specify explicit training/validation/test dataset splits (e.g., percentages or sample counts).
Hardware Specification | No | The paper does not provide specific hardware details (e.g., CPU/GPU models, memory) used for running the experiments.
Software Dependencies | No | The paper mentions OpenAI Gym and the MuJoCo simulator but does not specify version numbers for these software components.
Experiment Setup | Yes | "Due to space limit, the detailed setting of hyperparameters can be found in Appendix G."
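Because the paper pins no software versions, anyone reproducing the benchmark must guess the dependency stack. The sketch below, which is not from the paper, shows how one of the Gym MuJoCo continuous control tasks would be loaded under a Gym release contemporaneous with the paper (classic single-return API, backed by mujoco-py). The task ID "HalfCheetah-v2" is an illustrative assumption; the report above does not list the exact environments or versions used.

```python
# Minimal sketch (assumptions: gym < 0.26 with mujoco-py installed;
# "HalfCheetah-v2" is a stand-in task ID, not confirmed by the paper).
import gym

env = gym.make("HalfCheetah-v2")
obs = env.reset()  # classic API: reset() returns only the observation

episode_return = 0.0
done = False
while not done:
    action = env.action_space.sample()  # random policy stands in for TDL
    # classic API: step() returns a 4-tuple
    obs, reward, done, info = env.step(action)
    episode_return += reward

print("random-policy episode return:", episode_return)
env.close()
```

Note that under the later Gymnasium API, reset() returns an (obs, info) pair and step() returns a 5-tuple, so the unpinned dependencies flagged in the table materially affect reproducibility.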