Policy Search by Target Distribution Learning for Continuous Control

Authors: Chuheng Zhang, Yuanqi Li, Jian Li

AAAI 2020, pp. 6770-6777

Reproducibility Variable | Result | LLM Response
Research Type | Experimental | "Our experiments show that TDL algorithms perform comparably to (or better than) state-of-the-art algorithms for most continuous control tasks in the MuJoCo environment while being more stable in training."
Researcher Affiliation | Academia | Chuheng Zhang, IIIS, Tsinghua University (zhangchuheng123@live.com); Yuanqi Li, IIIS, Tsinghua University (timezerolyq@gmail.com); Jian Li, IIIS, Tsinghua University (lapordge@gmail.com)
Pseudocode | Yes | Algorithm 1: Target learning
Open Source Code | No | The paper does not provide an explicit statement or link to its open-source code.
Open Datasets | Yes | "We implemented TDL-direct, TDL-ES and TDL-ESr for the continuous control tasks provided by OpenAI Gym (Brockman et al. 2016) using MuJoCo simulator (Todorov, Erez, and Tassa 2012)." A minimal environment-setup sketch follows the table.
Dataset Splits | No | The paper mentions "minibatch updates" and "sample reuse" but does not specify explicit training/validation/test dataset splits (e.g., percentages or sample counts).
Hardware Specification | No | The paper does not provide specific hardware details (e.g., CPU/GPU models, memory) used for running the experiments.
Software Dependencies | No | The paper mentions OpenAI Gym and the MuJoCo simulator but does not specify version numbers for these software components.
Experiment Setup | Yes | "Due to space limit, the detailed setting of hyperparameters can be found in Appendix G."
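Because the paper pins no software versions, anyone reproducing the benchmark must guess the dependency stack. The sketch below, which is not from the paper, shows how one of the Gym MuJoCo continuous control tasks would be loaded under a Gym release contemporaneous with the paper (classic single-return API, backed by mujoco-py). The task ID "HalfCheetah-v2" is an illustrative assumption; the report above does not list the exact environments or versions used.

```python
# Minimal sketch (assumptions: gym < 0.26 with mujoco-py installed;
# "HalfCheetah-v2" is a stand-in task ID, not confirmed by the paper).
import gym

env = gym.make("HalfCheetah-v2")
obs = env.reset()  # classic API: reset() returns only the observation

episode_return = 0.0
done = False
while not done:
    action = env.action_space.sample()  # random policy stands in for TDL
    # classic API: step() returns a 4-tuple
    obs, reward, done, info = env.step(action)
    episode_return += reward

print("random-policy episode return:", episode_return)
env.close()
```

Note that under the later Gymnasium API, reset() returns an (obs, info) pair and step() returns a 5-tuple, so the unpinned dependencies flagged in the table materially affect reproducibility.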