Policy Search by Target Distribution Learning for Continuous Control
Authors: Chuheng Zhang, Yuanqi Li, Jian Li. pp. 6770–6777.
AAAI 2020
| Reproducibility Variable | Result | LLM Response |
|---|---|---|
| Research Type | Experimental | Our experiments show that TDL algorithms perform comparably to (or better than) state-of-the-art algorithms for most continuous control tasks in the MuJoCo environment while being more stable in training. |
| Researcher Affiliation | Academia | Chuheng Zhang (IIIS, Tsinghua University, zhangchuheng123@live.com); Yuanqi Li (IIIS, Tsinghua University, timezerolyq@gmail.com); Jian Li (IIIS, Tsinghua University, lapordge@gmail.com) |
| Pseudocode | Yes | Algorithm 1 Target learning |
| Open Source Code | No | The paper does not provide an explicit statement or link to its open-source code. |
| Open Datasets | Yes | We implemented TDL-direct, TDL-ES and TDL-ESr for the continuous control tasks provided by OpenAI Gym (Brockman et al. 2016) using the MuJoCo simulator (Todorov, Erez, and Tassa 2012). |
| Dataset Splits | No | The paper mentions using "minibatch updates" and "sample reuse" but does not specify explicit training/validation/test dataset splits (e.g., percentages or sample counts). |
| Hardware Specification | No | The paper does not provide specific hardware details (e.g., CPU/GPU models, memory) used for running the experiments. |
| Software Dependencies | No | The paper mentions "OpenAI Gym" and the "MuJoCo simulator" but does not specify version numbers for these software components. |
| Experiment Setup | Yes | Due to space limit, the detailed setting of hyperparameters can be found in Appendix G. |
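The paper's Algorithm 1 ("Target learning") fits the policy to a constructed target action distribution rather than following a policy gradient directly. A minimal sketch of that general idea is below; the target-construction rule, the linear policy, and all names here are illustrative assumptions, not the paper's actual Algorithm 1 (whose details live in the paper and its Appendix G).

```python
# Illustrative sketch of target-distribution-style policy learning
# (NOT the paper's Algorithm 1): build target actions by nudging
# sampled actions toward higher-advantage actions, then fit the
# policy to those targets with supervised regression.
import numpy as np

rng = np.random.default_rng(0)

def fit_policy_to_targets(states, targets):
    """Least-squares fit of a linear policy a = s @ W to target actions."""
    W, *_ = np.linalg.lstsq(states, targets, rcond=None)
    return W

# Toy rollout data: 1-D state, 1-D action, true optimal policy a* = 2s.
states = rng.normal(size=(64, 1))
actions = 2.0 * states + 0.1 * rng.normal(size=(64, 1))

# Hypothetical advantage: peaks at the optimal action a* = 2s.
good = 2.0 * states
advantage = -(actions - good) ** 2

# Hypothetical target rule: interpolate each sampled action toward the
# high-advantage action, weighted by exp(advantage) in (0, 1].
step = 0.5
targets = actions + step * np.exp(advantage) * (good - actions)

# Supervised step: regress the policy onto the targets.
W_hat = fit_policy_to_targets(states, targets)
```

With this toy data, `W_hat` recovers a coefficient close to the optimal 2.0; the point is only that the policy update is a regression onto targets rather than a direct gradient step.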