Foresee then Evaluate: Decomposing Value Estimation with Latent Future Prediction
Authors: Hongyao Tang, Zhaopeng Meng, Guangyong Chen, Pengfei Chen, Chen Chen, Yaodong Yang, Luo Zhang, Wulong Liu, Jianye Hao
AAAI 2021, pp. 9834-9842 | Conference PDF | Archive PDF | Plain Text | LLM Run Details
| Reproducibility Variable | Result | LLM Response |
|---|---|---|
| Research Type | Experimental | We conduct our experiments on several representative continuous control tasks in OpenAI Gym (Brockman et al. 2016; Todorov, Erez, and Tassa 2012). We compare our derived approach VD-DDPG with DDPG (Lillicrap et al. 2015), PPO (Schulman et al. 2017), and A2C (Mnih et al. 2016), as well as the Deterministic DSR (DDSR)... The learning curves of algorithms are shown in Figure 2. We can observe that VD-DDPG matches or outperforms other algorithms in both final performance and learning speed across all four tasks. |
| Researcher Affiliation | Collaboration | Hongyao Tang,1 Zhaopeng Meng,1 Guangyong Chen,3 Pengfei Chen,4 Chen Chen,2 Yaodong Yang,2 Luo Zhang,1 Wulong Liu,2 Jianye Hao1,2 1College of Intelligence and Computing, Tianjin University, 2Noah's Ark Lab, Huawei, 3Shenzhen Institutes of Advanced Technology, Chinese Academy of Sciences, 4The Chinese University of Hong Kong {bluecontra,mengzp,luozhang}@tju.edu.cn, gy.chen@siat.ac.cn, pfchen@cse.cuhk.edu.hk, {chenchen9,yang.yaodong,liuwulong,haojianye}@huawei.com |
| Pseudocode | Yes | The complete algorithm of VD-DDPG can be found in Supplementary Material C.1 Algorithm 1. |
| Open Source Code | Yes | Source codes are available at https://github.com/bluecontra/AAAI2021-VDFP. |
| Open Datasets | Yes | We conduct our experiments on several representative continuous control tasks in OpenAI Gym (Brockman et al. 2016; Todorov, Erez, and Tassa 2012) |
| Dataset Splits | No | The paper states that experiments are conducted on OpenAI Gym tasks for '1 million timesteps' and results are reported over '5 random seeds'. However, it does not provide specific train/validation/test dataset splits (e.g., percentages or sample counts) for the generated experience data. |
| Hardware Specification | No | The paper does not provide specific details about the hardware (e.g., GPU models, CPU types, memory) used to run the experiments. |
| Software Dependencies | No | The paper mentions software components like 'Open AI gym' and 'Adam Optimizer' but does not specify their version numbers or the versions of other key software dependencies (e.g., PyTorch, TensorFlow, Python) required for reproducibility. |
| Experiment Setup | Yes | For VD-DDPG, we set the KL weight β = 1000 as in (Burgess et al. 2018) and the clip value c as 0.2. A max trajectory length of 256 is used, except for HalfCheetah-v1, where 64 already ensures good performance. An exploration noise sampled from N(0, 0.1) (Fujimoto, v. Hoof, and Meger 2018) is added to each action selected by the deterministic policy of DDPG, DDSR and VD-DDPG. The discount factor is 0.99 and we use the Adam optimizer (Kingma and Ba 2015) for all algorithms. |
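The hyperparameters quoted above can be collected into a single configuration sketch. This is a minimal, hypothetical transcription for illustration only; the variable and key names are assumptions and do not come from the authors' released code at https://github.com/bluecontra/AAAI2021-VDFP.

```python
import numpy as np

# Hypothetical summary of the VD-DDPG setup described in the paper.
# Key names are illustrative, not taken from the official repository.
VD_DDPG_CONFIG = {
    "kl_weight_beta": 1000,        # KL weight β, as in Burgess et al. 2018
    "clip_value_c": 0.2,           # clip value c
    "max_trajectory_length": 256,  # 64 already suffices for HalfCheetah-v1
    "discount_gamma": 0.99,        # discount factor
    "optimizer": "Adam",           # Kingma and Ba 2015
    "total_timesteps": 1_000_000,  # "1 million timesteps" per task
    "num_seeds": 5,                # results averaged over 5 random seeds
}

def add_exploration_noise(action, sigma=0.1, rng=None):
    """Add Gaussian exploration noise N(0, sigma) to a deterministic action,
    as done for DDPG, DDSR, and VD-DDPG (Fujimoto, v. Hoof, and Meger 2018)."""
    rng = rng or np.random.default_rng()
    return action + rng.normal(loc=0.0, scale=sigma, size=np.shape(action))
```

For example, `add_exploration_noise(policy_action)` would perturb each action dimension independently before it is executed in the environment.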