Foresee then Evaluate: Decomposing Value Estimation with Latent Future Prediction

Authors: Hongyao Tang, Zhaopeng Meng, Guangyong Chen, Pengfei Chen, Chen Chen, Yaodong Yang, Luo Zhang, Wulong Liu, Jianye Hao
Pages: 9834-9842

AAAI 2021

Reproducibility Variable | Result | LLM Response
Research Type | Experimental | We conduct our experiments on several representative continuous control tasks in OpenAI Gym (Brockman et al. 2016; Todorov, Erez, and Tassa 2012). We compare our derived approach VD-DDPG with DDPG (Lillicrap et al. 2015), PPO (Schulman et al. 2017), and A2C (Mnih et al. 2016), as well as the Deterministic DSR (DDSR)... The learning curves of algorithms are shown in Figure 2. We can observe that VD-DDPG matches or outperforms other algorithms in both final performance and learning speed across all four tasks.
Researcher Affiliation | Collaboration | Hongyao Tang (1), Zhaopeng Meng (1), Guangyong Chen (3), Pengfei Chen (4), Chen Chen (2), Yaodong Yang (2), Luo Zhang (1), Wulong Liu (2), Jianye Hao (1, 2). (1) College of Intelligence and Computing, Tianjin University; (2) Noah's Ark Lab, Huawei; (3) Shenzhen Institutes of Advanced Technology, Chinese Academy of Sciences; (4) The Chinese University of Hong Kong. {bluecontra,mengzp,luozhang}@tju.edu.cn, gy.chen@siat.ac.cn, pfchen@cse.cuhk.edu.hk, {chenchen9,yang.yaodong,liuwulong,haojianye}@huawei.com
Pseudocode | Yes | The complete algorithm of VD-DDPG can be found in Supplementary Material C.1, Algorithm 1.
Open Source Code | Yes | Source code is available at https://github.com/bluecontra/AAAI2021-VDFP.
Open Datasets | Yes | We conduct our experiments on several representative continuous control tasks in OpenAI Gym (Brockman et al. 2016; Todorov, Erez, and Tassa 2012).
Dataset Splits | No | The paper states that experiments are conducted on OpenAI Gym tasks for '1 million timesteps' and that results are reported over '5 random seeds'. However, it does not provide train/validation/test splits (e.g., percentages or sample counts) for the generated experience data.
Hardware Specification | No | The paper does not provide details about the hardware (e.g., GPU models, CPU types, memory) used to run the experiments.
Software Dependencies | No | The paper mentions software components such as 'OpenAI Gym' and 'Adam Optimizer' but does not specify their version numbers, nor the versions of other key dependencies (e.g., PyTorch, TensorFlow, Python) required for reproducibility.
Experiment Setup | Yes | For VD-DDPG, we set the KL weight β = 1000 as in (Burgess et al. 2018) and the clip value c as 0.2. A max trajectory length of 256 is used, except for HalfCheetah-v1, where 64 already ensures good performance. An exploration noise sampled from N(0, 0.1) (Fujimoto, v. Hoof, and Meger 2018) is added to each action selected by the deterministic policy of DDPG, DDSR and VD-DDPG. The discount factor is 0.99 and we use Adam Optimizer (Kingma and Ba 2015) for all algorithms.
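
To make the reported setup values concrete, the following is a minimal Python sketch, not the authors' implementation, of how the exploration noise and the discount factor might be applied. Several details are assumptions rather than facts from the paper: the 0.1 in N(0, 0.1) is read as a standard deviation, actions are assumed to be bounded in [-1, 1], the max trajectory length is read as a simple truncation, and the helper names `add_exploration_noise` and `discounted_return` are hypothetical.

```python
import random

GAMMA = 0.99        # discount factor reported in the setup
NOISE_STD = 0.1     # assumption: 0.1 in N(0, 0.1) is a standard deviation
MAX_TRAJ_LEN = 256  # reported max trajectory length (64 for HalfCheetah-v1)


def add_exploration_noise(action, low=-1.0, high=1.0, rng=random):
    """Add independent Gaussian noise to each action dimension, then clip.

    The [-1, 1] bounds are an assumption; real bounds depend on the task.
    """
    return [min(high, max(low, a + rng.gauss(0.0, NOISE_STD))) for a in action]


def discounted_return(rewards, gamma=GAMMA, max_len=MAX_TRAJ_LEN):
    """Discounted sum of at most max_len rewards, accumulated backwards."""
    total = 0.0
    for r in reversed(rewards[:max_len]):
        total = r + gamma * total
    return total


if __name__ == "__main__":
    rng = random.Random(0)  # fixed seed so the run is repeatable
    print(add_exploration_noise([0.5, -0.95, 0.0], rng=rng))
    print(discounted_return([1.0, 1.0, 1.0]))
```

Passing a seeded `random.Random` instance keeps runs repeatable, which matters when results are averaged over several random seeds as the paper reports.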