reproducibilityindex.ai

Foresee then Evaluate: Decomposing Value Estimation with Latent Future Prediction

Authors: Hongyao Tang, Zhaopeng Meng, Guangyong Chen, Pengfei Chen, Chen Chen, Yaodong Yang, Luo Zhang, Wulong Liu, Jianye Hao

Pages: 9834-9842

AAAI 2021

Reproducibility Variable	Result	LLM Response
Research Type	Experimental	We conduct our experiments on several representative continuous control tasks in OpenAI gym (Brockman et al. 2016; Todorov, Erez, and Tassa 2012). We compare our derived approach VD-DDPG with DDPG (Lillicrap et al. 2015), PPO (Schulman et al. 2017), and A2C (Mnih et al. 2016), as well as the Deterministic DSR (DDSR)... The learning curves of algorithms are shown in Figure 2. We can observe that VD-DDPG matches or outperforms other algorithms in both final performance and learning speed across all four tasks.
Researcher Affiliation	Collaboration	Hongyao Tang,1 Zhaopeng Meng,1 Guangyong Chen,3 Pengfei Chen,4 Chen Chen,2 Yaodong Yang,2 Luo Zhang,1 Wulong Liu,2 Jianye Hao1,2 1College of Intelligence and Computing, Tianjin University, 2Noah's Ark Lab, Huawei, 3Shenzhen Institutes of Advanced Technology, Chinese Academy of Sciences, 4The Chinese University of Hong Kong {bluecontra,mengzp,luozhang}@tju.edu.cn, gy.chen@siat.ac.cn, pfchen@cse.cuhk.edu.hk, {chenchen9,yang.yaodong,liuwulong,haojianye}@huawei.com
Pseudocode	Yes	The complete algorithm of VD-DDPG can be found in Supplementary Material C.1 Algorithm 1.
Open Source Code	Yes	Source codes are available at https://github.com/bluecontra/AAAI2021-VDFP.
Open Datasets	Yes	We conduct our experiments on several representative continuous control tasks in OpenAI gym (Brockman et al. 2016; Todorov, Erez, and Tassa 2012)
Dataset Splits	No	The paper states that experiments are conducted on OpenAI Gym tasks for '1 million timesteps' and results are reported over '5 random seeds'. However, it does not provide specific train/validation/test dataset splits (e.g., percentages or sample counts) for the generated experience data.
Hardware Specification	No	The paper does not provide specific details about the hardware (e.g., GPU models, CPU types, memory) used to run the experiments.
Software Dependencies	No	The paper mentions software components like 'OpenAI gym' and 'Adam Optimizer' but does not specify their version numbers or the versions of other key software dependencies (e.g., PyTorch, TensorFlow, Python) required for reproducibility.
Experiment Setup	Yes	For VD-DDPG, we set the KL weight β = 1000 as in (Burgess et al. 2018) and the clip value c as 0.2. A max trajectory length of 256 is used, except for HalfCheetah-v1, where 64 already ensures good performance. An exploration noise sampled from N(0, 0.1) (Fujimoto, van Hoof, and Meger 2018) is added to each action selected by the deterministic policy of DDPG, DDSR and VD-DDPG. The discount factor is 0.99 and we use the Adam optimizer (Kingma and Ba 2015) for all algorithms.
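
The hyperparameters quoted in the Experiment Setup row can be collected into a small configuration object. The sketch below is illustrative only and is not taken from the authors' repository: the class name `VDDDPGSetup`, the helpers `setup_for` and `noisy_action`, and all field names are hypothetical. It also rests on two assumptions the quoted text does not confirm: that 0.1 in N(0, 0.1) is a standard deviation rather than a variance, and that noisy actions are clipped to the range [-1, 1].

```python
import random
from dataclasses import dataclass


@dataclass(frozen=True)
class VDDDPGSetup:
    """Values quoted in the Experiment Setup row (field names are hypothetical)."""
    kl_weight: float = 1000.0   # KL weight beta
    clip_value: float = 0.2     # clip value c
    max_traj_len: int = 256     # max trajectory length (64 for HalfCheetah-v1)
    noise_scale: float = 0.1    # ASSUMPTION: treated as a standard deviation
    discount: float = 0.99      # discount factor


def setup_for(env_id: str) -> VDDDPGSetup:
    """Return the setup for a task, applying the HalfCheetah-v1 exception."""
    if env_id == "HalfCheetah-v1":
        return VDDDPGSetup(max_traj_len=64)
    return VDDDPGSetup()


def noisy_action(action, cfg, rng, low=-1.0, high=1.0):
    """Add zero-mean Gaussian exploration noise to each action dimension.

    ASSUMPTION: clipping to [low, high] is not stated in the quoted text;
    it is included here only to keep the action inside a bounded range.
    """
    return [
        min(high, max(low, a + rng.gauss(0.0, cfg.noise_scale)))
        for a in action
    ]


if __name__ == "__main__":
    cfg = setup_for("HalfCheetah-v1")
    rng = random.Random(0)  # fixed seed so the run is repeatable
    print(cfg)
    print(noisy_action([0.0, 0.5, -0.5], cfg, rng))
```

Keeping the per-task exception in one function makes it easy to see which settings are shared across tasks and which are overridden for a single environment.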