Notice: The reproducibility variables underlying each score are classified using an automated LLM-based pipeline, validated against a manually labeled dataset. LLM-based classification introduces uncertainty and potential bias; scores should be interpreted as estimates. Full accuracy metrics and methodology are described in [1].
Foresee then Evaluate: Decomposing Value Estimation with Latent Future Prediction
Authors: Hongyao Tang, Zhaopeng Meng, Guangyong Chen, Pengfei Chen, Chen Chen, Yaodong Yang, Luo Zhang, Wulong Liu, Jianye Hao
AAAI 2021, pp. 9834–9842 | Venue PDF | LLM Run Details
| Reproducibility Variable | Result | LLM Response |
|---|---|---|
| Research Type | Experimental | We conduct our experiments on several representative continuous control tasks in Open AI gym (Brockman et al. 2016; Todorov, Erez, and Tassa 2012). We compare our derived approach VD-DDPG with DDPG (Lillicrap et al. 2015), PPO (Schulman et al. 2017), and A2C (Mnih et al. 2016), as well as the Deterministic DSR (DDSR)... The learning curves of algorithms are shown in Figure 2. We can observe that VD-DDPG matches or outperforms other algorithms in both final performance and learning speed across all four tasks. |
| Researcher Affiliation | Collaboration | Hongyao Tang,1 Zhaopeng Meng,1 Guangyong Chen,3 Pengfei Chen,4 Chen Chen,2 Yaodong Yang,2 Luo Zhang,1 Wulong Liu,2 Jianye Hao1,2 1College of Intelligence and Computing, Tianjin University, 2Noah's Ark Lab, Huawei, 3Shenzhen Institutes of Advanced Technology, Chinese Academy of Sciences, 4The Chinese University of Hong Kong, EMAIL, EMAIL, EMAIL, EMAIL |
| Pseudocode | Yes | The complete algorithm of VD-DDPG can be found in Supplementary Material C.1 Algorithm 1. |
| Open Source Code | Yes | Source codes are available at https://github.com/bluecontra/AAAI2021-VDFP. |
| Open Datasets | Yes | We conduct our experiments on several representative continuous control tasks in Open AI gym (Brockman et al. 2016; Todorov, Erez, and Tassa 2012). |
| Dataset Splits | No | The paper states that experiments are conducted on OpenAI Gym tasks for '1 million timesteps' and results are reported over '5 random seeds'. However, it does not provide specific train/validation/test dataset splits (e.g., percentages or sample counts) for the generated experience data. |
| Hardware Specification | No | The paper does not provide specific details about the hardware (e.g., GPU models, CPU types, memory) used to run the experiments. |
| Software Dependencies | No | The paper mentions software components like 'Open AI gym' and 'Adam Optimizer' but does not specify their version numbers or the versions of other key software dependencies (e.g., PyTorch, TensorFlow, Python) required for reproducibility. |
| Experiment Setup | Yes | For VD-DDPG, we set the KL weight Ξ² = 1000 as in (Burgess et al. 2018) and the clip value c as 0.2. A max trajectory length of 256 is used, except that using 64 already ensures a good performance for HalfCheetah-v1. An exploration noise sampled from N(0, 0.1) (Fujimoto, van Hoof, and Meger 2018) is added to each action selected by the deterministic policy of DDPG, DDSR and VD-DDPG. The discount factor is 0.99 and we use Adam Optimizer (Kingma and Ba 2015) for all algorithms. |
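The experiment setup above can be collected into a minimal configuration sketch. This is an illustrative reconstruction only, assuming the hyperparameters quoted from the paper; the names `CONFIG` and `add_exploration_noise`, and the clipping of actions to [-1, 1], are assumptions and do not come from the authors' released code.

```python
import numpy as np

# Hedged sketch of the reported VD-DDPG hyperparameters.
# Names and structure are illustrative, not taken from the paper's repository.
CONFIG = {
    "kl_weight_beta": 1000,        # KL weight, as in (Burgess et al. 2018)
    "clip_value_c": 0.2,           # clip value c
    "max_trajectory_length": 256,  # 64 already suffices for HalfCheetah-v1
    "discount_gamma": 0.99,        # discount factor
    "exploration_noise_std": 0.1,  # noise ~ N(0, 0.1) on deterministic actions
    "optimizer": "Adam",           # (Kingma and Ba 2015)
}

def add_exploration_noise(action, std=CONFIG["exploration_noise_std"], rng=None):
    """Add Gaussian exploration noise to a deterministic action and clip the
    result to the [-1, 1] range common to Gym continuous-control tasks
    (an assumed range, not stated in the paper)."""
    rng = np.random.default_rng() if rng is None else rng
    noisy = action + rng.normal(0.0, std, size=np.shape(action))
    return np.clip(noisy, -1.0, 1.0)
```

A reproduction attempt would apply `add_exploration_noise` to each action produced by the deterministic policies of DDPG, DDSR, and VD-DDPG during training, per the setup quoted above.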