Dual Policy Distillation

Authors: Kwei-Herng Lai, Daochen Zha, Yuening Li, Xia Hu

IJCAI 2020 | Conference PDF | Archive PDF | Plain Text | LLM Run Details

Reproducibility Variable | Result | LLM Response
Research Type | Experimental | The conducted experiments on several continuous control tasks show that the proposed framework achieves superior performance with a learning-based agent and function approximation, without the use of expensive teacher models.
Researcher Affiliation | Academia | Kwei-Herng Lai, Daochen Zha, Yuening Li and Xia Hu, Department of Computer Science and Engineering, Texas A&M University. {khlai037, daochen.zha, yueningl, xiahu}@tamu.edu
Pseudocode | Yes | Algorithm 1 DPD: dual policy distillation. (A hedged sketch of the distillation step appears below the table.)
Open Source Code | Yes | "We propose a practical algorithm based on our theoretical results. The algorithm uses a disadvantageous policy distillation strategy (...)" Code: https://github.com/datamllab/dual-policy-distillation
Open Datasets | Yes | The experiments are conducted on several continuous control tasks from OpenAI Gym [Brockman et al., 2016]: Swimmer-v2, HalfCheetah-v2, Walker2d-v2, Humanoid-v2. (A Gym setup snippet appears below the table.)
Dataset Splits | No | The paper does not explicitly provide details about training/validation/test dataset splits. For continuous control tasks in RL, a static dataset split for validation is typically replaced by ongoing evaluation during training or separate test episodes, but no explicit validation set is mentioned.
Hardware Specification | No | The paper does not explicitly describe the hardware used for its experiments, such as specific CPU or GPU models.
Software Dependencies | No | The paper mentions that the experiments are implemented upon PPO [Schulman et al., 2017] and DDPG [Lillicrap et al., 2016], which are benchmark RL algorithms implemented in OpenAI Baselines, and links to the OpenAI Baselines GitHub repository. However, it does not specify version numbers for any software components, libraries, or frameworks, which limits reproducibility.
Experiment Setup | No | The paper states: "We follow all the hyper-parameters setting and network structures for our DPD implementation and all the baselines we considered." While this implies the default settings were reused, it does not provide the concrete values (e.g., learning rate, batch size, network architectures) in the main text. (The Baselines MuJoCo defaults for PPO are reproduced below the table for reference.)
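
To make the Pseudocode and Open Source Code rows more concrete, the following is a minimal sketch of the disadvantageous distillation step as we read it from the paper's description: each of the two learners imitates its peer only on states where the peer's value estimate exceeds its own. The network classes, the masking rule, and the MSE imitation term are our assumptions for illustration, not the authors' implementation; the linked repository contains the actual code.

```python
import torch
import torch.nn as nn

# Hypothetical networks for illustration; the paper reuses the Baselines architectures.
class PolicyNet(nn.Module):
    def __init__(self, obs_dim, act_dim):
        super().__init__()
        self.net = nn.Sequential(nn.Linear(obs_dim, 64), nn.Tanh(),
                                 nn.Linear(64, act_dim), nn.Tanh())
    def forward(self, obs):
        return self.net(obs)

class ValueNet(nn.Module):
    def __init__(self, obs_dim):
        super().__init__()
        self.net = nn.Sequential(nn.Linear(obs_dim, 64), nn.Tanh(), nn.Linear(64, 1))
    def forward(self, obs):
        return self.net(obs).squeeze(-1)

def disadvantageous_distillation_loss(policy, value, peer_policy, peer_value, states):
    """Distill from the peer only on states where the peer appears advantaged.

    Sketch under assumptions: the peer's value estimate is compared against our own,
    and an MSE imitation term is masked to those "disadvantageous" states.
    """
    with torch.no_grad():
        peer_actions = peer_policy(states)
        # 1 where the peer's estimated value exceeds ours, 0 elsewhere (assumed weighting).
        mask = (peer_value(states) > value(states)).float()
    imitation = ((policy(states) - peer_actions) ** 2).mean(dim=-1)
    return (mask * imitation).mean()

# Usage on a random batch (stand-in for states sampled during training).
obs_dim, act_dim = 17, 6
p1, v1 = PolicyNet(obs_dim, act_dim), ValueNet(obs_dim)
p2, v2 = PolicyNet(obs_dim, act_dim), ValueNet(obs_dim)
states = torch.randn(32, obs_dim)
loss_1 = disadvantageous_distillation_loss(p1, v1, p2, v2, states)  # added to learner 1's loss
loss_2 = disadvantageous_distillation_loss(p2, v2, p1, v1, states)  # added to learner 2's loss
loss_1.backward()
```

In the full framework, each learner alternates a distillation update of this kind with its own reinforcement learning update (PPO or DDPG in the paper's experiments).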
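The Open Datasets row refers to standard Gym environments rather than static datasets. The snippet below is a generic setup example, not taken from the paper; it assumes the older Gym reset/step API of the -v2 era and a working mujoco-py installation, since all four tasks are MuJoCo-based.

```python
import gym  # requires mujoco-py for the -v2 MuJoCo tasks

ENV_IDS = ["Swimmer-v2", "HalfCheetah-v2", "Walker2d-v2", "Humanoid-v2"]

for env_id in ENV_IDS:
    env = gym.make(env_id)
    obs = env.reset()
    total_reward, done = 0.0, False
    while not done:
        # Random policy stand-in; a trained DPD/PPO/DDPG policy would act here.
        obs, reward, done, info = env.step(env.action_space.sample())
        total_reward += reward
    print(f"{env_id}: episode return with random actions = {total_reward:.1f}")
    env.close()
```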
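Because the Experiment Setup row notes that the paper defers to the default settings of the Baselines implementations, the MuJoCo defaults shipped with OpenAI Baselines' ppo2 are reproduced below for reference. These values come from the Baselines repository (baselines/ppo2/defaults.py) as we recall it, not from the paper, and may differ across Baselines versions.

```python
# Approximate MuJoCo defaults from OpenAI Baselines' ppo2 (baselines/ppo2/defaults.py);
# reproduced for reference only — the paper itself does not list these values.
PPO2_MUJOCO_DEFAULTS = dict(
    nsteps=2048,            # rollout length per environment
    nminibatches=32,        # minibatches per optimization epoch
    lam=0.95,               # GAE lambda
    gamma=0.99,             # discount factor
    noptepochs=10,          # optimization epochs per update
    ent_coef=0.0,           # entropy bonus coefficient
    lr=lambda f: 3e-4 * f,  # learning rate, annealed with remaining progress fraction f
    cliprange=0.2,          # PPO clipping parameter
    value_network='copy',   # separate value-function network
)
```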