Robust Adversarial Reinforcement Learning with Dissipation Inequation Constraint

Authors: Peng Zhai, Jie Luo, Zhiyan Dong, Lihua Zhang, Shunli Wang, Dingkang Yang (pp. 5431-5439)

AAAI 2022

Reproducibility Variable | Result | LLM Response
Research Type | Experimental | Results of experiments in the MuJoCo and GymFC environments show that our architecture effectively improves the robustness of the controller against environmental changes and adapts to more powerful adversaries. Results of the flight experiments on a real quadcopter indicate that our method can directly deploy the policy trained in the simulation environment to the real environment, and our controller outperforms the PID controller based on hardware-in-the-loop.
Researcher Affiliation | Academia | Peng Zhai (1,4), Jie Luo (1,5), Zhiyan Dong (1,2,5)*, Lihua Zhang (1,3,4)*, Shunli Wang (1,3), Dingkang Yang (1,4). 1 Academy for Engineering and Technology, Fudan University; 2 Ji Hua Laboratory, Foshan, China; 3 Engineering Research Center of AI and Robotics, Ministry of Education, Shanghai 200433, China; 4 Jilin Provincial Key Laboratory of Intelligence Science and Engineering, Changchun, China; 5 Engineering Research Center of AI and Robotics, Shanghai, China. {pzhai18, 19210860032, dongzhiyan, lihuazhang, slwang19, dkyang20}@fudan.edu.cn
Pseudocode | Yes | Algorithm 1: DICARL (proposed algorithm); a hedged adversarial-training sketch follows the table.
Open Source Code | No | The paper does not provide any explicit statement or link indicating that the source code for the described methodology is open-source or publicly available.
Open Datasets | No | The paper mentions using simulation environments such as MuJoCo and GymFC for training. While data is generated within these environments during training, they are platforms for interaction, not pre-existing publicly available datasets in the traditional sense (e.g., CIFAR-10, MNIST) for which concrete access information (link, citation) is provided. Therefore, it does not meet the strict criteria for providing access to a publicly available dataset.
Dataset Splits | No | The paper discusses a "training domain" and a "test domain" and mentions evaluating over 700 episodes for testing, but it does not specify explicit training/validation/test dataset splits (e.g., percentages, sample counts, or citations of predefined splits) that would be needed to reproduce the data partitioning (an illustrative test-domain evaluation sketch follows the table).
Hardware Specification | Yes | All our experiments were run on a desktop computer equipped with an Intel Core i7-
Software Dependencies | No | The paper states: "We used the PPO algorithm implemented by OpenAI Baselines (Dhariwal et al. 2017) as the policy optimizer for all the algorithms". While it names OpenAI Baselines, it does not provide specific version numbers for this or any other software dependency (e.g., Python, PyTorch, TensorFlow) needed for replication (a version-snapshot sketch follows the table).
Experiment Setup | Yes | All the shared hyper-parameters were the same, and we ran all algorithms for the same number of simulation steps. The detailed hyper-parameter settings of each algorithm are reported in Appendix C.2 (a shared-configuration sketch follows the table).
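The Pseudocode row points to Algorithm 1 (DICARL), which is not reproduced on this page. For orientation only, here is a minimal sketch of the generic alternating protagonist/adversary training pattern that robust adversarial RL methods follow. It is not the paper's algorithm: the dissipation inequation constraint is omitted entirely, the toy dynamics and the `rollout` and `hill_climb` helpers are invented for illustration, and the random-search update merely stands in for the PPO policy-optimization step named in the paper.

```python
# Minimal sketch of alternating adversarial training on a toy 1-D regulation
# task. Illustrative only -- NOT the paper's Algorithm 1 (DICARL); the
# dissipation inequation constraint is not modeled here.
import numpy as np

rng = np.random.default_rng(0)

def rollout(k_prot, k_adv, horizon=50):
    """Protagonist return on x' = x + 0.1*(a_prot + a_adv) with cost x^2."""
    x, ret = 1.0, 0.0
    for _ in range(horizon):
        a_prot = -k_prot * x          # protagonist tries to regulate x to 0
        a_adv = k_adv * x             # adversary injects a destabilizing force
        x = x + 0.1 * (a_prot + a_adv)
        ret -= x ** 2                 # protagonist maximizes, adversary minimizes
    return ret

def hill_climb(objective, k, sign, iters=20, step=0.1):
    """Crude random search standing in for a PPO policy-gradient update."""
    for _ in range(iters):
        cand = k + step * rng.standard_normal()
        if sign * objective(cand) > sign * objective(k):
            k = cand
    return k

k_prot, k_adv = 0.0, 0.0
for outer in range(10):
    # 1) improve the protagonist against the current (fixed) adversary
    k_prot = hill_climb(lambda k: rollout(k, k_adv), k_prot, sign=+1)
    # 2) improve the adversary against the current (fixed) protagonist
    k_adv = hill_climb(lambda k: rollout(k_prot, k), k_adv, sign=-1)
    print(f"iter {outer}: k_prot={k_prot:+.2f}, k_adv={k_adv:+.2f}, "
          f"return={rollout(k_prot, k_adv):.3f}")
```

The alternating structure is the point of the sketch: each agent is updated while the other is frozen, which approximates the zero-sum game that makes the trained protagonist robust to disturbances.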
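The Dataset Splits row notes that evaluation is carried out over test-domain episodes rather than a conventional data split. Below is a hedged sketch of such an evaluation protocol. It assumes the `gymnasium` package with a MuJoCo-backed environment id such as "Hopper-v4"; the `evaluate` function, the mass-scaling line, and all parameter values are illustrative assumptions, not the paper's protocol.

```python
# Hedged sketch: average return over a fixed number of test-domain episodes.
# Assumes gymnasium with MuJoCo bindings installed; env id and the mass-scale
# perturbation are illustrative choices, not taken from the paper.
import numpy as np
import gymnasium as gym

def evaluate(policy, env_id="Hopper-v4", n_episodes=700, seed=0, mass_scale=1.0):
    """Mean/std of undiscounted return of `policy` over `n_episodes` rollouts.

    `mass_scale` illustrates one possible test-domain shift: body masses are
    scaled away from their training values to probe robustness to mismatch.
    """
    env = gym.make(env_id)
    env.unwrapped.model.body_mass[:] *= mass_scale   # assumes a MuJoCo-backed env
    returns = []
    for ep in range(n_episodes):
        obs, _ = env.reset(seed=seed + ep)
        done, ep_return = False, 0.0
        while not done:
            obs, reward, terminated, truncated, _ = env.step(policy(obs))
            ep_return += reward
            done = terminated or truncated
        returns.append(ep_return)
    env.close()
    return float(np.mean(returns)), float(np.std(returns))

if __name__ == "__main__":
    demo_env = gym.make("Hopper-v4")

    def random_policy(obs):
        # Stand-in for a trained controller.
        return demo_env.action_space.sample()

    mean_ret, std_ret = evaluate(random_policy, n_episodes=10, mass_scale=1.3)
    demo_env.close()
    print(f"test-domain return over 10 episodes: {mean_ret:.1f} +/- {std_ret:.1f}")
```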
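The Software Dependencies and Hardware Specification rows show that package versions and full machine details are not reported. The sketch below shows one way a re-run could record the environment it actually used alongside its results. The package list is an assumed stack (only OpenAI Baselines is named in the paper), and the output file name is arbitrary.

```python
# Record Python, package, and basic machine information for reproducibility.
import json
import platform
from importlib import metadata

# Assumed package list: the paper names only OpenAI Baselines, so the rest is a
# guess at a typical stack, not a documented dependency set.
PACKAGES = ["baselines", "tensorflow", "gym", "mujoco-py", "numpy"]

def snapshot_environment(path="environment.json"):
    """Write Python/package versions and basic machine info next to the results."""
    versions = {}
    for pkg in PACKAGES:
        try:
            versions[pkg] = metadata.version(pkg)
        except metadata.PackageNotFoundError:
            versions[pkg] = "not installed"
    info = {
        "python": platform.python_version(),
        "machine": platform.machine(),
        "processor": platform.processor(),
        "packages": versions,
    }
    with open(path, "w") as f:
        json.dump(info, f, indent=2)
    return info

if __name__ == "__main__":
    print(json.dumps(snapshot_environment(), indent=2))
```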
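The Experiment Setup row states that the shared hyper-parameters and the simulation-step budget were identical across algorithms, with details in Appendix C.2 (not reproduced here). The sketch below shows one way to keep such a shared configuration in a single place; every field name and numeric value is a placeholder, not a number taken from the paper.

```python
# One shared baseline configuration plus explicit per-algorithm overrides.
# All values are illustrative placeholders, not the settings from Appendix C.2.
from dataclasses import dataclass, asdict

@dataclass(frozen=True)
class SharedPPOConfig:
    total_timesteps: int = 2_000_000   # same simulation-step budget for every algorithm
    n_steps: int = 2048                # rollout length per policy update
    learning_rate: float = 3e-4
    gamma: float = 0.99
    clip_range: float = 0.2
    seed: int = 0

SHARED = SharedPPOConfig()

def make_run_config(algorithm: str, **overrides):
    """Per-algorithm config = shared baseline + explicitly named overrides."""
    cfg = {**asdict(SHARED), "algorithm": algorithm}
    cfg.update(overrides)
    return cfg

if __name__ == "__main__":
    for run in (make_run_config("PPO"),
                make_run_config("RARL"),
                make_run_config("DICARL", adversary_strength=0.1)):
        print(run)
```

Keeping every difference as an explicit override makes the "all shared hyper-parameters were the same" claim checkable directly from the run logs.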