Robust Adversarial Reinforcement Learning with Dissipation Inequation Constraint
Authors: Peng Zhai, Jie Luo, Zhiyan Dong, Lihua Zhang, Shunli Wang, Dingkang Yang
AAAI 2022, pp. 5431-5439 | Conference PDF | Archive PDF | Plain Text | LLM Run Details
| Reproducibility Variable | Result | LLM Response |
|---|---|---|
| Research Type | Experimental | Results of experiments in MuJoCo and GymFc environments show that our architecture effectively improves the robustness of the controller against environmental changes and adapts to more powerful adversaries. Results of the flight experiments on a real quadcopter indicate that our method can directly deploy the policy trained in the simulation environment to the real environment, and our controller outperforms the PID controller based on hardware-in-the-loop. |
| Researcher Affiliation | Academia | Peng Zhai1,4, Jie Luo1,5, Zhiyan Dong1,2,5*, Lihua Zhang1,3,4*, Shunli Wang1,3, Dingkang Yang1,4 1 Academy for Engineering and Technology, Fudan University 2 Ji Hua Laboratory, Foshan, China 3 Engineering Research Center of AI and Robotics, Ministry of Education, Shanghai 200433, China 4 Jilin Provincial Key Laboratory of Intelligence Science and Engineering, Changchun, China 5 Engineering Research Center of AI and Robotics, Shanghai, China {pzhai18, 19210860032, dongzhiyan, lihuazhang, slwang19, dkyang20}@fudan.edu.cn |
| Pseudocode | Yes | Algorithm 1: DICARL (the proposed algorithm); a hedged sketch of the adversarial training pattern it builds on follows the table. |
| Open Source Code | No | The paper does not provide any explicit statement or link indicating that the source code for the methodology described is open-source or publicly available. |
| Open Datasets | No | The paper mentions using simulation environments like "MuJoCo" and "GymFc" for training. While data is generated within these environments during training, they are interactive simulation platforms rather than pre-existing, publicly available datasets in the traditional sense (e.g., CIFAR-10, MNIST) with concrete access information (link or citation). Therefore, the paper does not meet the strict criteria for providing access to a publicly available dataset. |
| Dataset Splits | No | The paper discusses a "training domain" and "test domain" and mentions evaluating over "700 episodes" for testing, but it does not specify explicit training/validation/test splits (e.g., percentages, sample counts, or a citation to a predefined split) that would be needed to reproduce the data partitioning. |
| Hardware Specification | Yes | All our experiments were run on a desktop computer equipped with an Intel Core i7- |
| Software Dependencies | No | The paper states: "We used the PPO algorithm implemented by OpenAI Baselines (Dhariwal et al. 2017) as the policy optimizer for all the algorithms". While it mentions OpenAI Baselines, it does not provide specific version numbers for this or any other software dependencies (e.g., Python, PyTorch, TensorFlow) needed for replication. |
| Experiment Setup | Yes | All the shared hyper-parameters were the same, and we run all algorithms with the same amount of simulation steps. The detailed settings of the hyper-parameter of each algorithm are reported in Appendix C.2. |
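
For context on the "Pseudocode" row above: the paper's Algorithm 1 (DICARL) is not reproduced in this report. The sketch below only illustrates the generic two-player adversarial RL pattern that such methods build on, in which a protagonist policy is trained against a destabilizing adversary in alternating fashion. Everything in it is an assumption for illustration: the toy 1D point-mass dynamics, the linear policies, and the zeroth-order random-search updates standing in for the paper's PPO optimizer; the dissipation-inequality (L2-gain) constraint that distinguishes DICARL is not implemented.

```python
"""Minimal sketch of alternating adversarial RL training (RARL-style).

NOT the paper's DICARL algorithm: the toy 1D point-mass environment,
the linear policies, and the random-search updates standing in for PPO
are illustrative placeholders, and the dissipation-inequality
constraint from the paper is omitted.
"""
import numpy as np

rng = np.random.default_rng(0)


def rollout(k_pro, k_adv, horizon=200):
    """Simulate a 1D point mass: the protagonist pushes toward the origin,
    the adversary injects a bounded disturbance force. Returns the
    protagonist's return (the adversary's return is its negative)."""
    x, v, ret = 1.0, 0.0, 0.0
    for _ in range(horizon):
        u = -k_pro[0] * x - k_pro[1] * v                      # protagonist control
        d = np.clip(k_adv[0] * x + k_adv[1] * v, -0.5, 0.5)   # adversary disturbance
        v += 0.05 * (u + d)
        x += 0.05 * v
        ret += -(x ** 2 + 0.1 * u ** 2)                       # penalize error and effort
    return ret


def random_search_update(params, objective, step=0.1, n_dirs=8):
    """Crude zeroth-order update: try random perturbations, keep the best.
    A stand-in for the PPO update used in the paper."""
    best, best_val = params, objective(params)
    for _ in range(n_dirs):
        cand = params + step * rng.standard_normal(params.shape)
        val = objective(cand)
        if val > best_val:
            best, best_val = cand, val
    return best


k_pro = np.zeros(2)   # protagonist policy parameters (feedback gains)
k_adv = np.zeros(2)   # adversary policy parameters

for it in range(50):
    # Protagonist step: maximize its own return with the adversary fixed.
    k_pro = random_search_update(k_pro, lambda p: rollout(p, k_adv))
    # Adversary step: minimize the protagonist's return (zero-sum game).
    k_adv = random_search_update(k_adv, lambda p: -rollout(k_pro, p))
    if it % 10 == 0:
        print(f"iter {it:3d}  protagonist return {rollout(k_pro, k_adv):8.2f}")
```

In the paper's setup, each player's update would instead be a PPO step (via OpenAI Baselines), and the adversary's influence is bounded through the dissipation inequality rather than the hard clip used here.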