Toward Evaluating Robustness of Deep Reinforcement Learning with Continuous Control
Authors: Tsui-Wei Weng, Krishnamurthy (Dj) Dvijotham*, Jonathan Uesato*, Kai Xiao*, Sven Gowal*, Robert Stanforth*, Pushmeet Kohli
ICLR 2020
| Reproducibility Variable | Result | LLM Response |
|---|---|---|
| Research Type | Experimental | Extensive experiments on various MuJoCo domains (Cartpole, Fish, Walker, Humanoid) demonstrate that our proposed framework is much more effective and efficient than model-free attack baselines in degrading agent performance as well as driving agents to unsafe states. |
| Researcher Affiliation | Collaboration | MIT, DeepMind |
| Pseudocode | Yes | Our proposed attack is summarized in Algorithm 2 for Step 1 and Algorithm 3 for Step 2. Algorithm 1: Collect trajectories. |
| Open Source Code | No | The paper does not contain any explicit statements about releasing source code for the described methodology, nor does it provide links to a code repository. |
| Open Datasets | Yes | In this section, we conduct experiments on standard reinforcement learning environments for continuous control (Tassa et al., 2018). We demonstrate results on 4 different environments in MuJoCo (Tassa et al., 2018) and corresponding tasks: Cartpole-balance/swingup, Fish-upright, Walker-stand/walk and Humanoid-stand/walk. |
| Dataset Splits | No | The paper mentions “training and test losses” for the dynamics model but does not provide specific percentages or counts for distinct training, validation, and test splits. The D4PG agent used is described as “pre-trained”. |
| Hardware Specification | No | The paper does not provide specific details about the hardware used for running the experiments, such as CPU models, GPU models, or cloud computing instance types. |
| Software Dependencies | No | The paper mentions using “Adam as the optimizer” and experiments on MuJoCo domains, but it does not specify version numbers for any software, libraries, or frameworks used (e.g., Python, PyTorch, TensorFlow versions). |
| Experiment Setup | Yes | A 4-layer feed-forward neural network with 1000 hidden neurons per layer is trained as the dynamics model f respectively for the domains of Cartpole, Fish, Walker and Humanoid. We use Adam as the optimizer with optimization steps equal to 30, and we report the best result for each run from a combination of 6 learning rates, 2 unroll lengths {T1, T2}, and n steps of applying the PGD solution with n ≤ Ti. Specifically, for Cartpole and Fish, we found that 1000 episodes (1e6 training points) are sufficient to train a good dynamics model... while for the more complicated domains like Walker and Humanoid, more training points (5e6) are required... For the deep RL agent, we train a state-of-the-art D4PG agent (Barth-Maron et al., 2018) with default Gaussian noise N(0, 0.3I) on the action, and the score of the agents without attacks is summarized in Appendix A.3. |
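
For orientation, the following is a minimal PyTorch-style sketch of the dynamics-model setup quoted in the Experiment Setup row: a 4-layer feed-forward network with 1000 hidden neurons per layer, trained with Adam to predict the next state from the current state and action. The activation function, state/action dimensions, and learning rate below are illustrative assumptions, not values stated in the paper.

```python
# Minimal sketch (not the authors' code) of the learned dynamics model f
# described in the Experiment Setup row: a 4-layer MLP with 1000 hidden
# units per layer, trained with Adam on (state, action) -> next_state pairs.
import torch
import torch.nn as nn

class DynamicsModel(nn.Module):
    """Predicts the next state from the current state and action."""
    def __init__(self, state_dim: int, action_dim: int, hidden: int = 1000):
        super().__init__()
        self.net = nn.Sequential(
            nn.Linear(state_dim + action_dim, hidden), nn.ReLU(),  # ReLU is an assumption
            nn.Linear(hidden, hidden), nn.ReLU(),
            nn.Linear(hidden, hidden), nn.ReLU(),
            nn.Linear(hidden, state_dim),
        )

    def forward(self, state: torch.Tensor, action: torch.Tensor) -> torch.Tensor:
        return self.net(torch.cat([state, action], dim=-1))

# Training with Adam, as the paper states; the learning rate is a placeholder
# (the paper sweeps 6 learning rates and reports the best run), and the
# state/action dimensions below are illustrative.
model = DynamicsModel(state_dim=24, action_dim=6)
optimizer = torch.optim.Adam(model.parameters(), lr=1e-3)
loss_fn = nn.MSELoss()

def train_step(states, actions, next_states):
    """One supervised update on a batch of collected transitions."""
    optimizer.zero_grad()
    loss = loss_fn(model(states, actions), next_states)
    loss.backward()
    optimizer.step()
    return loss.item()
```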
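
The Pseudocode row describes a two-step attack: collect trajectories and fit a dynamics model (Step 1), then plan an adversarial perturbation against the agent using that model (Step 2). Below is a hedged sketch of how Step 2 could use projected gradient descent through a learned, differentiable dynamics model. The observation-perturbation threat model, the cost function, the L∞ radius, and the step counts are all assumptions for illustration, not the authors' implementation.

```python
# Hedged sketch (assumptions throughout) of a PGD-style planning step:
# optimize a bounded perturbation on the agent's observation by unrolling
# the learned dynamics model and descending a differentiable cost.
import torch

def pgd_observation_attack(policy, dynamics, cost_fn, state,
                           epsilon=0.05, pgd_steps=30, unroll=5, step_size=1e-2):
    """Returns an L-infinity-bounded perturbation delta that minimizes the
    assumed differentiable cost_fn (e.g., predicted return, or distance
    from an unsafe region) over an unrolled horizon."""
    delta = torch.zeros_like(state, requires_grad=True)
    for _ in range(pgd_steps):
        s = state + delta                      # perturbed observation
        total_cost = torch.zeros(())
        for _ in range(unroll):                # unroll the learned dynamics model
            a = policy(s)
            s = dynamics(s, a)
            total_cost = total_cost + cost_fn(s)
        grad, = torch.autograd.grad(total_cost, delta)
        with torch.no_grad():
            delta -= step_size * grad.sign()   # signed gradient descent step
            delta.clamp_(-epsilon, epsilon)    # project back into the L-inf ball
    return delta.detach()
```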