Toward Evaluating Robustness of Deep Reinforcement Learning with Continuous Control
Authors: Tsui-Wei Weng, Krishnamurthy (Dj) Dvijotham*, Jonathan Uesato*, Kai Xiao*, Sven Gowal*, Robert Stanforth*, Pushmeet Kohli
ICLR 2020
| Reproducibility Variable | Result | LLM Response |
|---|---|---|
| Research Type | Experimental | Extensive experiments on various MuJoCo domains (Cartpole, Fish, Walker, Humanoid) demonstrate that our proposed framework is much more effective and efficient than model-free attack baselines in degrading agent performance as well as driving agents to unsafe states. |
| Researcher Affiliation | Collaboration | MIT, DeepMind |
| Pseudocode | Yes | Our proposed attack is summarized in Algorithm 2 for Step 1 and Algorithm 3 for Step 2. Algorithm 1: Collect trajectories. |
| Open Source Code | No | The paper does not contain any explicit statements about releasing source code for the described methodology, nor does it provide links to a code repository. |
| Open Datasets | Yes | In this section, we conduct experiments on standard reinforcement learning environments for continuous control (Tassa et al., 2018). We demonstrate results on 4 different environments in MuJoCo (Tassa et al., 2018) and corresponding tasks: Cartpole-balance/swingup, Fish-upright, Walker-stand/walk and Humanoid-stand/walk. |
| Dataset Splits | No | The paper mentions “training and test losses” for the dynamics model but does not provide specific percentages or counts for distinct training, validation, and test splits. The D4PG agent used is described as “pre-trained”. |
| Hardware Specification | No | The paper does not provide specific details about the hardware used for running the experiments, such as CPU models, GPU models, or cloud computing instance types. |
| Software Dependencies | No | The paper mentions using “Adam as the optimizer” and experiments on MuJoCo domains, but it does not specify version numbers for any software, libraries, or frameworks used (e.g., Python, PyTorch, TensorFlow versions). |
| Experiment Setup | Yes | A 4-layer feed-forward neural network with 1000 hidden neurons per layer is trained as the dynamics model f respectively for the domains of Cartpole, Fish, Walker and Humanoid. We use Adam as the optimizer with optimization steps equal to 30, and we report the best result for each run from a combination of 6 learning rates, 2 unroll lengths {T1, T2}, and n steps of applying the PGD solution with n ≤ Ti. Specifically, for Cartpole and Fish, we found that 1000 episodes (1e6 training points) are sufficient to train a good dynamics model... while for the more complicated domains like Walker and Humanoid, more training points (5e6) are required... For the deep RL agent, we train a state-of-the-art D4PG agent (Barth-Maron et al., 2018) with default Gaussian noise N(0, 0.3I) on the action, and the score of the agents without attacks is summarized in Appendix A.3. |
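
For orientation, the following is a minimal PyTorch-style sketch of the dynamics-model setup quoted in the Experiment Setup row: a 4-layer feed-forward network with 1000 hidden neurons per layer, trained with Adam to predict the next state from the current state and action. The activation function, state/action dimensions, and learning rate below are illustrative assumptions, not values stated in the paper.

```python
# Minimal sketch (not the authors' code) of the learned dynamics model f
# described in the Experiment Setup row: a 4-layer MLP with 1000 hidden
# units per layer, trained with Adam on (state, action) -> next_state pairs.
import torch
import torch.nn as nn

class DynamicsModel(nn.Module):
    """Predicts the next state from the current state and action."""
    def __init__(self, state_dim: int, action_dim: int, hidden: int = 1000):
        super().__init__()
        self.net = nn.Sequential(
            nn.Linear(state_dim + action_dim, hidden), nn.ReLU(),  # ReLU is an assumption
            nn.Linear(hidden, hidden), nn.ReLU(),
            nn.Linear(hidden, hidden), nn.ReLU(),
            nn.Linear(hidden, state_dim),
        )

    def forward(self, state: torch.Tensor, action: torch.Tensor) -> torch.Tensor:
        return self.net(torch.cat([state, action], dim=-1))

# Training with Adam, as the paper states; the learning rate is a placeholder
# (the paper sweeps 6 learning rates and reports the best run), and the
# state/action dimensions below are illustrative.
model = DynamicsModel(state_dim=24, action_dim=6)
optimizer = torch.optim.Adam(model.parameters(), lr=1e-3)
loss_fn = nn.MSELoss()

def train_step(states, actions, next_states):
    """One supervised update on a batch of collected transitions."""
    optimizer.zero_grad()
    loss = loss_fn(model(states, actions), next_states)
    loss.backward()
    optimizer.step()
    return loss.item()
```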
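
The Pseudocode row describes a two-step attack: collect trajectories and fit a dynamics model (Step 1), then plan an adversarial perturbation against the agent using that model (Step 2). Below is a hedged sketch of how Step 2 could use projected gradient descent through a learned, differentiable dynamics model. The observation-perturbation threat model, the cost function, the L∞ radius, and the step counts are all assumptions for illustration, not the authors' implementation.

```python
# Hedged sketch (assumptions throughout) of a PGD-style planning step:
# optimize a bounded perturbation on the agent's observation by unrolling
# the learned dynamics model and descending a differentiable cost.
import torch

def pgd_observation_attack(policy, dynamics, cost_fn, state,
                           epsilon=0.05, pgd_steps=30, unroll=5, step_size=1e-2):
    """Returns an L-infinity-bounded perturbation delta that minimizes the
    assumed differentiable cost_fn (e.g., predicted return, or distance
    from an unsafe region) over an unrolled horizon."""
    delta = torch.zeros_like(state, requires_grad=True)
    for _ in range(pgd_steps):
        s = state + delta                      # perturbed observation
        total_cost = torch.zeros(())
        for _ in range(unroll):                # unroll the learned dynamics model
            a = policy(s)
            s = dynamics(s, a)
            total_cost = total_cost + cost_fn(s)
        grad, = torch.autograd.grad(total_cost, delta)
        with torch.no_grad():
            delta -= step_size * grad.sign()   # signed gradient descent step
            delta.clamp_(-epsilon, epsilon)    # project back into the L-inf ball
    return delta.detach()
```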