Accelerated Policy Learning with Parallel Differentiable Simulation

Authors: Jie Xu, Viktor Makoviychuk, Yashraj Narang, Fabio Ramos, Wojciech Matusik, Animesh Garg, Miles Macklin

ICLR 2022 | Conference PDF | Archive PDF | Plain Text | LLM Run Details

Reproducibility Variable | Result | LLM Response
Research Type | Experimental | We evaluate our method on classical RL control tasks, and show substantial improvements in sample efficiency and wall-clock time over state-of-the-art RL and differentiable simulation-based algorithms. In addition, we demonstrate the scalability of our method by applying it to the challenging high-dimensional problem of muscle-actuated locomotion with a large action space, achieving a greater than 17× reduction in training time over the best-performing established RL algorithm.
Researcher Affiliation | Collaboration | Jie Xu (1,2), Viktor Makoviychuk (1), Yashraj Narang (1), Fabio Ramos (1,3), Wojciech Matusik (2), Animesh Garg (1,4), Miles Macklin (1). Affiliations: 1 NVIDIA, 2 Massachusetts Institute of Technology, 3 University of Sydney, 4 University of Toronto.
Pseudocode | Yes | The pseudo code of our method is provided in Algorithm 1. (A hedged sketch of such a differentiable-simulation training loop follows this table.)
Open Source Code | Yes | For other problems (i.e., Half Cheetah and Hopper), refer to our released code for the adopted hyperparameters.
Open Datasets | Yes | We select Cart Pole Swing Up, Ant and Humanoid as three representative RL tasks... We use the lower body of the humanoid model from Lee et al. (2019)...
Dataset Splits | No | The paper mentions hyperparameter searches, which implies a validation process. However, it does not provide explicit training/validation/test dataset splits with specific percentages or counts for its simulated environments, as these are continuous reinforcement learning tasks rather than fixed datasets.
Hardware Specification | Yes | To ensure a fair comparison for wall-clock time performance, we run all algorithms on the same GPU model (TITAN X) and CPU model (Intel Xeon(R) E5-2620). ... The performance is measured on a desktop with GPU model TITAN X and CPU model Intel Xeon(R) E5-2620 @ 2.10GHz.
Software Dependencies | No | We build our differentiable simulator on PyTorch (Paszke et al., 2019) and use high-performance implementations from RL games (Makoviichuk & Makoviychuk, 2021). While software components are mentioned, specific version numbers for PyTorch or RL games are not provided.
Experiment Setup | Yes | We conduct an extensive hyperparameter search for all algorithms... The hyperparameters of PPO and SAC we used in the experiments are reported in Table 2 and 3. For our method... we report the hyperparameters for each problem in Table 4.
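
The Pseudocode and Experiment Setup rows above point to Algorithm 1 and the hyperparameter tables in the paper. As a reading aid, here is a minimal, hypothetical sketch of the general idea named in the paper's title: first-order policy optimization by backpropagating rewards through a differentiable simulator over batched (parallel) short rollouts in PyTorch. The toy point-mass dynamics, network sizes, and hyperparameters below are illustrative assumptions, not the paper's simulator, tasks, or Algorithm 1.

```python
# Hypothetical sketch: policy learning by backpropagating rewards through a
# differentiable simulator over parallel short rollouts. The "simulator" here
# is a toy differentiable point-mass task; all names and values are assumptions.
import torch
import torch.nn as nn

num_envs, obs_dim, act_dim, horizon = 64, 4, 2, 32
dt = 0.05

policy = nn.Sequential(nn.Linear(obs_dim, 64), nn.ELU(), nn.Linear(64, act_dim))
optimizer = torch.optim.Adam(policy.parameters(), lr=2e-3)

def reset():
    # State: [x, y, vx, vy] for a batch of parallel environments.
    return torch.randn(num_envs, obs_dim)

def step(state, action):
    # Differentiable dynamics: force integrates into velocity, velocity into position.
    pos, vel = state[:, :2], state[:, 2:]
    vel = vel + dt * action
    pos = pos + dt * vel
    next_state = torch.cat([pos, vel], dim=1)
    # Reward: stay near the origin with small actions (differentiable w.r.t. action).
    reward = -(pos.pow(2).sum(dim=1) + 0.1 * action.pow(2).sum(dim=1))
    return next_state, reward

for epoch in range(200):
    state = reset()
    total_reward = torch.zeros(num_envs)
    for _ in range(horizon):
        action = torch.tanh(policy(state))
        state, reward = step(state, action)
        total_reward = total_reward + reward
    # Analytic (first-order) gradient: backpropagate the mean return through the
    # simulated dynamics into the policy parameters, then take a gradient step.
    loss = -total_reward.mean()
    optimizer.zero_grad()
    loss.backward()
    optimizer.step()
```

In this sketch the return is differentiable with respect to the policy parameters through the simulator's dynamics, so a plain Adam step on the negative mean return serves as the policy update; for the actual algorithm, tasks, and settings, consult the paper's Algorithm 1, its hyperparameter tables (Tables 2 to 4), and the released code.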