Accelerated Policy Learning with Parallel Differentiable Simulation
Authors: Jie Xu, Viktor Makoviychuk, Yashraj Narang, Fabio Ramos, Wojciech Matusik, Animesh Garg, Miles Macklin
ICLR 2022
| Reproducibility Variable | Result | LLM Response |
|---|---|---|
| Research Type | Experimental | We evaluate our method on classical RL control tasks, and show substantial improvements in sample efficiency and wall-clock time over state-of-the-art RL and differentiable simulation-based algorithms. In addition, we demonstrate the scalability of our method by applying it to the challenging high-dimensional problem of muscle-actuated locomotion with a large action space, achieving a greater than 17× reduction in training time over the best-performing established RL algorithm. |
| Researcher Affiliation | Collaboration | Jie Xu (1,2), Viktor Makoviychuk (1), Yashraj Narang (1), Fabio Ramos (1,3), Wojciech Matusik (2), Animesh Garg (1,4), Miles Macklin (1); 1: NVIDIA, 2: Massachusetts Institute of Technology, 3: University of Sydney, 4: University of Toronto |
| Pseudocode | Yes | The pseudo code of our method is provided in Algorithm 1. (A hedged sketch of such a short-horizon actor-critic loop follows this table.) |
| Open Source Code | Yes | For other problems (i.e., Half Cheetah and Hopper), refer to our released code for the adopted hyperparameters. |
| Open Datasets | Yes | We select Cart Pole Swing Up, Ant and Humanoid as three representative RL tasks... We use the lower body of the humanoid model from Lee et al. (2019)... |
| Dataset Splits | No | The paper mentions hyperparameter searches, which implies a validation process. However, it does not provide explicit training/validation/test dataset splits with specific percentages or counts, as the experiments use simulated continuous-control reinforcement learning environments rather than fixed datasets. |
| Hardware Specification | Yes | To ensure a fair comparison for wall-clock time performance, we run all algorithms on the same GPU model (TITAN X) and CPU model (Intel Xeon(R) E5-2620). ... The performance is measured on a desktop with GPU model TITAN X and CPU model Intel Xeon(R) E5-2620 @ 2.10GHz. |
| Software Dependencies | No | We build our differentiable simulator on PyTorch (Paszke et al., 2019) and use high-performance implementations from RL games (Makoviichuk & Makoviychuk, 2021). While software components are mentioned, specific version numbers for PyTorch or RL games are not provided. (A version-recording sketch follows this table.) |
| Experiment Setup | Yes | We conduct an extensive hyperparameter search for all algorithms... The hyperparameters of PPO and SAC we used in the experiments are reported in Table 2 and 3. For our method... we report the hyperparameters for each problem in Table 4. |
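
The pseudocode row above refers to Algorithm 1, the paper's short-horizon actor-critic procedure over a parallel differentiable simulator. Below is a minimal PyTorch sketch of that style of training loop, assuming a differentiable environment whose `step()` keeps observations and rewards on the autograd graph; the `env` interface, network sizes, and hyperparameter values are illustrative assumptions, not the authors' released implementation.

```python
# Hedged sketch of a short-horizon actor-critic loop over a differentiable simulator.
# The environment API, network shapes, and constants below are assumptions for illustration.
import torch
import torch.nn as nn

obs_dim, act_dim, horizon, gamma = 8, 2, 32, 0.99  # illustrative values only

policy = nn.Sequential(nn.Linear(obs_dim, 64), nn.ELU(), nn.Linear(64, act_dim))
critic = nn.Sequential(nn.Linear(obs_dim, 64), nn.ELU(), nn.Linear(64, 1))
policy_opt = torch.optim.Adam(policy.parameters(), lr=2e-3)
critic_opt = torch.optim.Adam(critic.parameters(), lr=5e-4)


def train_iteration(env):
    """One short-horizon actor-critic update (sketch, not the released code)."""
    obs = env.observation()          # (num_envs, obs_dim); assumed differentiable state
    states, rewards = [], []
    ret = 0.0
    for t in range(horizon):
        states.append(obs.detach())                # s_t, detached copy for critic training
        action = torch.tanh(policy(obs))
        obs, reward = env.step(action)             # assumed differentiable step() API
        rewards.append(reward.detach())
        ret = ret + (gamma ** t) * reward.sum()
    ret = ret + (gamma ** horizon) * critic(obs).sum()   # bootstrap with terminal value

    # Policy update: gradients flow back through the simulation graph itself.
    policy_opt.zero_grad()
    (-ret / (horizon * reward.shape[0])).backward()
    policy_opt.step()

    # Critic update on detached transitions; one-step targets shown for brevity
    # (Algorithm 1 uses TD(lambda)-style targets).
    next_states = states[1:] + [obs.detach()]
    with torch.no_grad():
        targets = [r + gamma * critic(s_next).squeeze(-1)
                   for r, s_next in zip(rewards, next_states)]
    for s, y in zip(states, targets):
        critic_opt.zero_grad()
        ((critic(s).squeeze(-1) - y) ** 2).mean().backward()
        critic_opt.step()
```

In the paper, the same structure is run over thousands of GPU-parallel environments with TD(lambda) critic targets; the single-environment batch and one-step targets here are only to keep the sketch short.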
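
Because the software-dependencies row notes that no version numbers are given for PyTorch or RL games, a reproduction has to record them itself. A small sketch of logging the versions actually installed (the package names queried here are assumptions about a typical install, not taken from the paper):

```python
# Record the library versions a reproduction actually ran with, since the paper
# does not pin them. Package names below are assumptions.
import importlib.metadata as md
import torch

print("torch", torch.__version__)
for pkg in ("rl-games", "numpy", "gym"):
    try:
        print(pkg, md.version(pkg))
    except md.PackageNotFoundError:
        print(pkg, "not installed")
```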