Accelerated Policy Learning with Parallel Differentiable Simulation

Authors: Jie Xu, Viktor Makoviychuk, Yashraj Narang, Fabio Ramos, Wojciech Matusik, Animesh Garg, Miles Macklin

ICLR 2022 | Conference PDF | Archive PDF | Plain Text | LLM Run Details

Reproducibility Variable | Result | LLM Response
Research Type | Experimental | We evaluate our method on classical RL control tasks, and show substantial improvements in sample efficiency and wall-clock time over state-of-the-art RL and differentiable simulation-based algorithms. In addition, we demonstrate the scalability of our method by applying it to the challenging high-dimensional problem of muscle-actuated locomotion with a large action space, achieving a greater than 17× reduction in training time over the best-performing established RL algorithm.
Researcher Affiliation | Collaboration | Jie Xu (1,2), Viktor Makoviychuk (1), Yashraj Narang (1), Fabio Ramos (1,3), Wojciech Matusik (2), Animesh Garg (1,4), Miles Macklin (1). Affiliations: 1 NVIDIA, 2 Massachusetts Institute of Technology, 3 University of Sydney, 4 University of Toronto.
Pseudocode | Yes | The pseudo code of our method is provided in Algorithm 1. (A hedged sketch of such a differentiable-simulation training loop follows this table.)
Open Source Code | Yes | For other problems (i.e., Half Cheetah and Hopper), refer to our released code for the adopted hyperparameters.
Open Datasets | Yes | We select Cart Pole Swing Up, Ant and Humanoid as three representative RL tasks... We use the lower body of the humanoid model from Lee et al. (2019)...
Dataset Splits | No | The paper mentions hyperparameter searches, which implies a validation process. However, it does not provide explicit training/validation/test dataset splits with specific percentages or counts for its simulated environments, as these are continuous reinforcement learning tasks rather than fixed datasets.
Hardware Specification | Yes | To ensure a fair comparison for wall-clock time performance, we run all algorithms on the same GPU model (TITAN X) and CPU model (Intel Xeon(R) E5-2620). ... The performance is measured on a desktop with GPU model TITAN X and CPU model Intel Xeon(R) E5-2620 @ 2.10GHz.
Software Dependencies | No | We build our differentiable simulator on PyTorch (Paszke et al., 2019) and use high-performance implementations from RL games (Makoviichuk & Makoviychuk, 2021). While software components are mentioned, specific version numbers for PyTorch or RL games are not provided.
Experiment Setup | Yes | We conduct an extensive hyperparameter search for all algorithms... The hyperparameters of PPO and SAC we used in the experiments are reported in Table 2 and 3. For our method... we report the hyperparameters for each problem in Table 4.
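
The Pseudocode and Experiment Setup rows above point to Algorithm 1 and the hyperparameter tables in the paper. As a reading aid, here is a minimal, hypothetical sketch of the general idea named in the paper's title: first-order policy optimization by backpropagating rewards through a differentiable simulator over batched (parallel) short rollouts in PyTorch. The toy point-mass dynamics, network sizes, and hyperparameters below are illustrative assumptions, not the paper's simulator, tasks, or Algorithm 1.

```python
# Hypothetical sketch: policy learning by backpropagating rewards through a
# differentiable simulator over parallel short rollouts. The "simulator" here
# is a toy differentiable point-mass task; all names and values are assumptions.
import torch
import torch.nn as nn

num_envs, obs_dim, act_dim, horizon = 64, 4, 2, 32
dt = 0.05

policy = nn.Sequential(nn.Linear(obs_dim, 64), nn.ELU(), nn.Linear(64, act_dim))
optimizer = torch.optim.Adam(policy.parameters(), lr=2e-3)

def reset():
    # State: [x, y, vx, vy] for a batch of parallel environments.
    return torch.randn(num_envs, obs_dim)

def step(state, action):
    # Differentiable dynamics: force integrates into velocity, velocity into position.
    pos, vel = state[:, :2], state[:, 2:]
    vel = vel + dt * action
    pos = pos + dt * vel
    next_state = torch.cat([pos, vel], dim=1)
    # Reward: stay near the origin with small actions (differentiable w.r.t. action).
    reward = -(pos.pow(2).sum(dim=1) + 0.1 * action.pow(2).sum(dim=1))
    return next_state, reward

for epoch in range(200):
    state = reset()
    total_reward = torch.zeros(num_envs)
    for _ in range(horizon):
        action = torch.tanh(policy(state))
        state, reward = step(state, action)
        total_reward = total_reward + reward
    # Analytic (first-order) gradient: backpropagate the mean return through the
    # simulated dynamics into the policy parameters, then take a gradient step.
    loss = -total_reward.mean()
    optimizer.zero_grad()
    loss.backward()
    optimizer.step()
```

In this sketch the return is differentiable with respect to the policy parameters through the simulator's dynamics, so a plain Adam step on the negative mean return serves as the policy update; for the actual algorithm, tasks, and settings, consult the paper's Algorithm 1, its hyperparameter tables (Tables 2 to 4), and the released code.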