Do Differentiable Simulators Give Better Policy Gradients?
Authors: Hyung Ju Suh, Max Simchowitz, Kaiqing Zhang, Russ Tedrake
ICML 2022
| Reproducibility Variable | Result | LLM Response |
|---|---|---|
| Research Type | Experimental | "We demonstrate the pitfalls of traditional estimators and the advantages of the α-order estimator on some numerical examples." and "To validate our results on policy optimization problems with differentiable simulators, we compare the performance of different gradients on time-stepping simulations written in torch (Paszke et al., 2019)." (A hedged sketch of such an interpolated estimator appears below the table.) |
| Researcher Affiliation | Academia | Department of Electrical Engineering and Computer Science, Massachusetts Institute of Technology, Cambridge, USA. Correspondence to: H.J. Terry Suh <hjsuh@mit.edu>. |
| Pseudocode | No | The paper does not contain any explicit pseudocode blocks or sections labeled 'Algorithm'. |
| Open Source Code | No | The paper does not provide any explicit statement or link indicating that the source code for the described methodology is publicly available. |
| Open Datasets | No | The paper refers to custom simulation environments (e.g., 'Ball with wall example', 'pushing example', 'tennis example') but does not provide concrete access information (links, DOIs, repositories, or formal citations) for any publicly available datasets used for training or evaluation. |
| Dataset Splits | No | The paper does not specify exact dataset split percentages or sample counts for training, validation, or testing. |
| Hardware Specification | No | The paper does not provide any specific hardware details such as GPU or CPU models, memory, or cloud instance types used for running its experiments. |
| Software Dependencies | No | The paper mentions 'time-stepping simulations written in torch (Paszke et al., 2019)' but does not specify a version number for torch or any other software dependencies. |
| Experiment Setup | Yes | "We use horizon of H = 200 to find the optimal force sequence of the first block to minimize distance between the second block and the goal position." and "We use a linear feedback policy with d = 21 parameters, and horizon of H = 200." (A hedged rollout sketch appears below the table.) |
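
The α-order estimator quoted in the Research Type row interpolates between a first-order estimate (differentiating through the simulator) and a zeroth-order estimate (using only function values) of the gradient of a randomized-smoothing objective F(θ) = E_w[f(θ + w)]. The following is a minimal sketch under that reading, not the authors' released code: the toy objective, the noise scale `sigma`, and the sample count are illustrative choices, and the zeroth-order term uses a plain score-function estimate without the variance-reduction baselines one would normally add.

```python
import torch

def alpha_order_gradient(f, theta, alpha=0.5, sigma=0.1, num_samples=100):
    """Monte Carlo estimate of grad F(theta), F(theta) = E_w[f(theta + w)],
    interpolating first-order (autodiff) and zeroth-order (value-only) samples."""
    grads_first, grads_zeroth = [], []
    for _ in range(num_samples):
        w = sigma * torch.randn_like(theta)
        x = (theta + w).detach().requires_grad_(True)
        value = f(x)

        # First-order sample: differentiate through f (the "differentiable simulator").
        (g,) = torch.autograd.grad(value, x)
        grads_first.append(g)

        # Zeroth-order sample: score-function / likelihood-ratio estimate that
        # only needs the scalar value of f (no baseline used in this sketch).
        grads_zeroth.append(value.detach() * w / sigma**2)

    grad_first = torch.stack(grads_first).mean(dim=0)
    grad_zeroth = torch.stack(grads_zeroth).mean(dim=0)
    # alpha-order interpolation: alpha = 1 is pure first-order, alpha = 0 pure zeroth-order.
    return alpha * grad_first + (1.0 - alpha) * grad_zeroth

# Toy usage: gradient estimate for a smoothed nonsmooth objective.
theta = torch.tensor([0.3, -0.2])
estimate = alpha_order_gradient(lambda x: torch.abs(x).sum(), theta, alpha=0.5)
print(estimate)
```

Setting `alpha=1.0` recovers the pure first-order gradient and `alpha=0.0` the pure zeroth-order one; the paper's numerical examples compare these regimes on its simulation tasks.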
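For the Experiment Setup row, a generic torch rollout of a linear feedback policy over the reported horizon of H = 200 might look like the sketch below. The dynamics, the state/action sizes, and the split of the 21 parameters into a gain matrix plus bias are hypothetical placeholders, not the paper's actual environments (pushing, tennis, etc.).

```python
import torch

H = 200                        # horizon reported in the paper
state_dim, action_dim = 6, 3   # hypothetical sizes: 3*6 gain + 3 bias = 21 parameters

# Placeholder linear dynamics; the paper's differentiable simulators differ.
A = torch.eye(state_dim)
B = 0.05 * torch.randn(state_dim, action_dim)

def step(x, u):
    return A @ x + B @ u

def rollout_loss(theta, x0, goal):
    # Unpack a d = 21 parameter vector into a linear feedback policy u = K x + b.
    K = theta[: action_dim * state_dim].reshape(action_dim, state_dim)
    b = theta[action_dim * state_dim :]
    x = x0
    for _ in range(H):
        u = K @ x + b
        x = step(x, u)
    return torch.norm(x - goal)   # distance of the final state to the goal

theta = torch.zeros(action_dim * state_dim + action_dim, requires_grad=True)
loss = rollout_loss(theta, x0=torch.ones(state_dim), goal=torch.zeros(state_dim))
loss.backward()                   # first-order policy gradient through the rollout
print(theta.grad.shape)           # torch.Size([21])
```

With `x0` and `goal` fixed, `rollout_loss` is exactly the kind of scalar objective that could be passed as `f` to the `alpha_order_gradient` sketch above.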