Do Differentiable Simulators Give Better Policy Gradients?
Authors: Hyung Ju Suh, Max Simchowitz, Kaiqing Zhang, Russ Tedrake
ICML 2022
| Reproducibility Variable | Result | LLM Response |
|---|---|---|
| Research Type | Experimental | "We demonstrate the pitfalls of traditional estimators and the advantages of the α-order estimator on some numerical examples." and "To validate our results on policy optimization problems with differentiable simulators, we compare the performance of different gradients on time-stepping simulations written in torch (Paszke et al., 2019)." (A hedged sketch of such an interpolated estimator appears below the table.) |
| Researcher Affiliation | Academia | Department of Electrical Engineering and Computer Science, Massachusetts Institute of Technology, Cambridge, USA. Correspondence to: H.J. Terry Suh <hjsuh@mit.edu>. |
| Pseudocode | No | The paper does not contain any explicit pseudocode blocks or sections labeled 'Algorithm'. |
| Open Source Code | No | The paper does not provide any explicit statement or link indicating that the source code for the described methodology is publicly available. |
| Open Datasets | No | The paper refers to custom simulation environments (e.g., 'Ball with wall example', 'pushing example', 'tennis example') but does not provide concrete access information (links, DOIs, repositories, or formal citations) for any publicly available datasets used for training or evaluation. |
| Dataset Splits | No | The paper does not specify exact dataset split percentages or sample counts for training, validation, or testing. |
| Hardware Specification | No | The paper does not provide any specific hardware details such as GPU or CPU models, memory, or cloud instance types used for running its experiments. |
| Software Dependencies | No | The paper mentions 'time-stepping simulations written in torch (Paszke et al., 2019)' but does not specify a version number for torch or any other software dependencies. |
| Experiment Setup | Yes | "We use horizon of H = 200 to find the optimal force sequence of the first block to minimize distance between the second block and the goal position." and "We use a linear feedback policy with d = 21 parameters, and horizon of H = 200." (A hedged rollout sketch appears below the table.) |
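
The α-order estimator quoted in the Research Type row interpolates between a first-order estimate (differentiating through the simulator) and a zeroth-order estimate (using only function values) of the gradient of a randomized-smoothing objective F(θ) = E_w[f(θ + w)]. The following is a minimal sketch under that reading, not the authors' released code: the toy objective, the noise scale `sigma`, and the sample count are illustrative choices, and the zeroth-order term uses a plain score-function estimate without the variance-reduction baselines one would normally add.

```python
import torch

def alpha_order_gradient(f, theta, alpha=0.5, sigma=0.1, num_samples=100):
    """Monte Carlo estimate of grad F(theta), F(theta) = E_w[f(theta + w)],
    interpolating first-order (autodiff) and zeroth-order (value-only) samples."""
    grads_first, grads_zeroth = [], []
    for _ in range(num_samples):
        w = sigma * torch.randn_like(theta)
        x = (theta + w).detach().requires_grad_(True)
        value = f(x)

        # First-order sample: differentiate through f (the "differentiable simulator").
        (g,) = torch.autograd.grad(value, x)
        grads_first.append(g)

        # Zeroth-order sample: score-function / likelihood-ratio estimate that
        # only needs the scalar value of f (no baseline used in this sketch).
        grads_zeroth.append(value.detach() * w / sigma**2)

    grad_first = torch.stack(grads_first).mean(dim=0)
    grad_zeroth = torch.stack(grads_zeroth).mean(dim=0)
    # alpha-order interpolation: alpha = 1 is pure first-order, alpha = 0 pure zeroth-order.
    return alpha * grad_first + (1.0 - alpha) * grad_zeroth

# Toy usage: gradient estimate for a smoothed nonsmooth objective.
theta = torch.tensor([0.3, -0.2])
estimate = alpha_order_gradient(lambda x: torch.abs(x).sum(), theta, alpha=0.5)
print(estimate)
```

Setting `alpha=1.0` recovers the pure first-order gradient and `alpha=0.0` the pure zeroth-order one; the paper's numerical examples compare these regimes on its simulation tasks.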
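For the Experiment Setup row, a generic torch rollout of a linear feedback policy over the reported horizon of H = 200 might look like the sketch below. The dynamics, the state/action sizes, and the split of the 21 parameters into a gain matrix plus bias are hypothetical placeholders, not the paper's actual environments (pushing, tennis, etc.).

```python
import torch

H = 200                        # horizon reported in the paper
state_dim, action_dim = 6, 3   # hypothetical sizes: 3*6 gain + 3 bias = 21 parameters

# Placeholder linear dynamics; the paper's differentiable simulators differ.
A = torch.eye(state_dim)
B = 0.05 * torch.randn(state_dim, action_dim)

def step(x, u):
    return A @ x + B @ u

def rollout_loss(theta, x0, goal):
    # Unpack a d = 21 parameter vector into a linear feedback policy u = K x + b.
    K = theta[: action_dim * state_dim].reshape(action_dim, state_dim)
    b = theta[action_dim * state_dim :]
    x = x0
    for _ in range(H):
        u = K @ x + b
        x = step(x, u)
    return torch.norm(x - goal)   # distance of the final state to the goal

theta = torch.zeros(action_dim * state_dim + action_dim, requires_grad=True)
loss = rollout_loss(theta, x0=torch.ones(state_dim), goal=torch.zeros(state_dim))
loss.backward()                   # first-order policy gradient through the rollout
print(theta.grad.shape)           # torch.Size([21])
```

With `x0` and `goal` fixed, `rollout_loss` is exactly the kind of scalar objective that could be passed as `f` to the `alpha_order_gradient` sketch above.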