Neural Dynamic Policies for End-to-End Sensorimotor Learning

Authors: Shikhar Bahl, Mustafa Mukadam, Abhinav Gupta, Deepak Pathak

NeurIPS 2020 | Conference PDF | Archive PDF | Plain Text | LLM Run Details

Reproducibility Variable | Result | LLM Response
Research Type | Experimental | We evaluate NDPs in imitation as well as reinforcement learning setups. NDPs can utilize high-dimensional inputs via demonstrations and learn from weak supervisory signals as well as rewards. In both setups, NDPs exhibit better or comparable performance to state-of-the-art approaches.
Researcher Affiliation | Collaboration | Shikhar Bahl (CMU), Mustafa Mukadam (FAIR), Abhinav Gupta (CMU), Deepak Pathak (CMU)
Pseudocode | Yes | Algorithm 1: Training NDPs for RL
Open Source Code | Yes | Project video and code are available at: https://shikharbahl.github.io/neural-dynamic-policies/
Open Datasets | Yes | We took existing torque control based environments for Picking and Throwing [17] and modified them to enable joint angle control. [...] To test on quasi-static tasks, we use Pushing, Soccer, Faucet-Opening from the Meta-World [46] task suite
Dataset Splits | No | The paper mentions 'train' and 'test' splits, but does not explicitly detail a separate 'validation' split or how it was used.
Hardware Specification | No | The paper does not explicitly describe the specific hardware (e.g., GPU/CPU models, memory) used to run the experiments.
Software Dependencies | No | The paper mentions software like Mujoco [43] and PPO [38], but does not provide specific version numbers for these or other software dependencies.
Experiment Setup | Yes | We run comparisons on the pushing task, varying the number of basis functions N (in the set {2, 6, 10, 15, 20}), DMP rollout lengths (in the set {3, 5, 7, 10, 15}), number of integration steps (in the set {15, 25, 35, 45}), as well as different basis functions: Gaussian RBF (standard), ψ defined in Equation (3), a linear map ψ(x) = x, a multiquadric map ψ(x) = √(1 + (εx)²), an inverse quadric map ψ(x) = 1/(1 + (εx)²), and an inverse multiquadric map ψ(x) = 1/√(1 + (εx)²)
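
The basis-function variants listed in the experiment setup row are standard radial basis functions; the sketch below shows their functional forms. How each ψ is applied to the DMP phase variable x relative to the basis centers, and the ε width parameter, are assumptions here, since the paper only gives the functional forms.

```python
import numpy as np

def basis(psi_name, x, centers, widths, eps=1.0):
    """Evaluate one candidate basis function at phase x for each basis center.

    NOTE: the centering scheme (x - centers) and eps are illustrative
    assumptions, not taken from the authors' implementation.
    """
    r = x - centers                                    # distance to each basis center
    if psi_name == "gaussian":                         # standard DMP choice
        return np.exp(-widths * r ** 2)
    if psi_name == "linear":                           # psi(x) = x
        return np.full_like(centers, x)
    if psi_name == "multiquadric":                     # sqrt(1 + (eps * r)^2)
        return np.sqrt(1.0 + (eps * r) ** 2)
    if psi_name == "inverse_quadric":                  # 1 / (1 + (eps * r)^2)
        return 1.0 / (1.0 + (eps * r) ** 2)
    if psi_name == "inverse_multiquadric":             # 1 / sqrt(1 + (eps * r)^2)
        return 1.0 / np.sqrt(1.0 + (eps * r) ** 2)
    raise ValueError(f"unknown basis: {psi_name}")
```

For example, basis("gaussian", x=0.5, centers=np.linspace(1.0, 0.0, 10), widths=np.full(10, 50.0)) returns the ten activations that weight the DMP forcing term at that phase value.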
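
The pseudocode row above points to Algorithm 1 (Training NDPs for RL), and the experiment setup ablates the rollout length and number of integration steps of the underlying DMP. The sketch below illustrates the kind of rollout being integrated: a policy network predicts DMP basis weights w and a goal g, and a second-order DMP is integrated for a fixed number of steps to produce the trajectory the robot tracks. The gains (alpha, beta, alpha_x), basis parameters, forcing-term scaling, and the 1-D simplification are standard DMP assumptions for illustration, not values from the paper.

```python
import numpy as np

def ndp_rollout(y0, w, g, n_steps=35, dt=0.01, alpha=25.0, beta=25.0 / 4.0, alpha_x=1.0):
    """Integrate a 1-D DMP with basis weights w from start y0 toward goal g."""
    n_basis = len(w)
    centers = np.exp(-alpha_x * np.linspace(0.0, 1.0, n_basis))   # basis centers in phase space
    widths = np.full(n_basis, n_basis ** 1.5) / centers            # common width heuristic

    y, dy, x = float(y0), 0.0, 1.0
    trajectory = []
    for _ in range(n_steps):
        psi = np.exp(-widths * (x - centers) ** 2)                 # Gaussian basis activations
        forcing = (psi @ w) / (psi.sum() + 1e-10) * x * (g - y0)   # learned forcing term
        ddy = alpha * (beta * (g - y) - dy) + forcing              # transformed system dynamics
        dy += ddy * dt
        y += dy * dt
        x += -alpha_x * x * dt                                      # canonical (phase) system
        trajectory.append(y)
    return np.array(trajectory)
```

In the RL setting of Algorithm 1, the predicted trajectory is executed in the environment and the policy network producing (w, g) is updated from the resulting rewards (the paper uses PPO [38]).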