Neural Dynamic Policies for End-to-End Sensorimotor Learning
Authors: Shikhar Bahl, Mustafa Mukadam, Abhinav Gupta, Deepak Pathak
NeurIPS 2020
| Reproducibility Variable | Result | LLM Response |
|---|---|---|
| Research Type | Experimental | We evaluate NDPs in imitation as well as reinforcement learning setups. NDPs can utilize high-dimensional inputs via demonstrations and learn from weak supervisory signals as well as rewards. In both setups, NDPs exhibit better or comparable performance to state-of-the-art approaches. |
| Researcher Affiliation | Collaboration | Shikhar Bahl CMU Mustafa Mukadam FAIR Abhinav Gupta CMU Deepak Pathak CMU |
| Pseudocode | Yes | Algorithm 1 Training NDPs for RL |
| Open Source Code | Yes | Project video and code are available at: https://shikharbahl.github.io/neural-dynamic-policies/. |
| Open Datasets | Yes | We took existing torque control based environments for Picking and Throwing [17] and modified them to enable joint angle control. [...] To test on quasi-static tasks, we use Pushing, Soccer, Faucet-Opening from the Meta-World [46] task suite |
| Dataset Splits | No | The paper mentions 'train' and 'test' splits, but does not explicitly detail a separate 'validation' split or how it was used. |
| Hardware Specification | No | The paper does not explicitly describe the specific hardware (e.g., GPU/CPU models, memory) used to run the experiments. |
| Software Dependencies | No | The paper mentions software like Mujoco [43] and PPO [38], but does not provide specific version numbers for these or other software dependencies. |
| Experiment Setup | Yes | We run comparisons on the pushing task, varying the number of basis functions N (in the set {2, 6, 10, 15, 20}), DMP rollout lengths (in the set {3, 5, 7, 10, 15}), number of integration steps (in the set {15, 25, 35, 45}), as well as different basis functions: Gaussian RBF (standard), ψ defined in Equation (3), a linear map ψ(x) = x, a multiquadric map ψ(x) = √(1 + (ϵx)²), an inverse quadratic map ψ(x) = 1/(1 + (ϵx)²), and an inverse multiquadric map ψ(x) = 1/√(1 + (ϵx)²) |
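The basis functions listed in the experiment-setup row are standard radial basis function families; the sketch below shows one plausible way to implement them and combine them into a DMP-style forcing term. The function names, the default ϵ, the center placement, and the `forcing_term` normalization are illustrative assumptions, not code from the paper's release.

```python
import numpy as np

# Hypothetical implementations of the basis functions from the ablation;
# eps corresponds to the ϵ shape parameter in the paper's formulas.
def gaussian_rbf(x, eps=1.0):
    return np.exp(-(eps * x) ** 2)

def multiquadric(x, eps=1.0):
    return np.sqrt(1.0 + (eps * x) ** 2)

def inverse_quadratic(x, eps=1.0):
    return 1.0 / (1.0 + (eps * x) ** 2)

def inverse_multiquadric(x, eps=1.0):
    return 1.0 / np.sqrt(1.0 + (eps * x) ** 2)

def forcing_term(x, centers, weights, basis=gaussian_rbf):
    """Weighted, normalized mixture of N bases, as in a standard DMP
    forcing term: f(x) = (Σ w_i ψ_i(x - c_i)) / (Σ ψ_i(x - c_i)) · x.
    The small constant guards against division by zero far from all centers."""
    psi = np.array([basis(x - c) for c in centers])
    return (weights @ psi) * x / (psi.sum() + 1e-10)
```

Swapping `basis=` between these functions (while keeping N centers and weights fixed) is the kind of comparison the ablation describes.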