Notice: The reproducibility variables underlying each score are classified using an automated LLM-based pipeline, validated against a manually labeled dataset. LLM-based classification introduces uncertainty and potential bias; scores should be interpreted as estimates. Full accuracy metrics and methodology are described in [1].
End-to-End Training of Deep Visuomotor Policies
Authors: Sergey Levine, Chelsea Finn, Trevor Darrell, Pieter Abbeel
JMLR 2016 | Venue PDF | LLM Run Details
| Reproducibility Variable | Result | LLM Response |
|---|---|---|
| Research Type | Experimental | We evaluate our method on a range of real-world manipulation tasks that require close coordination between vision and control, such as screwing a cap onto a bottle, and present simulated comparisons to a range of prior policy search methods. |
| Researcher Affiliation | Academia | Sergey Levine EMAIL Chelsea Finn EMAIL Trevor Darrell EMAIL Pieter Abbeel EMAIL Division of Computer Science University of California Berkeley, CA 94720-1776, USA |
| Pseudocode | No | The paper describes the algorithmic steps and equations for the guided policy search, BADMM, and trajectory optimization in sections 3, 4, and Appendix A. However, there is no clearly labeled block titled 'Pseudocode' or 'Algorithm' with structured steps. |
| Open Source Code | No | The paper mentions supplementary videos and the use of the Caffe deep learning library, but does not provide specific links to their own implementation code for the described methodology. It states: 'All of the robotic experiments discussed in this section may be viewed in the corresponding supplementary video, available online: http://rll.berkeley.edu/icra2015gps. A video illustration of the visuomotor policies, discussed in the following sections, is also available: http://sites.google.com/site/visuomotorpolicy.' and 'We used the Caffe deep learning library (Jia et al., 2014) for CNN training.' |
| Open Datasets | Yes | Since the training set is still small (we use 1000 images collected from random arm motions), we initialize the filters in the first layer with weights from the model of Szegedy et al. (2014), which is trained on ImageNet (Deng et al., 2009) classification. |
| Dataset Splits | No | The paper describes different experimental conditions such as 'training target positions and grasps', 'new target positions not seen during training and, for the hammer, new grasps (spatial test)', and 'training positions with visual distractors (visual test)'. It also mentions 'The policies were trained on four different hole positions, and then tested on four new hole positions to evaluate generalization.' However, it does not provide specific percentages or sample counts for training, validation, and test splits of a single dataset. It refers to 'number of trials per test' in Figure 9, which represents evaluation conditions rather than data partitioning methodology. |
| Hardware Specification | Yes | All of the robotic experiments were conducted on a PR2 robot. The robot was controlled at 20 Hz via direct effort control, and camera images were recorded using the RGB camera on a PrimeSense Carmine sensor. |
| Software Dependencies | No | The paper mentions 'We used the Caffe deep learning library (Jia et al., 2014) for CNN training.' and 'All of the simulated experiments used the MuJoCo simulation package (Todorov et al., 2012)'. However, no specific version numbers for Caffe or MuJoCo are provided. |
| Experiment Setup | Yes | Our CNNs have 92,000 parameters and 7 layers, including a novel spatial feature point transformation... Our visuomotor policy runs at 20 Hz on the robot... The visual processing layers of the network consist of three convolutional layers... The third convolutional layer contains 32 response maps with resolution 109x109. These response maps are passed through a spatial softmax function... The spatial feature points (fcx, fcy) are concatenated with the robot's configuration and fed into two fully connected layers, each with 40 rectified units, followed by linear connections to the torques... We use a step size of α = 0.1 in all of our experiments... The weights νt are initialized to 0.01... The 2D peg insertion task has 6 state dimensions... Trials were 8 seconds in length and simulated at 100 Hz... The cost function is given by ℓ(xt, ut) = (1/2) wu ||ut||² + wp ℓ12(pxt − p*)... The weights were set to wu = 10^-6 and wp = 1. |
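The spatial softmax and feature-point step quoted in the excerpt above can be sketched in NumPy: each of the 32 response maps is turned into a probability distribution over pixels, and the feature point is the expected image coordinate under that distribution. The normalization of coordinates to [-1, 1] and the 7-dimensional robot configuration below are illustrative assumptions, not details taken from the excerpt.

```python
import numpy as np

def spatial_softmax(response_maps):
    """Convert conv response maps (C, H, W) to expected 2-D image coordinates.

    For each channel, a softmax over the H*W pixel activations gives a
    distribution, and the feature point is the expected (x, y) coordinate
    under that distribution. Returns an array of shape (C, 2); with the
    normalized grids below, coordinates fall in [-1, 1].
    """
    c, h, w = response_maps.shape
    flat = response_maps.reshape(c, -1)
    flat = flat - flat.max(axis=1, keepdims=True)  # numerical stability
    probs = np.exp(flat) / np.exp(flat).sum(axis=1, keepdims=True)
    probs = probs.reshape(c, h, w)
    # Pixel coordinate grids, normalized to [-1, 1] (an assumed convention).
    ys, xs = np.meshgrid(np.linspace(-1, 1, h), np.linspace(-1, 1, w),
                         indexing="ij")
    fx = (probs * xs).sum(axis=(1, 2))  # expected x per channel
    fy = (probs * ys).sum(axis=(1, 2))  # expected y per channel
    return np.stack([fx, fy], axis=1)

# 32 response maps at 109x109 resolution, as described in the excerpt.
points = spatial_softmax(np.random.randn(32, 109, 109))  # shape (32, 2)

# The feature points are concatenated with the robot configuration before
# the fully connected layers; a 7-dimensional configuration is hypothetical.
robot_config = np.random.randn(7)
fc_input = np.concatenate([points.ravel(), robot_config])  # 64 + 7 values
```

Taking the expectation rather than the argmax keeps the feature points differentiable, which is what lets the whole visuomotor network be trained end-to-end.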