Notice: The reproducibility variables underlying each score are classified using an automated LLM-based pipeline, validated against a manually labeled dataset. LLM-based classification introduces uncertainty and potential bias; scores should be interpreted as estimates. Full accuracy metrics and methodology are described in [1].

End-to-End Training of Deep Visuomotor Policies

Authors: Sergey Levine, Chelsea Finn, Trevor Darrell, Pieter Abbeel

JMLR 2016

Reproducibility variable, result, and supporting LLM response:

Research Type: Experimental
"We evaluate our method on a range of real-world manipulation tasks that require close coordination between vision and control, such as screwing a cap onto a bottle, and present simulated comparisons to a range of prior policy search methods."

Researcher Affiliation: Academia
"Sergey Levine EMAIL Chelsea Finn EMAIL Trevor Darrell EMAIL Pieter Abbeel EMAIL Division of Computer Science, University of California, Berkeley, CA 94720-1776, USA"

Pseudocode: No
The paper describes the algorithmic steps and equations for guided policy search, BADMM, and trajectory optimization in Sections 3 and 4 and Appendix A. However, there is no clearly labeled block titled 'Pseudocode' or 'Algorithm' with structured steps.

Open Source Code: No
The paper mentions supplementary videos and the use of the Caffe deep learning library, but does not provide specific links to their own implementation code for the described methodology. It states: "All of the robotic experiments discussed in this section may be viewed in the corresponding supplementary video, available online: http://rll.berkeley.edu/icra2015gps. A video illustration of the visuomotor policies, discussed in the following sections, is also available: http://sites.google.com/site/visuomotorpolicy." and "We used the Caffe deep learning library (Jia et al., 2014) for CNN training."

Open Datasets: Yes
"Since the training set is still small (we use 1000 images collected from random arm motions), we initialize the filters in the first layer with weights from the model of Szegedy et al. (2014), which is trained on ImageNet (Deng et al., 2009) classification."

Dataset Splits: No
The paper describes different experimental conditions, such as "training target positions and grasps", "new target positions not seen during training and, for the hammer, new grasps (spatial test)", and "training positions with visual distractors (visual test)". It also mentions "The policies were trained on four different hole positions, and then tested on four new hole positions to evaluate generalization." However, it does not provide specific percentages or sample counts for training, validation, and test splits of a single dataset. The "number of trials per test" in Figure 9 refers to evaluation conditions rather than data partitioning.

Hardware Specification: Yes
"All of the robotic experiments were conducted on a PR2 robot. The robot was controlled at 20 Hz via direct effort control, and camera images were recorded using the RGB camera on a PrimeSense Carmine sensor."

Software Dependencies: No
The paper mentions "We used the Caffe deep learning library (Jia et al., 2014) for CNN training." and "All of the simulated experiments used the MuJoCo simulation package (Todorov et al., 2012)". However, no specific version numbers for Caffe or MuJoCo are provided.

Experiment Setup: Yes
"Our CNNs have 92,000 parameters and 7 layers, including a novel spatial feature point transformation... Our visuomotor policy runs at 20 Hz on the robot... The visual processing layers of the network consist of three convolutional layers... The third convolutional layer contains 32 response maps with resolution 109Ɨ109. These response maps are passed through a spatial softmax function... The spatial feature points (f_cx, f_cy) are concatenated with the robot's configuration and fed into two fully connected layers, each with 40 rectified units, followed by linear connections to the torques... We use a step size of α = 0.1 in all of our experiments... The weights ν_t are initialized to 0.01... The 2D peg insertion task has 6 state dimensions... Trials were 8 seconds in length and simulated at 100 Hz... The cost function is given by ℓ(x_t, u_t) = (1/2) w_u ||u_t||^2 + w_p ℓ_12(p_{x_t} - p)... The weights were set to w_u = 10^-6 and w_p = 1."
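The spatial softmax and feature-point transformation quoted above can be sketched as follows. This is a minimal NumPy illustration, not the authors' Caffe implementation: each response map is normalized into a spatial probability distribution, and the expected (x, y) pixel coordinate under that distribution becomes the feature point. The function name and the [-1, 1] coordinate convention are assumptions for illustration.

```python
import numpy as np

def spatial_softmax_points(response_maps):
    """Compute expected 2D feature points from convolutional response maps.

    response_maps: array of shape (C, H, W), e.g. 32 maps of 109x109.
    Returns an array of shape (C, 2) holding (x, y) expectations,
    with coordinates normalized to [-1, 1] (an illustrative convention).
    """
    C, H, W = response_maps.shape
    # Softmax over each map's spatial locations (shifted for stability).
    flat = response_maps.reshape(C, H * W)
    flat = flat - flat.max(axis=1, keepdims=True)
    probs = np.exp(flat) / np.exp(flat).sum(axis=1, keepdims=True)
    probs = probs.reshape(C, H, W)
    # Normalized coordinate grids for the x (width) and y (height) axes.
    xs = np.linspace(-1.0, 1.0, W)
    ys = np.linspace(-1.0, 1.0, H)
    # Expected position of activation under each map's distribution:
    # marginalize over one axis, then take the mean coordinate on the other.
    fx = (probs.sum(axis=1) * xs).sum(axis=1)  # marginal over H, weighted by x
    fy = (probs.sum(axis=2) * ys).sum(axis=1)  # marginal over W, weighted by y
    return np.stack([fx, fy], axis=1)

# Example with the dimensions quoted in the setup: 32 maps of 109x109.
maps = np.random.randn(32, 109, 109)
points = spatial_softmax_points(maps)  # shape (32, 2)
```

A sharply peaked response map yields a feature point at the peak's location, which is what makes this layer useful as a differentiable "where" representation feeding the fully connected control layers.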