Guided Cost Learning: Deep Inverse Optimal Control via Policy Optimization

Authors: Chelsea Finn, Sergey Levine, Pieter Abbeel

ICML 2016

Reproducibility Variable | Result | LLM Response
Research Type | Experimental | We evaluate our method on a series of simulated tasks and real-world robotic manipulation problems, demonstrating substantial improvement over prior methods both in terms of task complexity and sample efficiency.
Researcher Affiliation | Academia | University of California, Berkeley, Berkeley, CA 94709 USA
Pseudocode | Yes | Algorithm 1 Guided cost learning
Open Source Code | No | The paper provides a link to a video ('http://rll.berkeley.edu/gcl') but no explicit statement or link for open-source code for the methodology.
Open Datasets | No | The paper mentions using 'expert demonstrations' or 'human demonstrations' as data, and describes how they were generated or provided ('Between 20 and 32 demonstrations were generated...', 'between 25 and 30 human demonstrations were provided via kinesthetic teaching'), but it does not provide concrete access information (link, DOI, repository, or citation to an established public dataset) for these demonstrations.
Dataset Splits | No | The paper mentions 'demonstrations' and 'test states' or 'test condition' but does not provide specific dataset split information (exact percentages, sample counts, or detailed splitting methodology) for training, validation, and testing.
Hardware Specification | No | The paper mentions using a 'PR2 robot' and 'MuJoCo physics simulator' but does not provide specific hardware details (exact GPU/CPU models, processor types, or memory amounts) used for running its experiments.
Software Dependencies | No | The paper mentions the 'MuJoCo physics simulator' and implies the use of 'neural network libraries based on backpropagation' but does not provide specific software names with version numbers.
Experiment Setup | Yes | We used a neural network cost function with two hidden layers with 24–52 units and rectifying nonlinearities of the form max(z, 0) followed by linear connections to a set of features $y_t$, which had a size of 20 for the 2D navigation task and 100 for the other two tasks. The cost is then given by $c_\theta(x_t, u_t) = \lVert A y_t + b \rVert^2 + w_u \lVert u_t \rVert^2$ (2) with a fixed torque weight $w_u$ and the parameters consisting of $A$, $b$, and the network weights.
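
The cost architecture quoted in the Experiment Setup entry can be sketched in a few lines of NumPy. This is only an illustrative sketch, not the authors' code: the hidden width of 32, the output dimension of A, the random initialisation, the default torque weight, the choice to feed only the state x_t into the network, and the names `init_params` and `cost` are all assumptions made for the example.

```python
import numpy as np


def init_params(state_dim, hidden=32, feat_dim=20, out_dim=20, seed=0):
    """Randomly initialise the network weights plus A and b.

    All sizes except feat_dim=20 (the 2D navigation setting) are assumptions.
    """
    rng = np.random.default_rng(seed)

    def dense(n_in, n_out):
        # Scaled Gaussian weights and zero biases (an arbitrary choice).
        return rng.normal(0.0, 1.0 / np.sqrt(n_in), (n_in, n_out)), np.zeros(n_out)

    W1, b1 = dense(state_dim, hidden)
    W2, b2 = dense(hidden, hidden)
    W3, b3 = dense(hidden, feat_dim)  # linear connections to the features y_t
    A, b = dense(feat_dim, out_dim)   # A and b from Eq. (2), stored transposed
    return dict(W1=W1, b1=b1, W2=W2, b2=b2, W3=W3, b3=b3, A=A, b=b)


def cost(params, x_t, u_t, w_u=1e-3):
    """Evaluate c(x_t, u_t) = ||A y_t + b||^2 + w_u * ||u_t||^2."""
    h1 = np.maximum(x_t @ params["W1"] + params["b1"], 0.0)  # max(z, 0)
    h2 = np.maximum(h1 @ params["W2"] + params["b2"], 0.0)   # max(z, 0)
    y_t = h2 @ params["W3"] + params["b3"]                   # features y_t
    residual = y_t @ params["A"] + params["b"]               # A y_t + b
    # Sum over the last axis so batched inputs give one cost per row.
    return np.sum(residual**2, axis=-1) + w_u * np.sum(u_t**2, axis=-1)
```

Both terms are squared norms, so the cost is non-negative for any input. With the zero biases used in this sketch, a zero state makes the first term vanish and leaves only the torque penalty.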