Goal-Aware Prediction: Learning to Model What Matters
Authors: Suraj Nair, Silvio Savarese, Chelsea Finn
ICML 2020
| Reproducibility Variable | Result | LLM Response |
|---|---|---|
| Research Type | Experimental | In this paper, we propose to direct prediction towards task relevant information, enabling the model to be aware of the current task and encouraging it to only model relevant quantities of the state space, resulting in a learning objective that more closely matches the downstream task. Further, we do so in an entirely self-supervised manner, without the need for a reward function or image labels. We find that our method more effectively models the relevant parts of the scene conditioned on the goal, and as a result outperforms standard task-agnostic dynamics models and model-free reinforcement learning. |
| Researcher Affiliation | Academia | 1Stanford University. Correspondence to: Suraj Nair <surajn@stanford.edu>. |
| Pseudocode | Yes | Algorithm 1 Latent MPC(fenc, fdyn, st, sg) |
| Open Source Code | Yes | Videos/code can be found at https://sites.google.com/stanford.edu/gap |
| Open Datasets | Yes | Our primary experimental domain is a simulated tabletop manipulation task built off of the Meta-World suite of environments (Yu et al., 2019a). Specifically, it consists of a simulated Sawyer robot, and 3 blocks on a tabletop. In the self-supervised data collection phase, the agent executes a random policy for 2,000 episodes, collecting 100,000 frames worth of data. ... We also study model error on real robot data from the BAIR Robot Dataset (Ebert et al., 2017) and RoboNet dataset (Dasari et al., 2019) in Section 5.4. |
| Dataset Splits | No | The paper mentions data collection of '100,000 frames' and training iterations, but does not provide explicit numerical splits (e.g., percentages or counts) for training, validation, or testing sets. |
| Hardware Specification | No | The paper does not provide specific details about the hardware used for experiments, such as GPU or CPU models. |
| Software Dependencies | No | The paper does not specify versions for any software libraries, frameworks, or programming languages used in the experiments. |
| Experiment Setup | Yes | Data Collection and Model Training: In our self-supervised setting, data collection simply corresponds to rolling out a random exploration policy in the environment. Specifically, we sample uniformly from the agent's action space, and collect 2000 episodes, each of length 50, for a total of 100,000 frames of data. ... During training, sub-trajectories of length 30 time steps are sampled from the data set, with the last timestep labeled as the goal sg = s30. ... We use a curriculum when training all models, where H starts at 0, and is incremented by 1 every 50,000 training iterations. All models are trained to convergence, for about 300,000 iterations on the same dataset. |
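The Pseudocode row references Algorithm 1, Latent MPC(fenc, fdyn, st, sg): planning in the learned latent space with an encoder and dynamics model. The paper's exact planner details are not quoted above, so the following is a minimal random-shooting sketch under assumed names and shapes (`f_enc`, `f_dyn`, `action_dim`, `horizon`, and `n_samples` are all illustrative, not the paper's values):

```python
import numpy as np

def latent_mpc(f_enc, f_dyn, s_t, s_g, action_dim=4, horizon=5,
               n_samples=100, rng=None):
    """Random-shooting MPC in a learned latent space (hedged sketch).

    Encodes the current state and goal, samples candidate action
    sequences, rolls each sequence through the latent dynamics model,
    and returns the first action of the sequence whose predicted final
    latent lies closest to the goal latent.
    """
    rng = np.random.default_rng(rng)
    z_t, z_g = f_enc(s_t), f_enc(s_g)
    # Candidate action sequences, sampled uniformly in [-1, 1]
    # (an assumption about the action-space bounds).
    actions = rng.uniform(-1.0, 1.0, size=(n_samples, horizon, action_dim))
    costs = np.empty(n_samples)
    for i in range(n_samples):
        z = z_t
        for a in actions[i]:
            z = f_dyn(z, a)                 # one latent dynamics step
        costs[i] = np.linalg.norm(z - z_g)  # distance to goal latent
    best = int(np.argmin(costs))
    return actions[best, 0]  # execute the first action, then replan
```

With toy stand-ins for the learned models (identity encoder, linear dynamics), the planner returns a single action vector to execute before replanning at the next step.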
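The Experiment Setup row describes two concrete procedures: sampling length-30 sub-trajectories whose last frame serves as the goal sg = s30, and a horizon curriculum where H starts at 0 and grows by 1 every 50,000 iterations. A minimal sketch of both, assuming episodes are stored as arrays of length 50 and capping H at 29 (one less than the sub-trajectory length; the cap is our assumption, not stated in the quote):

```python
import numpy as np

def curriculum_horizon(iteration, step=50_000, h_max=29):
    """Prediction horizon H: starts at 0, +1 every `step` iterations."""
    return min(iteration // step, h_max)

def sample_subtrajectory(episode, length=30, rng=None):
    """Sample a length-30 sub-trajectory from a length-50 episode.

    The last timestep of the sampled window is labeled as the goal,
    sg = s30, as described in the paper's training setup.
    """
    rng = np.random.default_rng(rng)
    start = rng.integers(0, len(episode) - length + 1)
    sub = episode[start:start + length]
    return sub, sub[-1]  # (sub-trajectory, goal frame sg)
```

For example, with 2,000 episodes of length 50 the dataset holds the quoted 100,000 frames, and at iteration 120,000 the curriculum horizon is H = 2.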