Goal-Aware Prediction: Learning to Model What Matters
Authors: Suraj Nair, Silvio Savarese, Chelsea Finn
ICML 2020
| Reproducibility Variable | Result | LLM Response |
|---|---|---|
| Research Type | Experimental | In this paper, we propose to direct prediction towards task relevant information, enabling the model to be aware of the current task and encouraging it to only model relevant quantities of the state space, resulting in a learning objective that more closely matches the downstream task. Further, we do so in an entirely self-supervised manner, without the need for a reward function or image labels. We find that our method more effectively models the relevant parts of the scene conditioned on the goal, and as a result outperforms standard task-agnostic dynamics models and model-free reinforcement learning. |
| Researcher Affiliation | Academia | 1Stanford University. Correspondence to: Suraj Nair <surajn@stanford.edu>. |
| Pseudocode | Yes | Algorithm 1 Latent MPC(fenc, fdyn, st, sg) |
| Open Source Code | Yes | Videos/code can be found at https://sites.google.com/stanford.edu/gap |
| Open Datasets | Yes | Our primary experimental domain is a simulated tabletop manipulation task built off of the Meta-World suite of environments (Yu et al., 2019a). Specifically, it consists of a simulated Sawyer robot, and 3 blocks on a tabletop. In the self-supervised data collection phase, the agent executes a random policy for 2,000 episodes, collecting 100,000 frames worth of data. ... We also study model error on real robot data from the BAIR Robot Dataset (Ebert et al., 2017) and RoboNet dataset (Dasari et al., 2019) in Section 5.4. |
| Dataset Splits | No | The paper mentions data collection of '100,000 frames' and training iterations, but does not provide explicit numerical splits (e.g., percentages or counts) for training, validation, or testing sets. |
| Hardware Specification | No | The paper does not provide specific details about the hardware used for experiments, such as GPU or CPU models. |
| Software Dependencies | No | The paper does not specify versions for any software libraries, frameworks, or programming languages used in the experiments. |
| Experiment Setup | Yes | Data Collection and Model Training: In our self-supervised setting, data collection simply corresponds to rolling out a random exploration policy in the environment. Specifically, we sample uniformly from the agent's action space, and collect 2000 episodes, each of length 50, for a total of 100,000 frames of data. ... During training, sub-trajectories of length 30 time steps are sampled from the data set, with the last timestep labeled as the goal sg = s30. ... We use a curriculum when training all models, where H starts at 0, and is incremented by 1 every 50,000 training iterations. All models are trained to convergence, for about 300,000 iterations on the same dataset. |
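The Pseudocode row references Algorithm 1, Latent MPC(fenc, fdyn, st, sg): planning in the learned latent space with an encoder and dynamics model. The paper's exact planner details are not quoted above, so the following is a minimal random-shooting sketch under assumed names and shapes (`f_enc`, `f_dyn`, `action_dim`, `horizon`, and `n_samples` are all illustrative, not the paper's values):

```python
import numpy as np

def latent_mpc(f_enc, f_dyn, s_t, s_g, action_dim=4, horizon=5,
               n_samples=100, rng=None):
    """Random-shooting MPC in a learned latent space (hedged sketch).

    Encodes the current state and goal, samples candidate action
    sequences, rolls each sequence through the latent dynamics model,
    and returns the first action of the sequence whose predicted final
    latent lies closest to the goal latent.
    """
    rng = np.random.default_rng(rng)
    z_t, z_g = f_enc(s_t), f_enc(s_g)
    # Candidate action sequences, sampled uniformly in [-1, 1]
    # (an assumption about the action-space bounds).
    actions = rng.uniform(-1.0, 1.0, size=(n_samples, horizon, action_dim))
    costs = np.empty(n_samples)
    for i in range(n_samples):
        z = z_t
        for a in actions[i]:
            z = f_dyn(z, a)                 # one latent dynamics step
        costs[i] = np.linalg.norm(z - z_g)  # distance to goal latent
    best = int(np.argmin(costs))
    return actions[best, 0]  # execute the first action, then replan
```

With toy stand-ins for the learned models (identity encoder, linear dynamics), the planner returns a single action vector to execute before replanning at the next step.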
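The Experiment Setup row describes two concrete procedures: sampling length-30 sub-trajectories whose last frame serves as the goal sg = s30, and a horizon curriculum where H starts at 0 and grows by 1 every 50,000 iterations. A minimal sketch of both, assuming episodes are stored as arrays of length 50 and capping H at 29 (one less than the sub-trajectory length; the cap is our assumption, not stated in the quote):

```python
import numpy as np

def curriculum_horizon(iteration, step=50_000, h_max=29):
    """Prediction horizon H: starts at 0, +1 every `step` iterations."""
    return min(iteration // step, h_max)

def sample_subtrajectory(episode, length=30, rng=None):
    """Sample a length-30 sub-trajectory from a length-50 episode.

    The last timestep of the sampled window is labeled as the goal,
    sg = s30, as described in the paper's training setup.
    """
    rng = np.random.default_rng(rng)
    start = rng.integers(0, len(episode) - length + 1)
    sub = episode[start:start + length]
    return sub, sub[-1]  # (sub-trajectory, goal frame sg)
```

For example, with 2,000 episodes of length 50 the dataset holds the quoted 100,000 frames, and at iteration 120,000 the curriculum horizon is H = 2.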