The (Un)Surprising Effectiveness of Pre-Trained Vision Models for Control

Authors: Simone Parisi, Aravind Rajeswaran, Senthil Purushwalkam, Abhinav Gupta

ICML 2022 | Conference PDF | Archive PDF | Plain Text | LLM Run Details

| Reproducibility Variable | Result | LLM Response |
|---|---|---|
| Research Type | Experimental | Through extensive empirical evaluation in diverse control domains (Habitat, DeepMind Control, Adroit, Franka Kitchen), we isolate and study the importance of different representation training methods, data augmentations, and feature hierarchies. |
| Researcher Affiliation | Collaboration | Meta AI; Carnegie Mellon University. |
| Pseudocode | No | The paper does not contain any structured pseudocode or algorithm blocks. |
| Open Source Code | Yes | Source code and more at https://sites.google.com/view/pvr-control. |
| Open Datasets | Yes | ImageNet (Deng et al., 2009) |
| Dataset Splits | No | The paper describes training on collected trajectories and evaluating on online trajectories, but it does not specify an explicit validation split (e.g., 80/10/10 percentages or per-split counts) distinct from the training and test/evaluation data. |
| Hardware Specification | Yes | Each node used four NVIDIA GeForce GTX 1080 Ti GPUs. Policy imitation learning was performed on a SLURM-based cluster, using an NVIDIA Quadro GP100 GPU. |
| Software Dependencies | No | The paper mentions optimizers such as Adam and RMSProp and links to GitHub repositories for the vision models, but it does not give version numbers for software dependencies such as PyTorch, TensorFlow, or other libraries. |
| Experiment Setup | Yes | Policy Optimization. Following Parisi et al. (2021), we update the policy with 16 mini-batches of 100 consecutive steps with the RMSProp optimizer (Tieleman & Hinton, 2017) (learning rate 0.0001). Gradients are clipped to have max norm 40. Learning lasts for 125,000 policy updates. |
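
The quoted experiment setup maps onto a standard optimizer configuration. Below is a minimal sketch, assuming PyTorch: only the optimizer choice, learning rate 0.0001, gradient clipping at max norm 40, and the 125,000-update budget follow the paper's quote. The feature and action dimensions, the MLP policy, the regression loss, and the reading of "16 mini-batches of 100 consecutive steps" as one stacked batch per update are all illustrative assumptions.

```python
# Hedged sketch of the quoted policy-optimization hyperparameters (PyTorch).
# The policy network, loss, and batch construction are placeholders; only
# the optimizer, learning rate, clipping norm, and update count are quoted.
import torch
import torch.nn as nn

FEAT_DIM, ACT_DIM = 2048, 8   # hypothetical feature/action sizes
NUM_UPDATES = 125_000         # "Learning lasts for 125,000 policy updates."
NUM_MINIBATCHES = 16          # "16 mini-batches of 100 consecutive steps"
SEGMENT_LEN = 100
MAX_GRAD_NORM = 40.0          # "Gradients are clipped to have max norm 40."

policy = nn.Sequential(nn.Linear(FEAT_DIM, 512), nn.ReLU(),
                       nn.Linear(512, ACT_DIM))
optimizer = torch.optim.RMSprop(policy.parameters(), lr=1e-4)

for update in range(NUM_UPDATES):
    # Assumption: one update consumes 16 segments of 100 consecutive steps,
    # stacked into a single batch; the paper's exact batching may differ.
    features = torch.randn(NUM_MINIBATCHES * SEGMENT_LEN, FEAT_DIM)  # stand-in data
    targets = torch.randn(NUM_MINIBATCHES * SEGMENT_LEN, ACT_DIM)   # stand-in labels

    loss = nn.functional.mse_loss(policy(features), targets)  # placeholder objective
    optimizer.zero_grad()
    loss.backward()
    nn.utils.clip_grad_norm_(policy.parameters(), max_norm=MAX_GRAD_NORM)
    optimizer.step()
```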