The Unsurprising Effectiveness of Pre-Trained Vision Models for Control
Authors: Simone Parisi, Aravind Rajeswaran, Senthil Purushwalkam, Abhinav Gupta
ICML 2022
| Reproducibility Variable | Result | LLM Response |
|---|---|---|
| Research Type | Experimental | Through extensive empirical evaluation in diverse control domains (Habitat, DeepMind Control, Adroit, Franka Kitchen), we isolate and study the importance of different representation training methods, data augmentations, and feature hierarchies. |
| Researcher Affiliation | Collaboration | Meta AI; Carnegie Mellon University. |
| Pseudocode | No | The paper does not contain any structured pseudocode or algorithm blocks. |
| Open Source Code | Yes | Source code and more at https://sites.google.com/view/pvr-control. |
| Open Datasets | Yes | ImageNet (Deng et al., 2009) |
| Dataset Splits | No | The paper describes training on collected trajectories and evaluating on online rollouts, but it does not specify an explicit validation split (e.g., 80/10/10 percentages or a validation-set count) distinct from the training and evaluation data. |
| Hardware Specification | Yes | Each node used four NVIDIA GeForce GTX 1080 Ti GPUs. Policy imitation learning was performed on a SLURM-based cluster, using an NVIDIA Quadro GP100 GPU. |
| Software Dependencies | No | The paper mentions optimizers like Adam and RMSProp and refers to GitHub repositories for vision models, but it does not provide specific version numbers for software dependencies such as PyTorch, TensorFlow, or other libraries. |
| Experiment Setup | Yes | Policy Optimization. Following Parisi et al. (2021), we update the policy with 16 mini-batches of 100 consecutive steps with the RMSProp optimizer (Tieleman & Hinton, 2017) (learning rate 0.0001). Gradients are clipped to have max norm 40. Learning lasts for 125,000 policy updates. |
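
The quoted policy-optimization settings can be read as a concrete training-loop configuration. The sketch below assumes a PyTorch setup; the policy network, the `sample_minibatch` helper, and the MSE objective are hypothetical placeholders, and accumulating gradients across the 16 mini-batches before a single clipped step is one possible reading of the description. Only the RMSProp optimizer, learning rate 0.0001, max gradient norm 40, the 16 mini-batches of 100 consecutive steps, and the 125,000-update budget come from the quoted setup.

```python
# Minimal sketch of the quoted policy-optimization settings, assuming PyTorch.
# The policy network, random dummy data, and MSE loss are illustrative
# placeholders; optimizer, learning rate, clipping threshold, mini-batch
# layout, and update count follow the quoted description.
import torch
import torch.nn as nn

policy = nn.Sequential(nn.Linear(2048, 512), nn.ReLU(), nn.Linear(512, 8))  # placeholder policy
optimizer = torch.optim.RMSprop(policy.parameters(), lr=1e-4)  # RMSProp, learning rate 0.0001

NUM_UPDATES = 125_000      # "Learning lasts for 125,000 policy updates."
NUM_MINIBATCHES = 16       # 16 mini-batches per policy update
STEPS_PER_MINIBATCH = 100  # each mini-batch holds 100 consecutive steps


def sample_minibatch(num_steps):
    # Hypothetical stand-in for drawing 100 consecutive environment steps;
    # returns random per-step features and action targets for illustration.
    return torch.randn(num_steps, 2048), torch.randn(num_steps, 8)


for update in range(NUM_UPDATES):
    optimizer.zero_grad()
    for _ in range(NUM_MINIBATCHES):
        features, targets = sample_minibatch(STEPS_PER_MINIBATCH)
        loss = nn.functional.mse_loss(policy(features), targets)
        loss.backward()  # accumulate gradients across the 16 mini-batches
    nn.utils.clip_grad_norm_(policy.parameters(), max_norm=40.0)  # "Gradients are clipped to have max norm 40."
    optimizer.step()
```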