Adapting Pretrained ViTs with Convolution Injector for Visuo-Motor Control
Authors: Dongyoon Hwang, Byungkun Lee, Hojoon Lee, Hyunseung Kim, Jaegul Choo
ICML 2024
| Reproducibility Variable | Result | LLM Response |
|---|---|---|
| Research Type | Experimental | We evaluate CoIn with three distinct types of pretrained ViTs (CLIP, MVP, VC-1) across 12 varied control tasks within three separate domains (Adroit, Meta-World, DMC), and demonstrate that CoIn consistently enhances control task performance across all experimented environments and models, validating the effectiveness of providing pretrained ViTs with control-centric biases. |
| Researcher Affiliation | Academia | Kim Jaechul Graduate School of AI, KAIST. Correspondence to: Dongyoon Hwang <godnpeter@kaist.ac.kr>. |
| Pseudocode | No | The paper describes the architecture and processes in prose and diagrams (e.g., Section 3, Figure 3) but does not include any explicitly labeled pseudocode or algorithm blocks. |
| Open Source Code | Yes | Code: https://github.com/dojeon-ai/CoIn |
| Open Datasets | Yes | We consider a total of 12 tasks across three different domains: 2 tasks from Adroit (Rajeswaran et al., 2018), 5 tasks from Meta-World (Yu et al., 2020), and 5 tasks from DMC (Tassa et al., 2018). Following existing work (Hansen et al., 2022; Parisi et al., 2022; Majumdar et al., 2023; Nair et al., 2022), we utilize 100 expert demonstrations for Adroit and DMC, and 25 for Meta-World, across a training span of 100 epochs. |
| Dataset Splits | No | The paper states, "Following existing work (...), we utilize 100 expert demonstrations for Adroit and DMC, and 25 for Meta-World, across a training span of 100 epochs." and "The visuo-motor control policy's performance is evaluated every 5 epochs, with the best success rate achieved during training reported across three independent runs for each task." While it describes a training and evaluation process, it does not explicitly provide percentages or sample counts for training, validation, and test dataset splits. (A hedged sketch of this train/evaluate protocol appears after the table.) |
| Hardware Specification | Yes | Inference speed was calculated on a single RTX-3090 GPU using a single input image with a resolution of 224 x 224. (A hedged timing sketch appears after the table.) |
| Software Dependencies | No | The paper mentions optimizers like "AdamW" and "Adam" and uses "ptflops" for computation costs, but it does not specify software library versions (e.g., "PyTorch 1.9", "TensorFlow 2.x") or a specific version of ptflops. (A ptflops usage sketch appears after the table.) |
| Experiment Setup | Yes | Detailed hyperparameters for finetuning pretrained visual encoders with and without CoIn are listed in Table 8, and detailed hyperparameters for finetuning the control policy network are listed in Table 9. |
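
To make the train/evaluate protocol quoted in the Dataset Splits row concrete, here is a minimal sketch of a 100-epoch loop with evaluation every 5 epochs and the best success rate collected per seed across three runs. The paper does not publish this loop; `train_one_epoch`, `evaluate_success_rate`, and the `torch.nn.Linear` policy head are hypothetical placeholders, and AdamW is assumed only because the paper names it among its optimizers.

```python
"""Hedged sketch of the train/evaluate protocol described in the table.

Assumptions not taken from the paper's code: a behavior-cloning objective,
an AdamW optimizer, and placeholder train/eval functions.
"""
import random
import torch

NUM_EPOCHS = 100     # "a training span of 100 epochs"
EVAL_INTERVAL = 5    # "evaluated every 5 epochs"
NUM_SEEDS = 3        # best success rate reported across three runs


def train_one_epoch(policy, demos, optimizer):
    """Placeholder for one behavior-cloning epoch over expert demonstrations."""


def evaluate_success_rate(policy):
    """Placeholder for rolling out the policy in Adroit / Meta-World / DMC."""
    return random.random()


def run_one_seed(seed):
    torch.manual_seed(seed)
    policy = torch.nn.Linear(768, 7)   # hypothetical stand-in policy head
    demos = None                       # stand-in for 100 (Adroit/DMC) or 25 (Meta-World) demos
    optimizer = torch.optim.AdamW(policy.parameters(), lr=1e-4)

    best = 0.0
    for epoch in range(1, NUM_EPOCHS + 1):
        train_one_epoch(policy, demos, optimizer)
        if epoch % EVAL_INTERVAL == 0:
            best = max(best, evaluate_success_rate(policy))
    return best


print([round(run_one_seed(s), 3) for s in range(NUM_SEEDS)])
```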
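The Hardware Specification row reports latency on a single RTX-3090 with one 224 x 224 input, but no timing script is given. Below is a minimal sketch using standard `torch.cuda.Event` timing, with a stock torchvision ViT standing in for the CoIn-augmented encoder; the warm-up and iteration counts are assumptions, not values from the paper.

```python
# Hedged sketch: single-image inference latency via torch.cuda.Event timing.
# vit_b_16 is a torchvision stand-in, NOT the authors' CoIn model.
import torch
from torchvision.models import vit_b_16

device = torch.device("cuda")                     # paper: a single RTX-3090
model = vit_b_16().to(device).eval()
x = torch.randn(1, 3, 224, 224, device=device)    # one 224 x 224 input image

with torch.no_grad():
    for _ in range(10):                           # warm-up before timing
        model(x)
    start = torch.cuda.Event(enable_timing=True)
    end = torch.cuda.Event(enable_timing=True)
    start.record()
    for _ in range(100):
        model(x)
    end.record()
    torch.cuda.synchronize()                      # wait for queued kernels

print(f"mean latency: {start.elapsed_time(end) / 100:.2f} ms")
```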
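The Software Dependencies row notes that the paper uses ptflops (version unspecified) for computation costs. A minimal sketch of how `ptflops.get_model_complexity_info` is typically called, again with a torchvision ViT as a stand-in for the authors' model:

```python
# Hedged sketch of a typical ptflops call (version unspecified in the paper).
# The torchvision ViT is a stand-in for the CoIn-augmented encoder.
from ptflops import get_model_complexity_info
from torchvision.models import vit_b_16

model = vit_b_16()
macs, params = get_model_complexity_info(
    model,
    (3, 224, 224),                # channels-first input resolution
    as_strings=True,              # human-readable strings, e.g. "17.58 GMac"
    print_per_layer_stat=False,   # suppress the per-layer breakdown
)
print(f"MACs: {macs}, params: {params}")
```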