Adapting Pretrained ViTs with Convolution Injector for Visuo-Motor Control

Authors: Dongyoon Hwang, Byungkun Lee, Hojoon Lee, Hyunseung Kim, Jaegul Choo

ICML 2024

Reproducibility Variable | Result | LLM Response
Research Type | Experimental | We evaluate CoIn with three distinct types of pretrained ViTs (CLIP, MVP, VC-1) across 12 varied control tasks within three separate domains (Adroit, Meta-World, DMC), and demonstrate that CoIn consistently enhances control task performance across all experimented environments and models, validating the effectiveness of providing pretrained ViTs with control-centric biases.
Researcher Affiliation | Academia | Kim Jaechul Graduate School of AI, KAIST. Correspondence to: Dongyoon Hwang <godnpeter@kaist.ac.kr>.
Pseudocode | No | The paper describes the architecture and processes in prose and diagrams (e.g., Section 3, Figure 3) but does not include any explicitly labeled pseudocode or algorithm blocks. (A hedged illustrative sketch of a convolution-injection adapter follows the table.)
Open Source Code | Yes | Code: https://github.com/dojeon-ai/CoIn
Open Datasets | Yes | We consider a total of 12 tasks across three different domains: 2 tasks from Adroit (Rajeswaran et al., 2018), 5 tasks from Meta-World (Yu et al., 2020), and 5 tasks from DMC (Tassa et al., 2018). Following existing work (Hansen et al., 2022; Parisi et al., 2022; Majumdar et al., 2023; Nair et al., 2022), we utilize 100 expert demonstrations for Adroit and DMC, and 25 for Meta-World, across a training span of 100 epochs.
Dataset Splits | No | The paper states, "Following existing work (...), we utilize 100 expert demonstrations for Adroit and DMC, and 25 for Meta-World, across a training span of 100 epochs." and "The visuo-motor control policy's performance is evaluated every 5 epochs, with the best success rate achieved during training reported across three independent runs for each task." While it describes a training and evaluation process, it does not explicitly provide percentages or sample counts for training, validation, and test splits. (The quoted evaluation protocol is sketched after the table.)
Hardware Specification | Yes | Inference speed was calculated on a single RTX-3090 GPU using a single input image with a resolution of 224 x 224. (A typical way to take such a measurement is sketched after the table.)
Software Dependencies | No | The paper mentions optimizers like "AdamW" and "Adam" and uses "ptflops" for computation costs, but it does not specify software library versions (e.g., "PyTorch 1.9", "TensorFlow 2.x") or specific versions for tools like ptflops. (A typical ptflops invocation is sketched after the table.)
Experiment Setup | Yes | Detailed hyperparameters for finetuning pretrained visual encoders with and without CoIn are listed in Table 8, and detailed hyperparameters for finetuning the control policy network are listed in Table 9.
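
Since the paper contains no pseudocode, the sketch below is only a loudly-hedged illustration of what a convolution-injection adapter for a pretrained ViT commonly looks like: patch tokens are reshaped back to a 2D grid, a lightweight depthwise convolution injects locality bias, and the result is added residually to the token stream. The module name, kernel size, zero initialization, and placement are all assumptions for illustration, not the paper's verified CoIn design.

```python
import torch
import torch.nn as nn

class ConvInjector(nn.Module):
    """Hypothetical convolution-injection adapter (NOT the paper's verified design).

    Reshapes ViT patch tokens to a 2D grid, applies a depthwise convolution
    to inject a locality bias, and adds the result back to the token stream.
    """

    def __init__(self, dim: int, grid_size: int = 14, kernel_size: int = 3):
        super().__init__()
        self.grid_size = grid_size  # 14 = 224 / 16 for a patch-16 ViT
        self.dwconv = nn.Conv2d(
            dim, dim, kernel_size, padding=kernel_size // 2, groups=dim
        )
        # Zero init makes the adapter an identity at the start of finetuning,
        # so pretrained ViT features are initially preserved (an assumption).
        nn.init.zeros_(self.dwconv.weight)
        nn.init.zeros_(self.dwconv.bias)

    def forward(self, tokens: torch.Tensor) -> torch.Tensor:
        # tokens: (B, 1 + H*W, C) with a leading [CLS] token, as in a standard ViT
        cls_tok, patch_tok = tokens[:, :1], tokens[:, 1:]
        B, N, C = patch_tok.shape
        H = W = self.grid_size
        grid = patch_tok.transpose(1, 2).reshape(B, C, H, W)
        patch_tok = patch_tok + self.dwconv(grid).flatten(2).transpose(1, 2)
        return torch.cat([cls_tok, patch_tok], dim=1)
```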
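The evaluation protocol quoted in the Dataset Splits row (evaluate every 5 epochs, take the best success rate per run, report across three independent runs) reduces to simple bookkeeping. A minimal sketch, where `train_one_epoch`, `evaluate_success_rate`, and `run_factory` are hypothetical helpers:

```python
import numpy as np

EVAL_INTERVAL = 5   # epochs between evaluations, as stated in the paper
NUM_EPOCHS = 100    # training span, as stated in the paper
NUM_SEEDS = 3       # independent runs per task

def best_success_rate(train_one_epoch, evaluate_success_rate):
    """Track the best success rate achieved during one training run."""
    best = 0.0
    for epoch in range(1, NUM_EPOCHS + 1):
        train_one_epoch()
        if epoch % EVAL_INTERVAL == 0:
            best = max(best, evaluate_success_rate())
    return best

def reported_score(run_factory):
    """Per-run bests aggregated across seeds; run_factory(seed) is hypothetical."""
    bests = [best_success_rate(*run_factory(seed)) for seed in range(NUM_SEEDS)]
    return float(np.mean(bests))
```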
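The Hardware Specification row quotes a single-image latency measurement on an RTX-3090 at 224 x 224. A minimal sketch of how such a measurement is commonly taken in PyTorch; the timm model name and the warmup/iteration counts are illustrative assumptions, not the paper's setup:

```python
import time
import torch
import timm  # model choice is an assumption; the paper uses CLIP, MVP, and VC-1 encoders

model = timm.create_model("vit_base_patch16_224", pretrained=False).cuda().eval()
x = torch.randn(1, 3, 224, 224, device="cuda")  # single image at the quoted resolution

with torch.no_grad():
    for _ in range(20):          # warmup so CUDA kernels are compiled and cached
        model(x)
    torch.cuda.synchronize()     # drain queued GPU work before starting the clock
    start = time.perf_counter()
    for _ in range(100):
        model(x)
    torch.cuda.synchronize()     # wait for the timed work to actually finish
    print(f"{(time.perf_counter() - start) / 100 * 1e3:.2f} ms / image")
```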
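The Software Dependencies row notes that ptflops is used without a stated version. For reference, the usual invocation looks like the sketch below; the timm ViT stands in for whichever encoder the authors actually measured:

```python
import timm
from ptflops import get_model_complexity_info

model = timm.create_model("vit_base_patch16_224", pretrained=False)

# Multiply-accumulate count and parameter count for one forward pass
# at the paper's quoted 224 x 224 input resolution.
macs, params = get_model_complexity_info(
    model, (3, 224, 224), as_strings=True, print_per_layer_stat=False
)
print(f"MACs: {macs}, Params: {params}")
```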