A Touch, Vision, and Language Dataset for Multimodal Alignment
Authors: Letian Fu, Gaurav Datta, Huang Huang, William Chung-Ho Panitch, Jaimyn Drake, Joseph Ortiz, Mustafa Mukadam, Mike Lambeta, Roberto Calandra, Ken Goldberg
ICML 2024 | Conference PDF | Archive PDF | Plain Text | LLM Run Details
| Reproducibility Variable | Result | LLM Response |
|---|---|---|
| Research Type | Experimental | Results suggest that by incorporating touch, the TVL model improves (+29% classification accuracy) tactile-vision-language alignment over existing models trained on any pair of those modalities. Although only a small fraction of the dataset is human labeled, the TVL model demonstrates improved visual-tactile understanding over GPT-4V (+12%) and open-source vision-language models (+32%) on a new touch-vision understanding benchmark. Code, checkpoints and data are available on https://tactile-vlm.github.io. |
| Researcher Affiliation | Collaboration | UC Berkeley, Meta AI, TU Dresden. |
| Pseudocode | No | The paper does not contain any structured pseudocode or algorithm blocks. Methods are described in prose and with diagrams, but not in a pseudocode format. |
| Open Source Code | Yes | Code, checkpoints and data are available on https://tactile-vlm.github.io. |
| Open Datasets | Yes | In this work, we present the Touch-Vision-Language (TVL) dataset, a novel dataset consisting of 44K paired vision-tactile observations, where 10% of the data are annotated by humans while the rest are labeled by GPT-4V. ... Code, checkpoints and data are available on https://tactile-vlm.github.io. |
| Dataset Splits | No | The paper states, "We perform a 99%-1% train-test split across both dataset components...", but it does not give the size or derivation of a separate validation set needed for full reproducibility; a "validation set" is mentioned in a Table 3 footnote without further detail. A sketch of such a split appears after this table. |
| Hardware Specification | Yes | All experiments are run on a single NVIDIA A100 GPU. |
| Software Dependencies | No | The paper mentions optimizers (AdamW) and models (OpenCLIP, LLaMA-2 7B) but does not provide specific version numbers for software dependencies like programming languages (e.g., Python), deep learning frameworks (e.g., PyTorch, TensorFlow), or specific library versions. |
| Experiment Setup | Yes | Config values: optimizer: AdamW (Loshchilov & Hutter, 2017b); base learning rate: 1.5e-4; learning rate schedule: cosine decay (Loshchilov & Hutter, 2017a); batch size: 256; weight decay: 0.05; optimizer momentum: β1, β2 = 0.9, 0.95 (Chen et al., 2020); warm-up epochs (Goyal et al., 2017): 10; total epochs: 200. See the training-setup sketch after this table. |
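
As a concrete illustration of the reported 99%-1% train-test split, here is a minimal Python sketch that partitions a flat list of paired sample indices. The function name, fixed seed, and shuffling strategy are illustrative assumptions, not the authors' released code.

```python
import random

def split_indices(num_samples: int, test_fraction: float = 0.01, seed: int = 0):
    """Randomly partition sample indices into train and test sets.

    A minimal sketch of a 99%-1% train-test split; seeding and shuffling
    details are assumptions, not taken from the paper's code release.
    """
    indices = list(range(num_samples))
    rng = random.Random(seed)
    rng.shuffle(indices)
    num_test = max(1, int(round(num_samples * test_fraction)))
    return indices[num_test:], indices[:num_test]  # (train, test)

# Example: ~44K paired vision-tactile observations -> roughly 43,560 train / 440 test.
train_idx, test_idx = split_indices(44_000)
```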
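
The optimization settings in the Experiment Setup row can be expressed as a short PyTorch sketch: AdamW with base learning rate 1.5e-4, betas (0.9, 0.95), weight decay 0.05, 10 warm-up epochs, and cosine decay over 200 total epochs. The warm-up/cosine schedule implementation and the `model` placeholder are assumptions for illustration; the paper reports only the hyperparameter values, not this exact code.

```python
import math
import torch

model = torch.nn.Linear(512, 512)  # placeholder for the encoder being trained

base_lr, weight_decay = 1.5e-4, 0.05
warmup_epochs, total_epochs, batch_size = 10, 200, 256

optimizer = torch.optim.AdamW(
    model.parameters(), lr=base_lr, betas=(0.9, 0.95), weight_decay=weight_decay
)

def lr_lambda(epoch: int) -> float:
    """Linear warm-up for the first 10 epochs, then cosine decay to zero at epoch 200."""
    if epoch < warmup_epochs:
        return (epoch + 1) / warmup_epochs
    progress = (epoch - warmup_epochs) / (total_epochs - warmup_epochs)
    return 0.5 * (1.0 + math.cos(math.pi * progress))

scheduler = torch.optim.lr_scheduler.LambdaLR(optimizer, lr_lambda)

for epoch in range(total_epochs):
    # ... one pass over the training data with batch_size=256 goes here ...
    optimizer.step()   # placeholder for the actual per-batch updates
    scheduler.step()   # advance the learning-rate schedule once per epoch
```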