Learning to Jointly Understand Visual and Tactile Signals
Authors: Yichen Li, Yilun Du, Chao Liu, Chao Liu, Francis Williams, Michael Foshey, Benjamin Eckart, Jan Kautz, Joshua B. Tenenbaum, Antonio Torralba, Wojciech Matusik
ICLR 2024 | Conference PDF | Archive PDF | Plain Text | LLM Run Details
| Reproducibility Variable | Result | LLM Response |
|---|---|---|
| Research Type | Experimental | We collect a multi-modal visual-tactile dataset that contains paired full-hand force pressure maps and manipulation videos. We also propose a novel method to learn a cross-modal latent manifold that allows for cross-modal prediction and discovery of latent structure in different data modalities. We conduct extensive experiments to demonstrate the effectiveness of our method. |
| Researcher Affiliation | Collaboration | Yichen Li1, Yilun Du1, Chao Liu1, Chao Liu2, Francis Williams2, Michael Foshey1, Ben Eckart2, Jan Kautz2, Joshua B. Tenenbaum1, Antonio Torralba1, Wojciech Matusik1 1MIT CSAIL 2NVIDIA |
| Pseudocode | No | The paper describes the method using mathematical formulations and textual explanations but does not include any explicitly labeled pseudocode or algorithm blocks. |
| Open Source Code | No | The paper provides a link to a supplementary website: "For further references: https://sites.google.com/view/iclr-submission-force-vision/home?authuser=3", but it does not explicitly state that the source code for the described methodology is available at this link or elsewhere. |
| Open Datasets | No | The paper describes a dataset collected by the authors: "To this end, we collect a cross-modal force-vision dataset. We collect the active full-hand force pressure sequence data using specialized tactile sensor gloves (Sundaram et al., 2019b)." However, it does not provide any public access information (link, DOI, formal citation) for this dataset. The given link "https://sites.google.com/view/iclr-submission-force-vision/home?authuser=3" is for "further references" and does not explicitly offer dataset download. |
| Dataset Splits | No | The paper describes training and test sets but does not explicitly mention a separate validation set. For example: "Our training set contains 81 objects spanning 4 different categories, with paired tactile and video recordings of manipulation, containing 123,561 frames of data. Our test set is constructed with 3 different subset..." |
| Hardware Specification | Yes | We use a workstation equipped with NVIDIA Tesla V100 GPUs and 64-core AMD CPUs. |
| Software Dependencies | Yes | All baseline methods and our proposed method are implemented using the open-source Pytorch (Paszke et al., 2019) package with CUDA 11.7 backend. |
| Experiment Setup | Yes | All our manifolds are randomly initialized with Gaussian distribution with (µ, σ) = (0, 1). Tactile manifold M_h is initialized to be 16 dimensional and all other manifolds are initialized to be 256 dimensional. We use MLPs to express our neural fields Φ. All MLP neural fields are of three layers and 512 hidden dimension... We use Adam optimizer for training the baselines and our methods with learning rate initialized to be 1e-3; batch size is set to 64. |
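
The reported setup corresponds to a fairly standard latent-manifold / neural-field configuration. The snippet below is a minimal PyTorch sketch of that configuration only, assuming one latent code per training sequence and a field conditioned on the concatenated latents; the class name, variable names, output dimensionality, and `num_sequences` are illustrative placeholders, not taken from the authors' code.

```python
import torch
import torch.nn as nn

class NeuralField(nn.Module):
    """Three-layer MLP with 512 hidden units, as described in the quoted setup."""
    def __init__(self, in_dim, hidden_dim=512, out_dim=1):
        super().__init__()
        self.net = nn.Sequential(
            nn.Linear(in_dim, hidden_dim), nn.ReLU(),
            nn.Linear(hidden_dim, hidden_dim), nn.ReLU(),
            nn.Linear(hidden_dim, out_dim),
        )

    def forward(self, x):
        return self.net(x)

num_sequences = 81              # placeholder: one latent per training object/sequence
tactile_dim, other_dim = 16, 256

# Manifold latents randomly initialized from a standard Gaussian, (mu, sigma) = (0, 1).
tactile_latents = nn.Parameter(torch.randn(num_sequences, tactile_dim))
visual_latents = nn.Parameter(torch.randn(num_sequences, other_dim))

field = NeuralField(in_dim=tactile_dim + other_dim, out_dim=1)

# Adam over both the latent codes and the field weights, lr 1e-3, batch size 64.
optimizer = torch.optim.Adam(
    [tactile_latents, visual_latents, *field.parameters()], lr=1e-3
)
batch_size = 64
```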