Notice: The reproducibility variables underlying each score are classified using an automated LLM-based pipeline, validated against a manually labeled dataset. LLM-based classification introduces uncertainty and potential bias; scores should be interpreted as estimates. Full accuracy metrics and methodology are described in [1].

Few-Shot Incremental Multi-modal Learning via Touch Guidance and Imaginary Vision Synthesis

Authors: Lina Wei, Yuhang Ma, Zhongsheng Lin, Fangfang Wang, Canghong Jin, Hanbin Zhao, Dapeng Chen

IJCAI 2025

| Reproducibility Variable | Result | LLM Response |
| --- | --- | --- |
| Research Type | Experimental | "Experimental results on the Touch and Go and VisGel datasets demonstrate that the TIFS framework exhibits robust continuous learning capabilities and strong generalization performance in touch-vision few-shot incremental learning tasks. Our code is available at https://github.com/Vision-Multimodal-Lab-HZCU/TIFS." |
| Researcher Affiliation | Academia | ¹School of Computer Science and Computing, Hangzhou City University; ²Zhejiang Provincial Engineering Research Center for Real-Time Smart Tech in Urban Security Governance; ³School of Information Science and Technology, Hangzhou Normal University; ⁴College of Computer Science and Technology, Zhejiang University; ⁵School of Automation, Nanjing University of Information Science and Technology |
| Pseudocode | No | The paper describes its methods with mathematical formulations and textual explanations but contains no explicitly labeled pseudocode blocks or algorithms in a structured, code-like format. |
| Open Source Code | Yes | "Our code is available at https://github.com/Vision-Multimodal-Lab-HZCU/TIFS." |
| Open Datasets | Yes | "Experimental results on the Touch and Go and VisGel datasets demonstrate that the TIFS framework exhibits robust continuous learning capabilities and strong generalization performance in touch-vision few-shot incremental learning tasks. ... The Touch and Go dataset comprises 20 distinct categories of touch and vision data, from which we selected 18 categories with significant instances, totaling 3,378 instances." [Yang et al., 2022] Fengyu Yang, Chenyang Ma, Jiacheng Zhang, Jing Zhu, Wenzhen Yuan, and Andrew Owens. Touch and Go: Learning from human-collected vision and touch. arXiv preprint arXiv:2211.12498, 2022. |
| Dataset Splits | Yes | "Overall, our experimental dataset encompasses 24 distinct categories of touch and vision data pairs, totaling 3,438 instances, and we randomly divide the instance objects of each category into training sets, validation sets, and test sets in a ratio of 7:1.5:1.5." |
| Hardware Specification | Yes | "We trained our model on one RTX A6000 GPU and the hyperparameters τ, λSample, λClass, λAAM are set to 0.05, 1.0, 0.5, 0.5, respectively." |
| Software Dependencies | No | The paper mentions using VideoMAE for feature extraction but does not specify any programming languages, libraries, or other software with version numbers used for implementation or experimentation. |
| Experiment Setup | Yes | "We trained our model on one RTX A6000 GPU and the hyperparameters τ, λSample, λClass, λAAM are set to 0.05, 1.0, 0.5, 0.5, respectively. Simultaneously, the size of the memory buffer was set to 200. We divided 24 different categories into 8 incremental steps in ascending order, with each incremental step containing 3 categories." |
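The reported protocol (a random per-category 7:1.5:1.5 train/validation/test split, and 24 categories divided in ascending order into 8 incremental steps of 3 categories each) can be sketched as follows. This is a minimal illustration of the described setup, not code from the TIFS repository; the function and variable names are hypothetical.

```python
import random

def split_category(instances, seed=0):
    """Randomly split one category's instances 7:1.5:1.5 (train/val/test),
    as described in the paper's dataset-splits statement."""
    rng = random.Random(seed)
    items = list(instances)
    rng.shuffle(items)
    n = len(items)
    n_train = round(0.70 * n)
    n_val = round(0.15 * n)
    return items[:n_train], items[n_train:n_train + n_val], items[n_train + n_val:]

def incremental_steps(categories, per_step=3):
    """Divide category labels into incremental steps in ascending order."""
    cats = sorted(categories)
    return [cats[i:i + per_step] for i in range(0, len(cats), per_step)]

# 24 categories -> 8 incremental steps of 3 categories each
steps = incremental_steps(range(24))
```

With 24 categories this yields 8 steps of 3 categories, matching the reported incremental schedule; the split helper simply applies the stated 70/15/15 proportions within each category.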