Notice: The reproducibility variables underlying each score are classified using an automated LLM-based pipeline, validated against a manually labeled dataset. LLM-based classification introduces uncertainty and potential bias; scores should be interpreted as estimates. Full accuracy metrics and methodology are described in [1].
Few-Shot Incremental Multi-modal Learning via Touch Guidance and Imaginary Vision Synthesis
Authors: Lina Wei, Yuhang Ma, Zhongsheng Lin, Fangfang Wang, Canghong Jin, Hanbin Zhao, Dapeng Chen
IJCAI 2025
| Reproducibility Variable | Result | LLM Response |
|---|---|---|
| Research Type | Experimental | Experimental results on the Touch and Go and VisGel datasets demonstrate that the TIFS framework exhibits robust continuous learning capabilities and strong generalization performance in touch-vision few-shot incremental learning tasks. Our code is available at https://github.com/Vision-Multimodal-Lab-HZCU/TIFS. |
| Researcher Affiliation | Academia | ¹School of Computer Science and Computing, Hangzhou City University; ²Zhejiang Provincial Engineering Research Center for Real-Time Smart Tech in Urban Security Governance; ³School of Information Science and Technology, Hangzhou Normal University; ⁴College of Computer Science and Technology, Zhejiang University; ⁵School of Automation, Nanjing University of Information Science and Technology |
| Pseudocode | No | The paper describes methods using mathematical formulations and textual explanations but does not contain explicitly labeled pseudocode blocks or algorithms in a structured, code-like format. |
| Open Source Code | Yes | Our code is available at https://github.com/Vision-Multimodal-Lab-HZCU/TIFS. |
| Open Datasets | Yes | Experimental results on the Touch and Go and VisGel datasets demonstrate that the TIFS framework exhibits robust continuous learning capabilities and strong generalization performance in touch-vision few-shot incremental learning tasks. ... The Touch and Go dataset comprises 20 distinct categories of touch and vision data, from which we selected 18 categories with significant instances, totaling 3,378 instances. ... [Yang et al., 2022] Fengyu Yang, Chenyang Ma, Jiacheng Zhang, Jing Zhu, Wenzhen Yuan, and Andrew Owens. Touch and go: Learning from human-collected vision and touch. arXiv preprint arXiv:2211.12498, 2022. |
| Dataset Splits | Yes | Overall, our experimental dataset encompasses 24 distinct categories of touch and vision data pairs, totaling 3,438 instances, and we randomly divide the instance objects of each category into training sets, validation sets, and test sets in a ratio of 7:1.5:1.5. |
| Hardware Specification | Yes | We trained our model on one RTX A6000 GPU and the hyperparameters τ, λ_Sample, λ_Class, λ_AAM are set to 0.05, 1.0, 0.5, 0.5, respectively. |
| Software Dependencies | No | The paper mentions using VideoMAE for feature extraction, but it does not specify any programming languages, libraries, or other software with version numbers used for implementation or experimentation. |
| Experiment Setup | Yes | We trained our model on one RTX A6000 GPU and the hyperparameters τ, λ_Sample, λ_Class, λ_AAM are set to 0.05, 1.0, 0.5, 0.5, respectively. Simultaneously, the size of the memory buffer was set to 200. We divided 24 different categories into 8 incremental steps in ascending order, with each incremental step containing 3 categories. |
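The dataset-split row quotes a per-category 7:1.5:1.5 train/validation/test split. The paper does not publish splitting code here, so the following is a minimal sketch of one plausible reading of that protocol; the function name and seed handling are assumptions, not the authors' implementation.

```python
import random

def split_category(instances, seed=0):
    """Hypothetical sketch: shuffle one category's instances and split
    them 7:1.5:1.5 into train/val/test, as the quoted protocol describes."""
    rng = random.Random(seed)
    items = list(instances)
    rng.shuffle(items)
    n = len(items)
    n_train = round(n * 0.70)   # 7 parts of 10
    n_val = round(n * 0.15)     # 1.5 parts of 10
    train = items[:n_train]
    val = items[n_train:n_train + n_val]
    test = items[n_train + n_val:]  # remaining ~1.5 parts
    return train, val, test
```

Applied independently to each of the 24 categories, this keeps the class distribution of the 3,438 instances roughly constant across the three sets.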
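The experiment-setup row states that the 24 categories are divided into 8 incremental steps of 3 categories each, in ascending order. A one-line sketch of that partition, under the assumption that "ascending order" means a plain sort of category identifiers:

```python
def incremental_steps(categories, step_size=3):
    """Hypothetical sketch: sort categories and chunk them into
    equal-size incremental steps (24 categories -> 8 steps of 3)."""
    cats = sorted(categories)
    return [cats[i:i + step_size] for i in range(0, len(cats), step_size)]
```

Each step's category list would then define the classes introduced at that stage of the few-shot incremental run.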