Multi-Attention Based Visual-Semantic Interaction for Few-Shot Learning
Authors: Peng Zhao, Yin Wang, Wei Wang, Jie Mu, Huiting Liu, Cong Wang, Xiaochun Cao
IJCAI 2024
| Reproducibility Variable | Result | LLM Response |
|---|---|---|
| Research Type | Experimental | Extensive experiments on four benchmark datasets demonstrate that our proposed MAVSI could outperform existing state-of-the-art FSL methods. |
| Researcher Affiliation | Academia | (1) School of Computer Science and Technology, Anhui University; (2) School of Cyber Science and Technology, Shenzhen Campus of Sun Yat-sen University; (3) School of Data Science and Artificial Intelligence, Dongbei University of Finance and Economics; (4) Department of Computing, The Hong Kong Polytechnic University |
| Pseudocode | No | The paper does not contain any explicit pseudocode or algorithm blocks. |
| Open Source Code | No | The paper does not provide any explicit statement about releasing its source code or a link to a code repository. |
| Open Datasets | Yes | miniImageNet [Vinyals et al., 2016] consists of 100 classes, divided into 64, 16, and 20 for training, validation, and testing. tieredImageNet [Ren et al., 2018] contains 608 classes, split into 351, 97, and 160 for training, validation, and testing. CIFAR-FS [Bertinetto et al., 2019] consists of 100 classes, divided into 64, 16, and 20 for training, validation, and testing. CUB-200-2011 [Wah et al., 2011] contains images from 200 bird species, divided into 100, 50, and 50 for training, validation, and testing, respectively. |
| Dataset Splits | Yes | miniImageNet [Vinyals et al., 2016] consists of 100 classes, divided into 64, 16, and 20 for training, validation, and testing. tieredImageNet [Ren et al., 2018] contains 608 classes, split into 351, 97, and 160 for training, validation, and testing. CIFAR-FS [Bertinetto et al., 2019] consists of 100 classes, divided into 64, 16, and 20 for training, validation, and testing. CUB-200-2011 [Wah et al., 2011] contains images from 200 bird species, divided into 100, 50, and 50 for training, validation, and testing, respectively. (Class counts per split are summarized in the sketch after this table.) |
| Hardware Specification | No | The paper does not provide specific hardware details (e.g., GPU/CPU models) used for running its experiments. |
| Software Dependencies | No | The paper mentions using GloVe as a semantic extractor but does not provide version numbers for software dependencies such as programming languages or libraries. |
| Experiment Setup | Yes | Similar to previous works [Xing et al., 2019; Schwartz et al., 2022; Yang et al., 2022], we utilize ResNet-12 as the backbone network, and modify the number of convolutional filters from [64, 128, 256, 512] to [64, 160, 320, 640]. In all cases, the comparison network F is the MLP with a Leaky ReLU-activated hidden layer, and the relation network consists of convolutional layers and the MLP. We use GloVe [Pennington et al., 2014] as the semantic extractor, which is pre-trained on a large corpus. Our experiments are implemented under 5-way 1-shot and 5-way 5-shot settings. The input image size is 84×84. Following [Peng et al., 2019], we train the model for 150 epochs, with 800 episodes in each epoch. We use the Adam optimizer with a learning rate of 5e-3 and weight decay of 5e-6. The learning rate is halved every 6,000 episodes, and other parameters such as λ, γ, and the temperature parameter τ are adjusted during end-to-end training. (A hedged training-loop sketch of these settings follows the table.) |
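For quick reference, the class-level splits quoted in the Open Datasets and Dataset Splits rows can be captured in a small Python mapping. This is only an illustrative restatement of the numbers above, not code from the paper; the name `CLASS_SPLITS` is ours.

```python
# Class-level train/val/test splits quoted from the paper (counts of classes).
CLASS_SPLITS = {
    "miniImageNet":   {"train": 64,  "val": 16, "test": 20},   # 100 classes total
    "tieredImageNet": {"train": 351, "val": 97, "test": 160},  # 608 classes total
    "CIFAR-FS":       {"train": 64,  "val": 16, "test": 20},   # 100 classes total
    "CUB-200-2011":   {"train": 100, "val": 50, "test": 50},   # 200 bird species
}

# Sanity check: each dataset's splits partition its full class set.
for name, split in CLASS_SPLITS.items():
    assert sum(split.values()) in (100, 608, 200), name
```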
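The Experiment Setup row pins down enough hyperparameters to sketch the optimization schedule. Below is a minimal sketch assuming PyTorch (the paper names no framework); the backbone, episode batch, and loss are placeholders standing in for the paper's ResNet-12, episodic sampler, and MAVSI objective, which are not specified in code anywhere in the paper.

```python
import torch
import torch.nn as nn

# Hyperparameters as stated in the paper's experiment setup.
EPOCHS = 150             # training epochs
EPISODES_PER_EPOCH = 800  # episodes per epoch
LR = 5e-3                # initial Adam learning rate
WEIGHT_DECAY = 5e-6
LR_HALVE_EVERY = 6000    # learning rate halved every 6,000 episodes

# Placeholder network standing in for ResNet-12 with [64, 160, 320, 640] filters.
backbone = nn.Sequential(
    nn.Conv2d(3, 64, kernel_size=3, padding=1),
    nn.AdaptiveAvgPool2d(1),
    nn.Flatten(),
)

optimizer = torch.optim.Adam(backbone.parameters(), lr=LR, weight_decay=WEIGHT_DECAY)
# Stepping once per episode with gamma=0.5 and step_size=6000 reproduces the
# "halved every 6,000 episodes" schedule from the paper.
scheduler = torch.optim.lr_scheduler.StepLR(
    optimizer, step_size=LR_HALVE_EVERY, gamma=0.5
)

for epoch in range(EPOCHS):
    for episode in range(EPISODES_PER_EPOCH):
        # Dummy 5-way 5-shot support images at the paper's 84x84 resolution.
        images = torch.randn(25, 3, 84, 84)
        features = backbone(images)
        loss = features.pow(2).mean()  # stand-in for the paper's training loss
        optimizer.zero_grad()
        loss.backward()
        optimizer.step()
        scheduler.step()
```

Because the scheduler is stepped per episode rather than per epoch, the learning-rate halving lands exactly on episode counts, matching the paper's "dropped by half every 6,000 episodes" wording.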