DC-NAS: Divide-and-Conquer Neural Architecture Search for Multi-Modal Classification
Authors: Xinyan Liang, Pinhan Fu, Qian Guo, Keyin Zheng, Yuhua Qian
AAAI 2024 | Conference PDF | Archive PDF | Plain Text | LLM Run Details
| Reproducibility Variable | Result | LLM Response |
|---|---|---|
| Research Type | Experimental | Experimental results show that DC-NAS achieves state-of-the-art results in terms of classification performance, training efficiency, and the number of model parameters compared with the NAS-MMC methods on three popular multi-modal tasks: multi-label movie genre classification, action recognition with RGB and body joints, and dynamic hand gesture recognition. |
| Researcher Affiliation | Academia | (1) Institute of Big Data Science and Industry, Shanxi University, Taiyuan 030006, China; (2) School of Computer Science and Technology, Taiyuan University of Science and Technology, Taiyuan 030024, China |
| Pseudocode | Yes | Algorithm 1: Divide-and-conquer neural architecture search (DC-NAS) |
| Open Source Code | No | The paper does not provide any explicit statement about releasing source code or a link to a code repository for the described methodology. |
| Open Datasets | Yes | In this study, we evaluated DC-NAS on three popular multimodal tasks: (1) multi-label movie genre classification task on the MM-IMDB dataset (Arevalo et al. 2017), (2) multi-modal action recognition task on the NTU RGB-D dataset (Shahroudy et al. 2016), and (3) multi-modal gesture recognition task on the Ego Gesture dataset (Zhang et al. 2018). |
| Dataset Splits | Yes | MM-IMDB Dataset: The dataset is originally split into three subsets: 15,552 movies for training, 2,608 for validation, and 7,799 for testing purposes. NTU RGB-D Dataset: The training, validation, and testing sets comprise 23,760, 2,519, and 16,558 samples, respectively. Ego Gesture Dataset: our training set consisted of 14,416 samples, the validation set had 4,768 samples, and the testing set comprised 4,977 samples. |
| Hardware Specification | No | The paper mentions "GPU hours" in Table 4 for search cost, but does not provide specific hardware details such as GPU models, CPU types, or memory specifications used for running experiments. |
| Software Dependencies | No | The paper mentions using specific backbone models (e.g., Maxout MLP, VGG Transfer, Inflated ResNet-50) but does not list any specific software dependencies or their version numbers (e.g., Python, PyTorch, TensorFlow versions). |
| Experiment Setup | Yes | MM-IMDB Dataset: For the parameters of our architecture, we set the population size N = 20, the number of population iterations T = 10, and the dimension of the fusion vector FD = 256, and the modal features are repeatable. NTU RGB-D Dataset: We use a population size of 28, conduct 15 iterations, do not reuse modalities, and set the fusion modality dimension to 64. Ego Gesture Dataset: The experimental settings for DC-NAS involve a population size of 28, 15 iterations, and non-reuse of modalities with a fusion dimension of 32. |
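The per-dataset search settings quoted above can be collected into a small configuration sketch. The snippet below is illustrative only: the key names (`population_size`, `iterations`, `fusion_dim`, `reuse_modalities`) and the generic population-based search loop are assumptions made for readability, since no official code is released; only the numeric values are taken from the paper's experiment-setup description.

```python
# Hypothetical summary of the DC-NAS search settings reported for each dataset.
# Key names are assumptions; the numeric values (population size, iterations,
# fusion dimension, modality reuse) come from the paper's setup description.
DC_NAS_SETTINGS = {
    "MM-IMDB":     {"population_size": 20, "iterations": 10, "fusion_dim": 256, "reuse_modalities": True},
    "NTU RGB-D":   {"population_size": 28, "iterations": 15, "fusion_dim": 64,  "reuse_modalities": False},
    "Ego Gesture": {"population_size": 28, "iterations": 15, "fusion_dim": 32,  "reuse_modalities": False},
}


def evolutionary_search(evaluate, sample_architecture, mutate, cfg):
    """Generic population-based search loop (a sketch, not the authors' Algorithm 1).

    `evaluate`, `sample_architecture`, and `mutate` are user-supplied callables;
    only the population-size and iteration knobs mirror the reported settings.
    """
    population = [sample_architecture(cfg) for _ in range(cfg["population_size"])]
    for _ in range(cfg["iterations"]):
        ranked = sorted(population, key=evaluate, reverse=True)
        parents = ranked[: cfg["population_size"] // 2]       # keep the better half
        children = [mutate(p, cfg) for p in parents]          # refill by mutation
        population = parents + children
    return max(population, key=evaluate)
```

This sketch only illustrates the population/iteration knobs listed in the table; the divide-and-conquer decomposition of the fusion search space that gives DC-NAS its name is described in the paper's Algorithm 1 and is not reproduced here.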