DC-NAS: Divide-and-Conquer Neural Architecture Search for Multi-Modal Classification

Authors: Xinyan Liang, Pinhan Fu, Qian Guo, Keyin Zheng, Yuhua Qian

AAAI 2024

Reproducibility Variable | Result | LLM Response
Research Type | Experimental | Experimental results show that DC-NAS achieves state-of-the-art results in terms of classification performance, training efficiency, and the number of model parameters compared with the competing NAS-MMC methods on three popular multi-modal tasks: multi-label movie genre classification, action recognition with RGB and body joints, and dynamic hand gesture recognition.
Researcher Affiliation | Academia | (1) Institute of Big Data Science and Industry, Shanxi University, Taiyuan 030006, China; (2) School of Computer Science and Technology, Taiyuan University of Science and Technology, Taiyuan 030024, China
Pseudocode | Yes | Algorithm 1: Divide-and-conquer neural architecture search (DC-NAS). (A hedged sketch of this kind of population-based search loop is given after the table.)
Open Source Code | No | The paper does not provide any explicit statement about releasing source code or a link to a code repository for the described methodology.
Open Datasets | Yes | In this study, we evaluated DC-NAS on three popular multimodal tasks: (1) multi-label movie genre classification task on the MM-IMDB dataset (Arevalo et al. 2017), (2) multi-modal action recognition task on the NTU RGB-D dataset (Shahroudy et al. 2016), and (3) multi-modal gesture recognition task on the Ego Gesture dataset (Zhang et al. 2018).
Dataset Splits | Yes | MM-IMDB Dataset: the dataset is originally split into three subsets: 15,552 movies for training, 2,608 for validation, and 7,799 for testing. NTU RGB-D Dataset: the training, validation, and testing sets comprise 23,760, 2,519, and 16,558 samples, respectively. Ego Gesture Dataset: the training set consists of 14,416 samples, the validation set has 4,768 samples, and the testing set comprises 4,977 samples.
Hardware Specification | No | The paper mentions "GPU hours" in Table 4 for search cost, but does not provide specific hardware details such as GPU models, CPU types, or memory specifications used for running experiments.
Software Dependencies | No | The paper mentions using specific backbone models (e.g., Maxout MLP, VGG Transfer, Inflated ResNet-50) but does not list any specific software dependencies or their version numbers (e.g., Python, PyTorch, TensorFlow versions).
Experiment Setup | Yes | MM-IMDB Dataset: for the parameters of our architecture, we set the population size N = 20, the number of population iterations T = 10, and the dimension of the fusion vector FD = 256; the modal features are repeatable. NTU RGB-D Dataset: we use a population size of 28, conduct 15 iterations, do not reuse modalities, and set the fusion modality dimension to 64. Ego Gesture Dataset: the experimental settings for DC-NAS involve a population size of 28, 15 iterations, and non-reuse of modalities with a fusion dimension of 32. (These per-dataset settings, together with the dataset splits above, are collected in the configuration sketch after the table.)
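
The paper's Algorithm 1 is only named in this report, not reproduced, so the following is a minimal sketch of the kind of population-based, divide-and-conquer search loop implied by the reported settings (population size N, iteration count T, optional modality reuse). The architecture encoding, the `sample_architecture` and `mutate` helpers, the grouping scheme, and the toy fitness function are all assumptions made for illustration, not the authors' method.

```python
"""Minimal, hedged sketch of a divide-and-conquer evolutionary NAS loop."""
import random

# Toy encoding: a fusion architecture is a tuple of modality indices.
MODALITIES = ["image", "text", "skeleton"]  # placeholder names, not the paper's

def sample_architecture(reuse_modalities=True, max_len=3):
    """Draw a random candidate encoding, with or without modality reuse."""
    pool = list(range(len(MODALITIES)))
    if reuse_modalities:
        return tuple(random.choice(pool) for _ in range(random.randint(1, max_len)))
    random.shuffle(pool)
    return tuple(pool[: random.randint(1, len(pool))])

def mutate(arch, reuse_modalities=True):
    """Hypothetical mutation operator: re-sample one position of the encoding."""
    arch = list(arch)
    arch[random.randrange(len(arch))] = random.randrange(len(MODALITIES))
    return tuple(arch) if reuse_modalities else tuple(dict.fromkeys(arch))

def dc_nas_search(fitness_fn, pop_size=20, iterations=10, num_groups=4,
                  reuse_modalities=True):
    """Divide the population into groups, rank each group, merge the survivors."""
    population = [sample_architecture(reuse_modalities) for _ in range(pop_size)]
    for _ in range(iterations):
        random.shuffle(population)
        # Divide: split the current population into smaller sub-populations.
        groups = [population[i::num_groups] for i in range(num_groups)]
        survivors = []
        for group in groups:
            # Conquer: rank each sub-population independently, keep its best half.
            ranked = sorted(group, key=fitness_fn, reverse=True)
            survivors.extend(ranked[: max(1, len(ranked) // 2)])
        # Merge: refill the population with mutated copies of the survivors.
        population = list(survivors)
        while len(population) < pop_size:
            population.append(mutate(random.choice(survivors), reuse_modalities))
    return max(population, key=fitness_fn)

if __name__ == "__main__":
    # Stand-in fitness: prefer longer fusion chains. A real run would train and
    # validate the fused multi-modal model instead.
    best = dc_nas_search(fitness_fn=len, pop_size=20, iterations=10)
    print("best architecture (toy run):", best)
```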
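For quick comparison, the dataset splits and search hyper-parameters reported in the rows above can be gathered into a single configuration mapping. The numbers are taken verbatim from the table; the key names are this report's own and only a guess at how such a configuration might be organised.

```python
# Per-dataset splits and DC-NAS search settings as reported above.
# Key names are illustrative, not taken from the paper or its code.
DC_NAS_SETTINGS = {
    "MM-IMDB": {
        "splits": {"train": 15_552, "val": 2_608, "test": 7_799},
        "population_size": 20,   # N
        "iterations": 10,        # T
        "fusion_dim": 256,       # FD
        "reuse_modalities": True,
    },
    "NTU RGB-D": {
        "splits": {"train": 23_760, "val": 2_519, "test": 16_558},
        "population_size": 28,
        "iterations": 15,
        "fusion_dim": 64,
        "reuse_modalities": False,
    },
    "Ego Gesture": {
        "splits": {"train": 14_416, "val": 4_768, "test": 4_977},
        "population_size": 28,
        "iterations": 15,
        "fusion_dim": 32,
        "reuse_modalities": False,
    },
}
```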