Model LEGO: Creating Models Like Disassembling and Assembling Building Blocks

Authors: Jiacong Hu, Jing Gao, Jingwen Ye, Yang Gao, Xingen Wang, Zunlei Feng, Mingli Song

NeurIPS 2024

| Reproducibility Variable | Result | LLM Response |
| --- | --- | --- |
| Research Type | Experimental | Extensive experiments showcase that task-aware components disassembled from CNN classifiers or new models assembled using these components closely match or even surpass the performance of the baseline, demonstrating its promising results for model reuse. Furthermore, MDA exhibits diverse potential applications, with comprehensive experiments exploring model decision route analysis, model compression, knowledge distillation, and more. |
| Researcher Affiliation | Collaboration | Jiacong Hu1,5, Jing Gao2, Jingwen Ye3, Yang Gao7, Xingen Wang1,7, Zunlei Feng4,5,6 *, Mingli Song1,5,6; 1College of Computer Science and Technology, Zhejiang University; 2Robotics Institute, School of Computer Science, Carnegie Mellon University; 3Electrical and Computer Engineering, National University of Singapore; 4School of Software Technology, Zhejiang University; 5State Key Laboratory of Blockchain and Data Security, Zhejiang University; 6Hangzhou High-Tech Zone (Binjiang) Institute of Blockchain and Data Security; 7Bangsheng Technology Co., Ltd. |
| Pseudocode | No | The paper does not contain explicit pseudocode blocks or algorithms labeled as such. |
| Open Source Code | Yes | For more information, please visit https://model-lego.github.io/. Comprehensive details and the source code can be found in the Supplementary Material. |
| Open Datasets | Yes | We select three datasets and three mainstream CNN classifiers to evaluate our MDA method. The datasets include CIFAR-10 [19], CIFAR-100 [19], and Tiny-ImageNet [20]. ... on the Cora dataset [26]. ... ImageNet [24]. ... MNIST [55], Fashion-MNIST [60]. |
| Dataset Splits | No | The paper mentions 'model training' and evaluation but does not explicitly state specific train/validation/test split percentages or sample counts for the datasets used. |
| Hardware Specification | No | The paper does not provide specific details about the hardware used for running experiments (e.g., GPU models, CPU specifications, memory). |
| Software Dependencies | No | The paper mentions using the 'SGD optimizer' but does not specify software dependencies with version numbers (e.g., PyTorch 1.9, Python 3.8). |
| Experiment Setup | Yes | Parameter Configuration. In our MDA method, the key parameters are α and β, as defined in Eqn. 9 and Eqn. 10, respectively. By default, we set α = 0.3 and β = 0.2 in convolutional layers, and α = 0.4 and β = 0.3 in fully connected layers, unless specified otherwise. The model training is conducted using the SGD optimizer, with a learning rate of 0.01. To ensure the reliability and reproducibility of our results, we report the average of three independent experimental runs for each result. |
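The reported experiment setup can be captured as a minimal configuration sketch. The α/β defaults per layer type, the SGD learning rate of 0.01, and the three-run averaging come from the paper's quoted setup; everything else (the dict layout, the names `MDA_PARAMS` and `sgd_step`, and the plain SGD update with no momentum or weight decay, which the paper does not specify) is a hypothetical illustration, not the authors' actual code.

```python
# Hyperparameters quoted from the paper's "Parameter Configuration"
# (alpha and beta are defined in the paper's Eqn. 9 and Eqn. 10).
# The structure and names below are illustrative assumptions.
MDA_PARAMS = {
    "conv": {"alpha": 0.3, "beta": 0.2},  # convolutional layers
    "fc":   {"alpha": 0.4, "beta": 0.3},  # fully connected layers
}
LEARNING_RATE = 0.01  # SGD learning rate, as stated in the paper
NUM_RUNS = 3          # results averaged over three independent runs


def sgd_step(params, grads, lr=LEARNING_RATE):
    """One vanilla SGD update, p <- p - lr * g.

    Momentum and weight decay are omitted because the paper does not
    report them.
    """
    return [p - lr * g for p, g in zip(params, grads)]


if __name__ == "__main__":
    print(MDA_PARAMS["conv"])
    print(sgd_step([1.0, 2.0], [0.5, -0.5]))
```

Note that these values are the stated defaults; the paper says they may change where "specified otherwise", so any reproduction should treat them as a starting point rather than a fixed contract.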