Model LEGO: Creating Models Like Disassembling and Assembling Building Blocks
Authors: Jiacong Hu, Jing Gao, Jingwen Ye, Yang Gao, Xingen Wang, Zunlei Feng, Mingli Song
NeurIPS 2024
| Reproducibility Variable | Result | LLM Response |
|---|---|---|
| Research Type | Experimental | Extensive experiments showcase that task-aware components disassembled from CNN classifiers or new models assembled using these components closely match or even surpass the performance of the baseline, demonstrating its promising results for model reuse. Furthermore, MDA exhibits diverse potential applications, with comprehensive experiments exploring model decision route analysis, model compression, knowledge distillation, and more. |
| Researcher Affiliation | Collaboration | Jiacong Hu1,5, Jing Gao2, Jingwen Ye3, Yang Gao7, Xingen Wang1,7, Zunlei Feng4,5,6*, Mingli Song1,5,6 1College of Computer Science and Technology, Zhejiang University, 2Robotics Institute, School of Computer Science, Carnegie Mellon University, 3Electrical and Computer Engineering, National University of Singapore, 4School of Software Technology, Zhejiang University, 5State Key Laboratory of Blockchain and Data Security, Zhejiang University, 6Hangzhou High-Tech Zone (Binjiang) Institute of Blockchain and Data Security, 7Bangsheng Technology Co., Ltd. |
| Pseudocode | No | The paper does not contain explicit pseudocode blocks or algorithms labeled as such. |
| Open Source Code | Yes | For more information, please visit https://model-lego.github.io/. Comprehensive details and the source code can be found in the Supplementary Material. |
| Open Datasets | Yes | We select three datasets and three mainstream CNN classifiers to evaluate our MDA method. The datasets include CIFAR-10 [19], CIFAR-100 [19], and Tiny-ImageNet [20]. ... on the Cora dataset [26]. ... ImageNet [24]. ... MNIST [55], Fashion-MNIST [60]. |
| Dataset Splits | No | The paper mentions 'model training' and evaluation but does not explicitly state specific train/validation/test split percentages or sample counts for the datasets used. |
| Hardware Specification | No | The paper does not provide specific details about the hardware used for running experiments (e.g., GPU models, CPU specifications, memory). |
| Software Dependencies | No | The paper mentions using the 'SGD optimizer' but does not specify software dependencies with version numbers (e.g., PyTorch 1.9, Python 3.8). |
| Experiment Setup | Yes | Parameter Configuration. In our MDA method, the key parameters are α and β, as defined in Eqn.9 and Eqn.10, respectively. By default, we set α = 0.3 and β = 0.2 in convolutional layers, and α = 0.4 and β = 0.3 in fully connected layers, unless specified otherwise. The model training is conducted using the SGD optimizer, with a learning rate of 0.01. To ensure the reliability and reproducibility of our results, we report the average of three independent experimental runs for each result. |
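
The parameter configuration quoted above is concrete enough to sketch in code. The snippet below is a minimal, hypothetical reconstruction in PyTorch, not the authors' released implementation: the model definition, the synthetic data, and the `MDA_PARAMS` mapping are placeholder assumptions, while the α/β defaults per layer type, the SGD optimizer with learning rate 0.01, and the averaging over three independent runs are the values the paper reports.

```python
"""Minimal sketch of the paper's reported training setup (not the authors' code).

Assumptions: the model, the data, and the MDA_PARAMS mapping are placeholders.
Values taken from the paper: alpha=0.3 / beta=0.2 for convolutional layers,
alpha=0.4 / beta=0.3 for fully connected layers, SGD with lr=0.01, and each
result averaged over three independent runs.
"""
import statistics

import torch
import torch.nn as nn

# Default alpha/beta values per layer type, as defined in the paper's
# Eqn. 9 and Eqn. 10 (hypothetical dict; the paper does not publish this API).
MDA_PARAMS = {
    "conv": {"alpha": 0.3, "beta": 0.2},
    "fc":   {"alpha": 0.4, "beta": 0.3},
}


def make_model() -> nn.Module:
    # Stand-in CNN classifier; the paper evaluates mainstream CNNs on
    # CIFAR-10/100 and Tiny-ImageNet.
    return nn.Sequential(
        nn.Conv2d(3, 16, kernel_size=3, padding=1), nn.ReLU(),
        nn.AdaptiveAvgPool2d(1), nn.Flatten(),
        nn.Linear(16, 10),
    )


def train_once(seed: int, epochs: int = 1) -> float:
    """One independent run: seed, train with SGD (lr=0.01), return accuracy."""
    torch.manual_seed(seed)
    model = make_model()
    # SGD optimizer with learning rate 0.01, as stated in the paper.
    optimizer = torch.optim.SGD(model.parameters(), lr=0.01)
    criterion = nn.CrossEntropyLoss()

    # Synthetic batch standing in for CIFAR-10-shaped data (3x32x32, 10 classes).
    x = torch.randn(64, 3, 32, 32)
    y = torch.randint(0, 10, (64,))

    for _ in range(epochs):
        optimizer.zero_grad()
        loss = criterion(model(x), y)
        loss.backward()
        optimizer.step()

    with torch.no_grad():
        acc = (model(x).argmax(dim=1) == y).float().mean().item()
    return acc


# The paper reports the average of three independent experimental runs.
accs = [train_once(seed) for seed in (0, 1, 2)]
print(f"mean accuracy over 3 runs: {statistics.mean(accs):.3f}")
```

In the paper, α and β govern how task-aware components are identified during disassembly; here they appear only as configuration values, since the disassembly procedure itself is not reproduced in this sketch.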