Deep Model Reassembly

Authors: Xingyi Yang, Daquan Zhou, Songhua Liu, Jingwen Ye, Xinchao Wang

NeurIPS 2022

Reproducibility Variable | Result | LLM Response
Research Type | Experimental | We demonstrate that on ImageNet, the best reassembled model achieves 78.6% top-1 accuracy without fine-tuning, which could be further elevated to 83.2% with end-to-end training. Our code is available at https://github.com/Adamdad/DeRy. In this section, we first explore some basic properties of the proposed DeRy task, and then evaluate our solution on a series of transfer learning benchmarks to verify its efficiency.
Researcher Affiliation | Collaboration | Xingyi Yang 1, Daquan Zhou 1,2, Songhua Liu 1, Jingwen Ye 1, Xinchao Wang 1; 1 National University of Singapore, 2 Bytedance
Pseudocode | No | The paper describes methods using mathematical formulations and descriptive text, but does not include any explicitly labeled pseudocode or algorithm blocks.
Open Source Code | Yes | Our code is available at https://github.com/Adamdad/DeRy.
Open Datasets | Yes | We construct our model zoo by collecting pre-trained weights from Torchvision, timm and OpenMMLab. We include a series of manually designed CNN models like ResNet [30] and ResNeXt [84], as well as NAS-based architectures like RegNetY [63] and MobileNetV3 [34]... Those models are pre-trained on ImageNet-1k [68], ImageNet-21K [67], X-rays [15] and iNaturalist 2021 [77]... We evaluate transfer learning performance on 9 natural image datasets. These datasets cover a wide range of image classification tasks, including 3 object classification tasks: CIFAR-10 [43], CIFAR-100 [43] and Caltech-101 [22]; 5 fine-grained classification tasks: Flower-102 [59], Stanford Cars [42], FGVC-Aircraft [53], Oxford-IIIT Pets [61] and CUB-Bird [79]; and 1 texture classification task: DTD [14].
Dataset Splits | No | The paper mentions using a...
Hardware Specification | Yes | All experiments are conducted on an 8× GeForce RTX 3090 server.
Software Dependencies | No | The paper mentions software components like...
Experiment Setup | Yes | For all experiments, we set the partition number K = 4 and the block size coefficient ϵ = 0.2. We sample 1/20 of the samples from each train set to calculate the linear CKA representation similarity. The NASWOT [55] score is estimated with a 5-batch average, where each mini-batch contains 32 samples. We set 5 levels of computational constraints, with C_param ∈ {10, 20, 30, 50, 90} and C_FLOPs ∈ {3, 5, 6, 10, 20}... We train each model for either 100 epochs (SHORT-TRAINING) or 300 epochs (FULL-TRAINING)... We optimize each network with AdamW [50] alongside an initial learning rate of 1e-3 and cosine lr decay, a mini-batch of 1024 and weight decay of 0.05. We apply RandAug [17], Mixup [95] and CutMix [93] as data augmentation. All models are trained and tested at 224 image resolution.
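As a reference for the block-similarity step quoted above, here is a minimal sketch of computing linear CKA between the pooled features of two zoo models on a 32-sample mini-batch. The model names, the use of timm's forward_features, and the global-average pooling are illustrative assumptions, not the released DeRy code.

```python
# Minimal sketch: load two pre-trained zoo models with timm and compare their
# pooled features with linear CKA on a 32-sample mini-batch.
# Model choices and feature extraction are assumptions for illustration only.
import torch
import timm

def linear_cka(x: torch.Tensor, y: torch.Tensor) -> torch.Tensor:
    """Linear CKA between feature matrices x (n, d1) and y (n, d2),
    computed over the same n input samples."""
    x = x - x.mean(dim=0, keepdim=True)  # center each feature dimension
    y = y - y.mean(dim=0, keepdim=True)
    cross = torch.norm(y.t() @ x, p="fro") ** 2
    return cross / (torch.norm(x.t() @ x, p="fro") * torch.norm(y.t() @ y, p="fro"))

# Two example zoo members: a manually designed CNN and a NAS-based architecture.
model_a = timm.create_model("resnet50", pretrained=True).eval()
model_b = timm.create_model("regnety_032", pretrained=True).eval()

# One 32-sample mini-batch at 224x224, matching the reported batch size
# (random tensors stand in for real training images here).
images = torch.randn(32, 3, 224, 224)

with torch.no_grad():
    # forward_features returns the pre-pooling feature map in timm;
    # global-average-pool and flatten it to (n, d) before CKA.
    feats_a = model_a.forward_features(images).mean(dim=(2, 3))
    feats_b = model_b.forward_features(images).mean(dim=(2, 3))

print(f"linear CKA: {linear_cka(feats_a, feats_b).item():.3f}")
```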
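The training hyperparameters in the last row can likewise be sketched as a PyTorch/timm configuration. The RandAugment magnitude and the Mixup/CutMix alphas below are assumptions, since the quoted setup only lists which augmentations are applied; the backbone and class count are placeholders.

```python
# Minimal sketch of the reported fine-tuning recipe: AdamW, initial lr 1e-3,
# cosine decay, weight decay 0.05, batch size 1024, 224px inputs,
# RandAugment / Mixup / CutMix. Values marked "assumed" are not from the paper.
import torch
import timm
from timm.data import Mixup, create_transform

model = timm.create_model("resnet50", pretrained=True, num_classes=100)  # placeholder backbone/head

train_transform = create_transform(
    input_size=224,
    is_training=True,
    auto_augment="rand-m9-mstd0.5",  # RandAugment policy string (assumed magnitude)
)
mixup_fn = Mixup(mixup_alpha=0.8, cutmix_alpha=1.0, num_classes=100)  # assumed alphas

optimizer = torch.optim.AdamW(model.parameters(), lr=1e-3, weight_decay=0.05)
# T_max=300 matches the FULL-TRAINING schedule; use 100 for SHORT-TRAINING.
scheduler = torch.optim.lr_scheduler.CosineAnnealingLR(optimizer, T_max=300)
```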