Deep Model Reassembly
Authors: Xingyi Yang, Daquan Zhou, Songhua Liu, Jingwen Ye, Xinchao Wang
NeurIPS 2022
| Reproducibility Variable | Result | LLM Response |
|---|---|---|
| Research Type | Experimental | We demonstrate that on ImageNet, the best reassembled model achieves 78.6% top-1 accuracy without fine-tuning, which could be further elevated to 83.2% with end-to-end training. Our code is available at https://github.com/Adamdad/DeRy. In this section, we first explore some basic properties of the proposed DeRy task, and then evaluate our solution on a series of transfer learning benchmarks to verify its efficiency. |
| Researcher Affiliation | Collaboration | Xingyi Yang (1), Daquan Zhou (1,2), Songhua Liu (1), Jingwen Ye (1), Xinchao Wang (1); (1) National University of Singapore, (2) ByteDance |
| Pseudocode | No | The paper describes methods using mathematical formulations and descriptive text, but does not include any explicitly labeled pseudocode or algorithm blocks. |
| Open Source Code | Yes | Our code is available at https://github.com/Adamdad/DeRy. |
| Open Datasets | Yes | We construct our model zoo by collecting pre-trained weights from Torchvision, timm and OpenMMLab. We include a series of manually designed CNN models like ResNet [30] and ResNeXt [84], as well as NAS-based architectures like RegNetY [63] and MobileNetV3 [34]... These models are pre-trained on ImageNet1k [68], ImageNet21K [67], X-rays [15] and iNaturalist2021 [77]... We evaluate transfer learning performance on 9 natural image datasets. These datasets cover a wide range of image classification tasks, including 3 object classification tasks (CIFAR-10 [43], CIFAR-100 [43] and Caltech-101 [22]); 5 fine-grained classification tasks (Flower-102 [59], Stanford Cars [42], FGVC-Aircraft [53], Oxford-IIIT Pets [61] and CUB-Bird [79]); and 1 texture classification task (DTD [14]). |
| Dataset Splits | No | The paper mentions using a train-set subsample for similarity computation but does not describe explicit train/validation/test splits for the evaluation datasets. |
| Hardware Specification | Yes | All experiments are conducted on a server with 8 GeForce RTX 3090 GPUs. |
| Software Dependencies | No | The paper mentions software components like Torchvision, timm and OpenMMLab, but does not specify version numbers or a complete dependency list. |
| Experiment Setup | Yes | For all experiments, we set the partition number K = 4 and the block size coefficient ϵ = 0.2. We sample 1/20 of the samples from each train set to calculate the linear CKA representation similarity. The NASWOT [55] score is estimated with a 5-batch average, where each mini-batch contains 32 samples. We set 5 levels of computational constraints, with Cparam ∈ {10, 20, 30, 50, 90} and CFLOPs ∈ {3, 5, 6, 10, 20}... We train each model for either 100 epochs (SHORT-TRAINING) or 300 epochs (FULL-TRAINING)... We optimize each network with AdamW [50] with an initial learning rate of 1e-3 and cosine lr-decay, a mini-batch size of 1024 and a weight decay of 0.05. We apply RandAug [17], Mixup [95] and CutMix [93] as data augmentation. All models are trained and tested at 224×224 image resolution. (Illustrative sketches of the CKA similarity, the NASWOT score and the training recipe follow the table.) |
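
For reference, the linear CKA similarity used to compare blocks can be computed directly from two feature matrices extracted on the shared 1/20 train-set subsample. Below is a minimal NumPy sketch; the function name `linear_cka` and its interface are illustrative, not the authors' code, and it assumes activations have already been flattened to `(n_samples, dim)` arrays.

```python
import numpy as np

def linear_cka(X, Y):
    """Linear Centered Kernel Alignment between two feature matrices.

    X: (n, p1) and Y: (n, p2) are activations of two network blocks on
    the same n inputs (e.g. the 1/20 train-set subsample). Returns a
    similarity in [0, 1]; higher means more functionally similar blocks.
    """
    # Center each feature dimension across the sample axis.
    X = X - X.mean(axis=0, keepdims=True)
    Y = Y - Y.mean(axis=0, keepdims=True)
    # CKA(X, Y) = ||Y^T X||_F^2 / (||X^T X||_F * ||Y^T Y||_F)
    cross = np.linalg.norm(Y.T @ X, ord="fro") ** 2
    return cross / (np.linalg.norm(X.T @ X, ord="fro")
                    * np.linalg.norm(Y.T @ Y, ord="fro"))
```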
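
The NASWOT [55] score ranks candidate assemblies without any training by measuring how distinctly a mini-batch of inputs activates the network's ReLU units. Here is a hedged sketch of the scoring rule from Mellor et al.; collecting the binary activation codes (e.g. via forward hooks) is assumed and omitted.

```python
import numpy as np

def naswot_score(relu_codes):
    """relu_codes: (batch, units) binary matrix; entry (i, j) is 1 if
    ReLU unit j fired for input i. Returns log|det K|, where K[i, k]
    counts how many units agree between inputs i and k."""
    c = relu_codes.astype(np.float64)
    # Agreements = matching ones + matching zeros, i.e. the number of
    # units minus the Hamming distance between the activation codes.
    k = c @ c.T + (1.0 - c) @ (1.0 - c).T
    _, logdet = np.linalg.slogdet(k)
    return logdet
```

Per the setup above, the reported score would be the mean of `naswot_score` over 5 mini-batches of 32 samples each.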
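
The end-to-end training recipe maps onto standard PyTorch and timm components. The following sketch uses the stated hyperparameters; `build_reassembled_model` and `train_loader` are hypothetical placeholders, and the Mixup/CutMix alpha values are assumptions, since the paper does not list them.

```python
import torch
from timm.data import Mixup
from timm.loss import SoftTargetCrossEntropy
from torch.optim import AdamW
from torch.optim.lr_scheduler import CosineAnnealingLR

EPOCHS = 300  # FULL-TRAINING; use 100 for SHORT-TRAINING

model = build_reassembled_model()          # hypothetical DeRy network
optimizer = AdamW(model.parameters(), lr=1e-3, weight_decay=0.05)
scheduler = CosineAnnealingLR(optimizer, T_max=EPOCHS)

# Mixup + CutMix as reported; the alpha values here are assumed.
mixup_fn = Mixup(mixup_alpha=0.8, cutmix_alpha=1.0, num_classes=1000)
criterion = SoftTargetCrossEntropy()       # Mixup yields soft targets

for epoch in range(EPOCHS):
    for images, targets in train_loader:   # 224x224 inputs, batch 1024
        images, targets = mixup_fn(images, targets)
        loss = criterion(model(images), targets)
        optimizer.zero_grad()
        loss.backward()
        optimizer.step()
    scheduler.step()
```

RandAug would sit in the dataset transform pipeline rather than the training loop, e.g. via timm's `create_transform(..., auto_augment='rand-m9-mstd0.5')`; the exact policy string is an assumption.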