Layer Compression of Deep Networks with Straight Flows
Authors: Chengyue Gong, Xiaocong Du, Bhargav Bhushanam, Lemeng Wu, Xingchao Liu, Dhruv Choudhary, Arun Kejariwal, Qiang Liu
AAAI 2024 | Conference PDF | Archive PDF | Plain Text | LLM Run Details
| Reproducibility Variable | Result | LLM Response |
|---|---|---|
| Research Type | Experimental | Empirically, we demonstrate that our method outperforms direct distillation and other baselines on different model architectures (e.g., ResNet, ViT) on image classification and semantic segmentation tasks. |
| Researcher Affiliation | Collaboration | University of Texas at Austin; Meta, Inc. |
| Pseudocode | Yes | Algorithm 1: Compression with Straight Flows: Main Algorithm |
| Open Source Code | No | The paper does not provide an explicit statement about releasing their code or a link to a code repository for their proposed method. |
| Open Datasets | Yes | We evaluate our model performance on CIFAR-10 and ImageNet, upon vision transformers and ResNet. |
| Dataset Splits | Yes | We evaluate our model performance on CIFAR-10 and ImageNet, upon vision transformers and ResNet. |
| Hardware Specification | No | The paper does not provide specific hardware details such as GPU models, CPU types, or cloud instance specifications used for running the experiments. |
| Software Dependencies | No | The paper mentions software like 'AdamW' and 'mmsegmentation' but does not provide specific version numbers for these or any other software dependencies. |
| Experiment Setup | Yes | We use the AdamW (Loshchilov and Hutter 2018) optimizer with batch size 512 and an initial learning rate of 5 × 10⁻⁴ with cosine learning rate decay (Loshchilov and Hutter 2019). For our method, the first two stages are each trained for 300 epochs. For the final distillation refinement stage, we train the model for 400 epochs. (An illustrative sketch of this setup follows the table.) |
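
The Experiment Setup row maps onto a standard training configuration. The sketch below shows one plausible way to instantiate the reported optimizer, learning rate, and schedule in PyTorch; the framework choice, the weight decay value, and the absence of warmup are assumptions not stated in the quoted setup, so this is not the authors' code.

```python
# Illustrative sketch of the reported training setup: AdamW, batch size 512,
# initial LR 5e-4, cosine learning rate decay. PyTorch is an assumption; the
# paper does not name a framework, and weight_decay here is a placeholder.
import torch
from torch.optim import AdamW
from torch.optim.lr_scheduler import CosineAnnealingLR

def build_optimizer(model: torch.nn.Module, epochs: int):
    # lr follows the reported 5e-4; weight_decay is an assumed value.
    optimizer = AdamW(model.parameters(), lr=5e-4, weight_decay=0.05)
    # Cosine decay over the full stage length, as in the reported schedule.
    scheduler = CosineAnnealingLR(optimizer, T_max=epochs)
    return optimizer, scheduler

# Stage lengths as reported: 300 epochs for each of the first two stages,
# then 400 epochs for the final distillation refinement stage.
stage_epochs = [300, 300, 400]
```

In use, one would call `build_optimizer(model, epochs)` once per stage and step the scheduler after each epoch, matching the per-stage epoch counts quoted above.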