Notice: The reproducibility variables underlying each score are classified using an automated LLM-based pipeline, validated against a manually labeled dataset. LLM-based classification introduces uncertainty and potential bias; scores should be interpreted as estimates. Full accuracy metrics and methodology are described in Coakley et al. (K. L. Coakley, T. Snelleman, H. Hoos, and O. E. Gundersen, "The embrace of open science: An analysis of a decade of AI research and 56 800 conference papers," under review, 2026).
AlphaNet: Improved Training of Supernets with Alpha-Divergence
Authors: Dilin Wang, Chengyue Gong, Meng Li, Qiang Liu, Vikas Chandra
ICML 2021 | Venue PDF | LLM Run Details
| Reproducibility Variable | Result | LLM Response |
|---|---|---|
| Research Type | Experimental | We apply the proposed α-divergence based supernets training to both slimmable neural networks and weight-sharing NAS, and demonstrate significant improvements. Specifically, our discovered model family, AlphaNet, outperforms prior-art models on a wide range of FLOPs regimes, including BigNAS, Once-for-All networks, and AttentiveNAS. We achieve ImageNet top-1 accuracy of 80.0% with only 444M FLOPs. Also, from Section 4 (Experiments): We apply our Adaptive-KD to improve notable supernet-based applications, including slimmable neural networks (Yu & Huang, 2019) and weight-sharing NAS (e.g., Cai et al., 2019a; Yu et al., 2020; Wang et al., 2020a). We provide an overview of our algorithm for training the supernet in Algorithm 1. |
| Researcher Affiliation | Collaboration | (1) Facebook; (2) Department of Computer Science, The University of Texas at Austin. Correspondence to: Dilin Wang <EMAIL>, Chengyue Gong <EMAIL>, Meng Li <EMAIL>, Qiang Liu <EMAIL>, Vikas Chandra <EMAIL>. |
| Pseudocode | Yes | Algorithm 1 Training supernets with α-divergence |
| Open Source Code | Yes | Our code and pretrained models are available at https://github.com/facebookresearch/AlphaNet. |
| Open Datasets | Yes | We evaluate on the ImageNet dataset (Deng et al., 2009). |
| Dataset Splits | Yes | To estimate the performance Pareto, we proceed as follows: 1) we first randomly sample 512 sub-networks from the supernet and estimate their accuracy on the ImageNet validation set; |
| Hardware Specification | No | We train all models for 360 epochs using SGD optimizer... and batch size of 2048 on 16 GPUs. No specific GPU model or other hardware specifications were provided. |
| Software Dependencies | No | No specific software dependencies with version numbers (e.g., Python, PyTorch, TensorFlow, CUDA) are explicitly mentioned in the paper. |
| Experiment Setup | Yes | Additionally, we train all models for 360 epochs using SGD optimizer with momentum as 0.9, weight decay as $10^{-5}$ and dropout as 0.2. We use cosine learning rate decay, with an initial learning rate of 0.8, and batch size of 2048 on 16 GPUs. |
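
The learning-rate schedule quoted in the Experiment Setup row can be illustrated with a minimal sketch. Only the initial learning rate (0.8), the epoch count (360), the batch size (2048), and the GPU count (16) come from the quoted text. The function name `cosine_lr`, the per-epoch (rather than per-step) update, the absence of warmup, the decay to zero, and the even split of the batch across GPUs are all assumptions made for illustration and may not match the authors' implementation.

```python
import math

# Values quoted in the Experiment Setup row.
BASE_LR = 0.8
TOTAL_EPOCHS = 360
BATCH_SIZE = 2048
NUM_GPUS = 16


def cosine_lr(epoch, base_lr=BASE_LR, total_epochs=TOTAL_EPOCHS):
    """Standard cosine decay from base_lr at epoch 0 to 0 at total_epochs.

    Assumption: updated once per epoch, no warmup, final value 0.
    The quoted text does not state any of these details.
    """
    if not 0 <= epoch <= total_epochs:
        raise ValueError("epoch must lie in [0, total_epochs]")
    return 0.5 * base_lr * (1.0 + math.cos(math.pi * epoch / total_epochs))


# Assumption: the global batch is divided evenly across the GPUs.
per_gpu_batch = BATCH_SIZE // NUM_GPUS

if __name__ == "__main__":
    for epoch in (0, 90, 180, 270, 360):
        print(f"epoch {epoch:3d}: lr = {cosine_lr(epoch):.4f}")
    print(f"per-GPU batch size (assumed even split): {per_gpu_batch}")
```

Under these assumptions the rate starts at 0.8, reaches half that value at the midpoint of training, and ends at zero; a per-step schedule or a warmup phase would change the early and intermediate values.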