OTOv2: Automatic, Generic, User-Friendly
Authors: Tianyi Chen, Luming Liang, Tianyu Ding, Zhihui Zhu, Ilya Zharkov
ICLR 2023
| Reproducibility Variable | Result | LLM Response |
|---|---|---|
| Research Type | Experimental | Numerically, we demonstrate the generality and autonomy of OTOv2 on a variety of model architectures such as VGG, ResNet, CARN, ConvNeXt, DenseNet and StackedUnets, the majority of which cannot be handled by other methods without extensive handcrafting efforts. Together with benchmark datasets including CIFAR10/100, DIV2K, Fashion-MNIST, SVHN and ImageNet, its effectiveness is validated by performing competitively or even better than the state-of-the-arts. |
| Researcher Affiliation | Collaboration | Tianyi Chen, Luming Liang, Tianyu Ding, Ilya Zharkov (Microsoft, Redmond, WA 98052, USA; {tiachen,lulian,tianyuding,zharkov}@microsoft.com) and Zhihui Zhu (The Ohio State University, Columbus, OH 43210, USA; zhu.3440@osu.edu) |
| Pseudocode | Yes | Algorithm 1: Outline of OTOv2. Algorithm 2: Automated Zero-Invariant Group Partition. Algorithm 3: Dual Half-Space Projected Gradient (DHSPG). (A minimal zero-invariant-group illustration follows the table.) |
| Open Source Code | Yes | The source code is available at https://github.com/tianyic/only_train_once. |
| Open Datasets | Yes | Together with benchmark datasets including CIFAR10/100, DIV2K, Fashion-MNIST, SVHN and ImageNet, its effectiveness is validated by performing competitively or even better than the state-of-the-arts. |
| Dataset Splits | No | The paper mentions using benchmark datasets for training and evaluation but does not explicitly provide the specific training/validation/test split percentages or sample counts for these datasets in the main text. |
| Hardware Specification | Yes | We conducted the experiments on one NVIDIA A100 GPU Server. |
| Software Dependencies | No | The paper mentions dependencies on PyTorch and ONNX but does not specify exact version numbers; the citation 'Pytorch (Paszke et al., 2019)' identifies the framework, not a numbered release. |
| Experiment Setup | Yes | For the experiments in the main body, we estimated the gradient via sampling a mini-batch of data points under first-order momentum with coefficient 0.9. The mini-batch sizes follow other related works, drawn from {64, 128, 256}. All experiments in the main body share the same commonly-used learning-rate scheduler, which starts from 10^-1 and periodically decays by a factor of 10 until reaching 10^-4, every T_period epochs. The length of the decay period T_period depends on the maximum epoch count, i.e., 120 for ImageNet and 300 for others. (A runnable sketch of this schedule follows the table.) |
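For context on Algorithm 2, below is a minimal PyTorch sketch of the zero-invariant group (ZIG) idea: once every parameter in a group (a Conv2d filter, its bias, and the downstream BatchNorm affine parameters) is zeroed, the corresponding channel can be removed without changing the network output. The toy network and channel index are illustrative assumptions, not the paper's code or the repository's partition algorithm.

```python
import torch
import torch.nn as nn

torch.manual_seed(0)

# Toy network: Conv -> BN -> ReLU -> Conv (illustrative, not the paper's code).
conv1 = nn.Conv2d(3, 8, 3, padding=1)
bn1 = nn.BatchNorm2d(8)
conv2 = nn.Conv2d(8, 4, 3, padding=1)
net = nn.Sequential(conv1, bn1, nn.ReLU(), conv2).eval()

x = torch.randn(1, 3, 16, 16)

# Zero the ZIG of conv1's output channel k:
# {conv1 filter k, conv1 bias k, bn1 gamma k, bn1 beta k}.
k = 5
with torch.no_grad():
    conv1.weight[k].zero_()
    conv1.bias[k].zero_()
    bn1.weight[k].zero_()
    bn1.bias[k].zero_()
y_zeroed = net(x)

# Structurally remove channel k; the slimmed net must match the zeroed net.
keep = [i for i in range(8) if i != k]
conv1_s, bn1_s = nn.Conv2d(3, 7, 3, padding=1), nn.BatchNorm2d(7)
conv2_s = nn.Conv2d(7, 4, 3, padding=1)
with torch.no_grad():
    conv1_s.weight.copy_(conv1.weight[keep]); conv1_s.bias.copy_(conv1.bias[keep])
    bn1_s.weight.copy_(bn1.weight[keep]); bn1_s.bias.copy_(bn1.bias[keep])
    bn1_s.running_mean.copy_(bn1.running_mean[keep])
    bn1_s.running_var.copy_(bn1.running_var[keep])
    conv2_s.weight.copy_(conv2.weight[:, keep]); conv2_s.bias.copy_(conv2.bias)
slim = nn.Sequential(conv1_s, bn1_s, nn.ReLU(), conv2_s).eval()

print(torch.allclose(y_zeroed, slim(x), atol=1e-6))  # True: output is invariant
```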
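Similarly, a runnable sketch of the quoted optimization setup: momentum 0.9, learning rate starting at 10^-1 and cut by 10x every T_period epochs down to 10^-4. Plain SGD stands in for the paper's DHSPG optimizer; the toy model, synthetic data, and the T_PERIOD value are assumptions (the paper only ties T_period to the maximum epoch count).

```python
import torch
from torch import nn
from torch.utils.data import DataLoader, TensorDataset

# Toy stand-ins (assumptions); the paper trains VGG/ResNet-style models on
# CIFAR/ImageNet-scale data with mini-batch sizes drawn from {64, 128, 256}.
model = nn.Sequential(nn.Flatten(), nn.Linear(3 * 32 * 32, 10))
data = TensorDataset(torch.randn(512, 3, 32, 32), torch.randint(0, 10, (512,)))
loader = DataLoader(data, batch_size=128, shuffle=True)

MAX_EPOCHS = 300  # 120 for ImageNet in the paper, 300 otherwise
T_PERIOD = 75     # decay-period length: an assumption, paper ties it to MAX_EPOCHS

# SGD stands in for DHSPG; lr starts at 1e-1 and is cut 10x every T_PERIOD
# epochs, reaching the quoted floor of 1e-4 at epoch 3 * T_PERIOD.
optimizer = torch.optim.SGD(model.parameters(), lr=1e-1, momentum=0.9)
scheduler = torch.optim.lr_scheduler.StepLR(optimizer, step_size=T_PERIOD, gamma=0.1)

for epoch in range(MAX_EPOCHS):
    for inputs, targets in loader:
        optimizer.zero_grad()
        loss = nn.functional.cross_entropy(model(inputs), targets)
        loss.backward()
        optimizer.step()
    scheduler.step()
```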