OTOv2: Automatic, Generic, User-Friendly
Authors: Tianyi Chen, Luming Liang, Tianyu Ding, Zhihui Zhu, Ilya Zharkov
ICLR 2023
| Reproducibility Variable | Result | LLM Response |
|---|---|---|
| Research Type | Experimental | Numerically, we demonstrate the generality and autonomy of OTOv2 on a variety of model architectures such as VGG, ResNet, CARN, ConvNeXt, DenseNet and StackedUnets, the majority of which cannot be handled by other methods without extensive handcrafting efforts. Together with benchmark datasets including CIFAR10/100, DIV2K, Fashion-MNIST, SVHN and ImageNet, its effectiveness is validated by performing competitively or even better than the state-of-the-arts. |
| Researcher Affiliation | Collaboration | Tianyi Chen, Luming Liang, Tianyu Ding, Ilya Zharkov (Microsoft, Redmond, WA 98052, USA; {tiachen,lulian,tianyuding,zharkov}@microsoft.com) and Zhihui Zhu (The Ohio State University, Columbus, OH 43210, USA; zhu.3440@osu.edu) |
| Pseudocode | Yes | Algorithm 1: Outline of OTOv2. Algorithm 2: Automated Zero-Invariant Group Partition. Algorithm 3: Dual Half-Space Projected Gradient (DHSPG). (A minimal zero-invariant-group illustration follows the table.) |
| Open Source Code | Yes | The source code is available at https://github.com/tianyic/only_train_once. |
| Open Datasets | Yes | Together with benchmark datasets including CIFAR10/100, DIV2K, Fashion-MNIST, SVHN and ImageNet, its effectiveness is validated by performing competitively or even better than the state-of-the-arts. |
| Dataset Splits | No | The paper mentions using benchmark datasets for training and evaluation but does not explicitly provide the specific training/validation/test split percentages or sample counts for these datasets in the main text. |
| Hardware Specification | Yes | We conducted the experiments on one NVIDIA A100 GPU Server. |
| Software Dependencies | No | The paper mentions dependencies on PyTorch and ONNX but does not specify exact version numbers; the citation 'Pytorch (Paszke et al., 2019)' identifies the framework, not a numbered release. |
| Experiment Setup | Yes | For the experiments in the main body, we estimated the gradient via sampling a mini-batch of data points under first-order momentum with coefficient 0.9. The mini-batch sizes follow other related works, drawn from {64, 128, 256}. All experiments in the main body share the same commonly-used learning-rate scheduler, which starts from 10^-1 and periodically decays by a factor of 10 until reaching 10^-4, every T_period epochs. The length of the decay period T_period depends on the maximum epoch count, i.e., 120 for ImageNet and 300 for others. (A runnable sketch of this schedule follows the table.) |
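For context on Algorithm 2, below is a minimal PyTorch sketch of the zero-invariant group (ZIG) idea: once every parameter in a group (a Conv2d filter, its bias, and the downstream BatchNorm affine parameters) is zeroed, the corresponding channel can be removed without changing the network output. The toy network and channel index are illustrative assumptions, not the paper's code or the repository's partition algorithm.

```python
import torch
import torch.nn as nn

torch.manual_seed(0)

# Toy network: Conv -> BN -> ReLU -> Conv (illustrative, not the paper's code).
conv1 = nn.Conv2d(3, 8, 3, padding=1)
bn1 = nn.BatchNorm2d(8)
conv2 = nn.Conv2d(8, 4, 3, padding=1)
net = nn.Sequential(conv1, bn1, nn.ReLU(), conv2).eval()

x = torch.randn(1, 3, 16, 16)

# Zero the ZIG of conv1's output channel k:
# {conv1 filter k, conv1 bias k, bn1 gamma k, bn1 beta k}.
k = 5
with torch.no_grad():
    conv1.weight[k].zero_()
    conv1.bias[k].zero_()
    bn1.weight[k].zero_()
    bn1.bias[k].zero_()
y_zeroed = net(x)

# Structurally remove channel k; the slimmed net must match the zeroed net.
keep = [i for i in range(8) if i != k]
conv1_s, bn1_s = nn.Conv2d(3, 7, 3, padding=1), nn.BatchNorm2d(7)
conv2_s = nn.Conv2d(7, 4, 3, padding=1)
with torch.no_grad():
    conv1_s.weight.copy_(conv1.weight[keep]); conv1_s.bias.copy_(conv1.bias[keep])
    bn1_s.weight.copy_(bn1.weight[keep]); bn1_s.bias.copy_(bn1.bias[keep])
    bn1_s.running_mean.copy_(bn1.running_mean[keep])
    bn1_s.running_var.copy_(bn1.running_var[keep])
    conv2_s.weight.copy_(conv2.weight[:, keep]); conv2_s.bias.copy_(conv2.bias)
slim = nn.Sequential(conv1_s, bn1_s, nn.ReLU(), conv2_s).eval()

print(torch.allclose(y_zeroed, slim(x), atol=1e-6))  # True: output is invariant
```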
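Similarly, a runnable sketch of the quoted optimization setup: momentum 0.9, learning rate starting at 10^-1 and cut by 10x every T_period epochs down to 10^-4. Plain SGD stands in for the paper's DHSPG optimizer; the toy model, synthetic data, and the T_PERIOD value are assumptions (the paper only ties T_period to the maximum epoch count).

```python
import torch
from torch import nn
from torch.utils.data import DataLoader, TensorDataset

# Toy stand-ins (assumptions); the paper trains VGG/ResNet-style models on
# CIFAR/ImageNet-scale data with mini-batch sizes drawn from {64, 128, 256}.
model = nn.Sequential(nn.Flatten(), nn.Linear(3 * 32 * 32, 10))
data = TensorDataset(torch.randn(512, 3, 32, 32), torch.randint(0, 10, (512,)))
loader = DataLoader(data, batch_size=128, shuffle=True)

MAX_EPOCHS = 300  # 120 for ImageNet in the paper, 300 otherwise
T_PERIOD = 75     # decay-period length: an assumption, paper ties it to MAX_EPOCHS

# SGD stands in for DHSPG; lr starts at 1e-1 and is cut 10x every T_PERIOD
# epochs, reaching the quoted floor of 1e-4 at epoch 3 * T_PERIOD.
optimizer = torch.optim.SGD(model.parameters(), lr=1e-1, momentum=0.9)
scheduler = torch.optim.lr_scheduler.StepLR(optimizer, step_size=T_PERIOD, gamma=0.1)

for epoch in range(MAX_EPOCHS):
    for inputs, targets in loader:
        optimizer.zero_grad()
        loss = nn.functional.cross_entropy(model(inputs), targets)
        loss.backward()
        optimizer.step()
    scheduler.step()
```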