Re-parameterizing Your Optimizers rather than Architectures
Authors: Xiaohan Ding, Honghao Chen, Xiangyu Zhang, Kaiqi Huang, Jungong Han, Guiguang Ding
ICLR 2023 | Conference PDF | Archive PDF | Plain Text | LLM Run Details
| Reproducibility Variable | Result | LLM Response |
|---|---|---|
| Research Type | Experimental | 4 EXPERIMENTS |
| Researcher Affiliation | Collaboration | 1Tencent AI Lab 2CRISE, Institute of Automation, Chinese Academy of Sciences 3School of Artificial Intelligence, University of Chinese Academy of Sciences 4MEGVII Technology 5Beijing Academy of Artificial Intelligence 6Department of Computer Science, the University of Sheffield 7School of Software, BNRist, Tsinghua University |
| Pseudocode | No | The paper describes methods and processes but does not include any clearly labeled pseudocode or algorithm blocks. |
| Open Source Code | Yes | Code and models: https://github.com/DingXiaoH/RepOptimizers. |
| Open Datasets | Yes | We use CIFAR-100 for searching the hyper-parameters of RepOptimizers... For training RepOpt-VGG and RepVGG on ImageNet |
| Dataset Splits | Yes | We report the accuracy on the validation set. |
| Hardware Specification | Yes | Max BS+1 would cause OOM (Out Of Memory) error on the 2080Ti GPU which has 11GB of memory. For the fair comparison, the training costs of all the models are tested with the same training script on the same machine with eight 2080Ti GPUs. |
| Software Dependencies | No | The paper mentions using PyTorch for its quantization examples but does not provide version numbers for PyTorch or any other software dependency. |
| Experiment Setup | Yes | Specifically, we use 8 GPUs, a batch size of 32 per GPU, input resolution of 224×224, and a learning rate schedule with 5-epoch warm-up, initial value of 0.1 and cosine annealing for 120 epochs. For the data augmentation, we use a pipeline of random cropping, left-right flipping and RandAugment (Cubuk et al., 2020). We also use a label smoothing coefficient of 0.1. The regular SGD optimizers for the baseline models and the RepOptimizers for RepOpt-VGG use momentum of 0.9 and weight decay of 4×10⁻⁵. |
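The learning-rate schedule reported in the Experiment Setup row (5-epoch warm-up to a peak of 0.1, then cosine annealing over the remaining epochs of a 120-epoch run) can be sketched as a plain-Python function. The linear warm-up shape and the final annealed value of 0 are assumptions; the paper states only "5-epoch warm-up" and "cosine annealing".

```python
import math

TOTAL_EPOCHS = 120   # from the paper's reported setup
WARMUP_EPOCHS = 5    # from the paper's reported setup
BASE_LR = 0.1        # initial value from the paper's reported setup


def lr_at(epoch: float) -> float:
    """Learning rate at a (possibly fractional) epoch.

    Assumptions not stated in the paper: warm-up is linear from 0,
    and the cosine schedule anneals all the way down to 0.
    """
    if epoch < WARMUP_EPOCHS:
        # Linear ramp from 0 up to the base learning rate.
        return BASE_LR * epoch / WARMUP_EPOCHS
    # Cosine annealing from BASE_LR toward 0 over the remaining epochs.
    progress = (epoch - WARMUP_EPOCHS) / (TOTAL_EPOCHS - WARMUP_EPOCHS)
    return 0.5 * BASE_LR * (1.0 + math.cos(math.pi * progress))


print(lr_at(0))    # 0.0  (start of warm-up)
print(lr_at(5))    # 0.1  (peak, warm-up finished)
print(lr_at(120))  # ~0.0 (fully annealed)
```

This matches the row's description at the checkpoints that are pinned down by the text (peak of 0.1 after warm-up, decay over 120 epochs); the exact PyTorch scheduler used is not specified in the paper.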