iFlood: A Stable and Effective Regularizer
Authors: Yuexiang Xie, Zhen Wang, Yaliang Li, Ce Zhang, Jingren Zhou, Bolin Ding
ICLR 2022
| Reproducibility Variable | Result | LLM Response |
|---|---|---|
| Research Type | Experimental | We conduct extensive experiments on both image classification and language understanding tasks to compare the performance improvements gained by different regularizers, demonstrating the effectiveness of the proposed iFlood. Further, we evaluate the stability of iFlood from several measurements, such as total variation distance and gradient norm. All the experimental results show that, with the local convergence suggested by iFlood, the learned models stably converge to solutions with better generalization ability. |
| Researcher Affiliation | Collaboration | 1Alibaba Group, 2ETH Zürich {yuexiang.xyx, jones.wz, yaliang.li, jingren.zhou, bolin.ding}@alibaba-inc.com, ce.zhang@inf.ethz.ch |
| Pseudocode | No | The paper does not contain any structured pseudocode or algorithm blocks. |
| Open Source Code | No | The paper does not provide any explicit statement or link to open-source code for the described methodology. |
| Open Datasets | Yes | We consider both image classification and language understanding tasks. For image classification, we use CIFAR-10, CIFAR-100 (Krizhevsky et al., 2009), SVHN (Netzer et al., 2011), and ImageNet (Russakovsky et al., 2015). For language understanding, we adopt the General Language Understanding Evaluation (GLUE) benchmark (Wang et al., 2019), and report the experimental results on SST-2, QQP, and QNLI. The details of these datasets and more experimental results on GLUE benchmark can be found in Appendix A and D.2 respectively. (Footnotes in Appendix A provide URLs for datasets: CIFAR-10, CIFAR-100, SVHN, ImageNet, SST-2, QQP, QNLI) |
| Dataset Splits | Yes | We randomly split the training data into training and validation sets with the proportion of 9:1, and apply grid search on the validation dataset for hyperparameter optimization (HPO). |
| Hardware Specification | Yes | All models are implemented using PyTorch (Paszke et al., 2019) and trained on NVIDIA GeForce GTX 1080 Ti or Tesla V100 GPUs. |
| Software Dependencies | No | The paper mentions 'PyTorch (Paszke et al., 2019)' and 'the pre-trained BERT model provided by huggingface (Wolf et al., 2020)' but does not provide specific version numbers for these software dependencies. |
| Experiment Setup | Yes | On the image classification datasets...We train ResNet18 for 300 epochs with 128 as the batch size. The learning rate is initialized as 0.1 and decays (multiplied by 0.2) at the 80-th, 160-th and 200-th epochs. As for ResNeXt50, we train it for 90 epochs with 256 as the batch size. The learning rate is initialized as 0.1 and decays (multiplied by 0.1) at the 30-th and 60-th epochs. On the language understanding datasets...The number of epochs is tuned among {3,4,5}, the batch size is 16, the learning rate is tuned among {2e-5, 5e-5}, and the dropout rate is 0.1. For Flooding and iFlood, the flood level b is tuned in the range of [0.10,0.50] via grid search with 0.05 as the step size for ImageNet, and tuned in the range of [0.01,0.10] via grid search with 0.01 as the step size for other datasets. |
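Since the report notes the paper contains no pseudocode, a minimal sketch may help clarify what the flood level `b` controls. This is my own illustration, not code from the paper: it assumes the standard flooding objective |ℓ − b| + b (Ishida et al., 2020) applied to the batch-average loss, and an "instance-level" variant applied per example before averaging, which is how iFlood is commonly described; the function names are hypothetical.

```python
def flooding_loss(losses, b):
    """Flooding (Ishida et al., 2020): the flood level b bounds the
    *batch-average* loss from below, so gradient ascent kicks in only
    when the mean loss dips under b."""
    mean = sum(losses) / len(losses)
    return abs(mean - b) + b


def iflood_loss(losses, b):
    """Per-instance variant (assumed form of iFlood): the flood level
    is applied to each example's loss separately, then averaged, so
    already-well-fit examples cannot be compensated by hard ones."""
    return sum(abs(l - b) + b for l in losses) / len(losses)
```

With per-example losses `[0.02, 0.30]` and `b = 0.05`, the batch-level form yields 0.16 while the instance-level form yields 0.19, since the under-flooded example 0.02 is penalized individually rather than averaged away; grid-searching `b` (e.g. over [0.01, 0.10] as in the paper) trades off this effect against underfitting.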