MetaPerturb: Transferable Regularizer for Heterogeneous Tasks and Architectures

Authors: Jeongun Ryu, Jaewoong Shin, Hae Beom Lee, Sung Ju Hwang

NeurIPS 2020

Reproducibility Variable | Result | LLM Response
Research Type | Experimental | "We validate the efficacy and generality of MetaPerturb trained on a specific source domain and architecture, by applying it to the training of diverse neural architectures on heterogeneous target datasets against various regularizers and fine-tuning. The results show that the networks trained with MetaPerturb significantly outperform the baselines on most of the tasks and architectures, with a negligible increase in the parameter size and no hyperparameters to tune."
Researcher Affiliation | Collaboration | Jeongun Ryu¹, Jaewoong Shin¹, Hae Beom Lee¹, Sung Ju Hwang¹˒² (¹KAIST, ²AITRICS, South Korea)
Pseudocode | Yes | Algorithm 1: Meta-training
Open Source Code | No | The paper does not explicitly state that open-source code for the methodology is provided, nor does it include a link to a code repository.
Open Datasets | Yes | "We use Tiny ImageNet [1] as the source dataset, which is a subset of the ImageNet [33] dataset. ... We then transfer our perturbation function to the following target tasks: STL10 [7], CIFAR-100 [18], Stanford Dogs [16], Stanford Cars [17], Aircraft [25], and CUB [44]."
Dataset Splits | Yes | "We class-wisely split the dataset into 10 splits to produce heterogeneous task samples. ... Thus we select the best performing noise generator over five meta-training runs using a validation set consisting of samples from CIFAR-100, that is disjoint from s-CIFAR100, and use it throughout all the experiments in the paper."
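The class-wise split quoted above partitions the label space, so each of the 10 splits sees a disjoint subset of classes rather than a disjoint subset of samples. A minimal sketch of such a split follows; the function name `class_wise_split` and its signature are illustrative, not taken from the paper:

```python
from collections import defaultdict

def class_wise_split(labels, num_splits=10):
    """Partition sample indices into `num_splits` groups such that each
    group covers a disjoint subset of classes (heterogeneous tasks).
    `labels` is a sequence of integer class ids; returns index lists."""
    by_class = defaultdict(list)
    for idx, y in enumerate(labels):
        by_class[y].append(idx)
    splits = [[] for _ in range(num_splits)]
    # Round-robin assignment of whole classes to splits keeps the
    # class counts per split balanced.
    for i, c in enumerate(sorted(by_class)):
        splits[i % num_splits].extend(by_class[c])
    return splits

# Example: 100 classes (as in CIFAR-100), 5 samples per class,
# split into 10 tasks of 10 classes each.
labels = [i // 5 for i in range(500)]
splits = class_wise_split(labels, num_splits=10)
```

Because classes (not samples) are partitioned, every split defines a genuinely different classification task, which is what makes the sampled tasks heterogeneous.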
Hardware Specification | No | The paper does not provide specific hardware details (e.g., GPU/CPU models, memory) used for running its experiments, only a general mention of "a single GPU".
Software Dependencies | No | The paper does not specify any software dependencies with version numbers.
Experiment Setup | Yes | "For the base regularizations, we used the weight decay of 0.0005 and random cropping and horizontal flipping in all experiments."
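As a minimal sketch of what these base regularizations amount to for CIFAR-style 32x32 inputs (an assumption; in practice one would use a framework's built-in transforms, e.g. torchvision, and pass the weight decay to the optimizer), the two augmentations can be written with plain numpy. All function names here are illustrative:

```python
import numpy as np

def random_crop(img, size=32, pad=4, rng=None):
    """Pad the image by `pad` pixels on each side, then crop a random
    size x size window -- the standard CIFAR-style crop augmentation.
    The pad=4 default is a common convention, not stated in the paper."""
    rng = rng or np.random.default_rng()
    padded = np.pad(img, ((pad, pad), (pad, pad), (0, 0)), mode="reflect")
    top = int(rng.integers(0, 2 * pad + 1))
    left = int(rng.integers(0, 2 * pad + 1))
    return padded[top:top + size, left:left + size]

def random_hflip(img, p=0.5, rng=None):
    """Flip the image left-right with probability p."""
    rng = rng or np.random.default_rng()
    return img[:, ::-1] if rng.random() < p else img

img = np.zeros((32, 32, 3))
aug = random_hflip(random_crop(img))

# The weight decay of 0.0005 is an optimizer setting, not a transform:
# in PyTorch it would be passed as, e.g.,
#   torch.optim.SGD(model.parameters(), lr=0.1, weight_decay=5e-4)
```

Note that weight decay, cropping, and flipping are applied to every method in the comparison, so the reported gains of the meta-learned perturbation are measured on top of this shared baseline.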