Robust Fine-Tuning of Deep Neural Networks with Hessian-based Generalization Guarantees

Authors: Haotian Ju, Dongyue Li, Hongyang R. Zhang

ICML 2022 | Conference PDF | Archive PDF | Plain Text | LLM Run Details

| Reproducibility Variable | Result | LLM Response |
| --- | --- | --- |
| Research Type | Experimental | We perform a detailed empirical study of our algorithm on various noisy environments and architectures. For example, on six image classification tasks whose training labels are generated with programmatic labeling, we show a 3.26% accuracy improvement over prior methods. |
| Researcher Affiliation | Academia | Northeastern University, Boston MA, United States. |
| Pseudocode | Yes | Algorithm 1: Consistent loss reweighting with layerwise projection (a hedged sketch of this style of update appears below the table). |
| Open Source Code | No | The paper mentions 'For the baselines, we report the results from running their open-sourced implementations.' but does not state that the authors' own code for the described methodology is publicly available, nor does it provide a link. |
| Open Datasets | Yes | For image classification, we use six domains of object classification tasks from the DomainNet [PBX+19] dataset. [...] For text classification, we use the MRPC dataset from the GLUE benchmark [WSM+18]. |
| Dataset Splits | Yes | Hyper-parameters in the fine-tuning algorithms are selected based on the accuracy of the validation dataset. [...] Table 6: Basic statistics for six datasets with noisy labels [MCS+21]. [...] Number of validation samples. |
| Hardware Specification | No | The paper does not provide specific hardware details (e.g., exact GPU/CPU models or memory amounts) used for running its experiments. |
| Software Dependencies | No | The paper mentions software such as the Adam optimizer, the Optuna [ASY+19] package, and PyHessian [YGK+20], but does not specify version numbers for these components (a PyHessian usage sketch appears below the table). |
| Experiment Setup | Yes | We use Adam optimizer with learning rate 1e-4 and decay the learning rate by 10 every 10 epochs. In the experiments on text classification datasets, we fine-tune the BERT-Base model for 5 epochs. We use Adam optimizer with an initial learning rate of 5e-4 and then linearly decay the learning rate. [...] We search the distance constraint parameter D in [0.05, 10] and the scaling parameter γ in [1, 5]. (A configuration sketch appears below the table.) |