On Bias-Variance Alignment in Deep Models

Authors: Lin Chen, Michal Lukasik, Wittawat Jitkrittum, Chong You, Sanjiv Kumar

ICLR 2024 | Conference PDF | Archive PDF | Plain Text | LLM Run Details

Reproducibility Variable | Result | LLM Response
Research Type | Experimental | We present empirical evidence confirming this phenomenon in a variety of deep learning models and datasets. Moreover, we study this phenomenon from two theoretical perspectives: calibration and neural collapse. We first show theoretically that under the assumption that the models are well calibrated, we can observe the bias-variance alignment. Second, starting from the picture provided by the neural collapse theory, we show an approximate correlation between bias and variance. Our main contributions are: (1) We conduct experiments to show that the bias-variance alignment holds for a variety of model architectures and on different datasets. (A sketch of the per-sample bias/variance estimate behind this claim appears after the table.)
Researcher Affiliation | Industry | Lin Chen, Michal Lukasik, Wittawat Jitkrittum, Chong You, Sanjiv Kumar; Google Research, {linche,mlukasik,wittawat,cyou,sanjivk}@google.com
Pseudocode | No | The paper does not contain any sections, figures, or blocks explicitly labeled as 'Pseudocode' or 'Algorithm', nor are there any structured, code-like procedural steps presented.
Open Source Code | No | The paper does not contain any explicit statements about making its source code available, nor does it provide a link to a code repository for the methodology described.
Open Datasets | Yes | ResNet-50 trained on ImageNet (Figure 1 caption); ResNet-56 (on CIFAR-10), ResNet-8 (on CIFAR-10), ResNet-50 (on ImageNet), and ResNet-110 (on CIFAR-100) (Table 3); fine-tuned BERT models on the TREC dataset (Section 3.2 and Appendix E.7).
Dataset Splits | No | The paper refers to using 'train set' and 'test samples' (e.g., 'bootstrapping of training set', 'over all test samples') and mentions datasets like CIFAR-10 and ImageNet, but it does not explicitly specify the proportions or methodology for train/validation/test splits for model training.
Hardware Specification | No | The paper does not provide any specific details regarding the hardware used to run its experiments, such as GPU models, CPU types, or cloud computing instance specifications.
Software Dependencies | No | The paper mentions using 'Adam' for training BERT models but does not provide specific version numbers for any software dependencies or libraries required to replicate the experiments.
Experiment Setup | Yes | In this experiment, each of the two ensembles consists of 20 BERT models. In each case, each of these models was initialized from the same pre-trained checkpoint, and trained for 20 epochs with learning rate of 2e-5 using Adam. We use a polynomial decay learning rate schedule with the number of warm-up steps set to be 10% of the number of total update steps. Training batch size was set to 8.
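The Experiment Setup row quotes a complete recipe (Adam, learning rate 2e-5, polynomial decay with 10% warm-up, batch size 8, 20 epochs per ensemble member), so a minimal sketch can illustrate it. The sketch below is an assumption-laden illustration, not code from the paper: `model`, `train_loader`, `make_schedule`, and `train_one_member` are hypothetical placeholders, the decay power is assumed to be 1, and a plain PyTorch loop stands in for whatever training framework the authors actually used.

```python
# Hypothetical sketch of the fine-tuning recipe quoted in the Experiment Setup row.
# `model` and `train_loader` are placeholders; the decay power (1.0) is an assumption.
import torch
from torch.optim.lr_scheduler import LambdaLR

EPOCHS = 20
BATCH_SIZE = 8      # used when building `train_loader`
BASE_LR = 2e-5

def make_schedule(optimizer, total_steps, warmup_frac=0.1, power=1.0):
    """Linear warm-up over the first 10% of steps, then polynomial decay to zero."""
    warmup_steps = int(warmup_frac * total_steps)
    def lr_lambda(step):
        if step < warmup_steps:
            return step / max(1, warmup_steps)
        remaining = max(0, total_steps - step)
        return (remaining / max(1, total_steps - warmup_steps)) ** power
    return LambdaLR(optimizer, lr_lambda)

def train_one_member(model, train_loader, device="cpu"):
    """Fine-tune a single ensemble member starting from the shared pre-trained checkpoint."""
    total_steps = EPOCHS * len(train_loader)
    optimizer = torch.optim.Adam(model.parameters(), lr=BASE_LR)
    scheduler = make_schedule(optimizer, total_steps)
    loss_fn = torch.nn.CrossEntropyLoss()
    model.to(device).train()
    for _ in range(EPOCHS):
        for inputs, labels in train_loader:
            optimizer.zero_grad()
            logits = model(inputs.to(device))          # classifier returning logits
            loss = loss_fn(logits, labels.to(device))
            loss.backward()
            optimizer.step()
            scheduler.step()                           # per-step schedule update
    return model
```

Repeating this from the same checkpoint with different data orderings or bootstrap resamples would yield the 20-member ensembles described in the row above.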
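The bias-variance alignment quoted in the Research Type row is a per-example statement: across test points, the squared bias and the variance of the ensemble's predictions approximately track each other. The snippet below is a minimal sketch of the standard per-sample squared-error decomposition over an ensemble, not the paper's exact estimator; `per_sample_bias_variance`, the probability-vector inputs, and the toy numbers are illustrative assumptions.

```python
# Minimal sketch (not the paper's exact estimator) of per-sample squared bias and
# variance computed from an ensemble of predictions, using the standard
# squared-error decomposition.
import numpy as np

def per_sample_bias_variance(preds: np.ndarray, onehot: np.ndarray):
    """preds: (n_models, n_classes) ensemble predictions for one test point;
    onehot: (n_classes,) one-hot label for that point."""
    mean_pred = preds.mean(axis=0)                                       # ensemble-average prediction
    bias_sq = float(np.sum((mean_pred - onehot) ** 2))                   # squared bias at this sample
    variance = float(np.mean(np.sum((preds - mean_pred) ** 2, axis=1)))  # spread across members
    return bias_sq, variance

# Toy example: an ensemble of 3 models, 4 classes, true class index 2.
preds = np.array([[0.10, 0.20, 0.60, 0.10],
                  [0.05, 0.25, 0.60, 0.10],
                  [0.10, 0.30, 0.50, 0.10]])
onehot = np.eye(4)[2]
print(per_sample_bias_variance(preds, onehot))
```

Under the alignment described in the Research Type row, these two numbers would be close for most test points when the ensemble members are trained as in the Experiment Setup row.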