On Bias-Variance Alignment in Deep Models
Authors: Lin Chen, Michal Lukasik, Wittawat Jitkrittum, Chong You, Sanjiv Kumar
ICLR 2024
| Reproducibility Variable | Result | LLM Response |
|---|---|---|
| Research Type | Experimental | We present empirical evidence confirming this phenomenon in a variety of deep learning models and datasets. Moreover, we study this phenomenon from two theoretical perspectives: calibration and neural collapse. We first show theoretically that under the assumption that the models are well calibrated, we can observe the bias-variance alignment. Second, starting from the picture provided by the neural collapse theory, we show an approximate correlation between bias and variance. Our main contributions are: (1) We conduct experiments to show that the bias-variance alignment holds for a variety of model architectures and on different datasets. *(A minimal sketch of the implied per-sample bias/variance estimation appears below the table.)* |
| Researcher Affiliation | Industry | Lin Chen, Michal Lukasik, Wittawat Jitkrittum, Chong You, Sanjiv Kumar; Google Research; {linche,mlukasik,wittawat,cyou,sanjivk}@google.com |
| Pseudocode | No | The paper does not contain any sections, figures, or blocks explicitly labeled as 'Pseudocode' or 'Algorithm', nor are there any structured, code-like procedural steps presented. |
| Open Source Code | No | The paper does not contain any explicit statements about making its source code available, nor does it provide a link to a code repository for the methodology described. |
| Open Datasets | Yes | ResNet-50 trained on ImageNet (Figure 1 caption); ResNet-56 (on CIFAR-10), ResNet-8 (on CIFAR-10), ResNet-50 (on ImageNet), and ResNet-110 (on CIFAR-100) (Table 3); fine-tune BERT models on the TREC dataset (Section 3.2 and Appendix E.7). |
| Dataset Splits | No | The paper refers to using 'train set' and 'test samples' (e.g., 'bootstrapping of training set', 'over all test samples') and mentions datasets like CIFAR-10 and ImageNet, but it does not explicitly specify the proportions or methodology for train/validation/test splits for model training. |
| Hardware Specification | No | The paper does not provide any specific details regarding the hardware used to run its experiments, such as GPU models, CPU types, or cloud computing instance specifications. |
| Software Dependencies | No | The paper mentions using 'Adam' for training BERT models but does not provide specific version numbers for any software dependencies or libraries required to replicate the experiments. |
| Experiment Setup | Yes | In this experiment, each of the two ensembles consists of 20 BERT models. In each case, each of these models was initialized from the same pre-trained checkpoint, and trained for 20 epochs with a learning rate of 2e-5 using Adam. We use a polynomial decay learning rate schedule with the number of warm-up steps set to be 10% of the number of total update steps. Training batch size was set to 8. *(A configuration sketch of this recipe appears below the table.)* |
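
The Research Type row summarizes the paper's central claim that per-sample bias and variance are aligned, with both quantities estimated from an ensemble of models (e.g., obtained by bootstrapping the training set, as noted in the Dataset Splits row). Purely as an illustration, here is a minimal NumPy sketch of one plausible per-sample estimator based on the squared-error decomposition of softmax outputs; the paper's exact estimator and loss may differ, and the function name and array shapes here are assumptions for this example.

```python
import numpy as np

def per_sample_bias_variance(probs, labels):
    """Estimate per-sample squared bias and variance from ensemble predictions.

    probs:  array of shape (n_models, n_samples, n_classes); softmax outputs
            of independently trained models (e.g., trained on bootstrap
            resamples of the training set).
    labels: array of shape (n_samples,); integer class labels.

    Returns (bias_sq, variance), each of shape (n_samples,), using the
    classical squared-error decomposition applied to probability vectors.
    """
    n_models, n_samples, n_classes = probs.shape
    one_hot = np.eye(n_classes)[labels]      # (n_samples, n_classes)
    mean_pred = probs.mean(axis=0)           # ensemble-average prediction

    # Squared bias: distance between the average prediction and the label.
    bias_sq = ((mean_pred - one_hot) ** 2).sum(axis=1)

    # Variance: mean squared distance of each model from the ensemble mean.
    variance = ((probs - mean_pred) ** 2).sum(axis=2).mean(axis=0)
    return bias_sq, variance
```

Under this kind of estimator, bias-variance alignment would show up as `bias_sq` and `variance` being roughly proportional across test samples, e.g., a high rank correlation on a log-log scatter plot.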
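
The Experiment Setup row quotes the BERT fine-tuning recipe (20 models per ensemble, 20 epochs, learning rate 2e-5, Adam, polynomial decay with 10% warm-up, batch size 8). The snippet below is a hedged sketch of that recipe using PyTorch and Hugging Face `transformers`; the checkpoint name `bert-base-uncased`, the placeholder dataset size, the number of labels, and the training-loop outline are assumptions not taken from the paper.

```python
import torch
from transformers import (BertForSequenceClassification,
                          get_polynomial_decay_schedule_with_warmup)

# Hyperparameters quoted in the Experiment Setup row.
num_epochs = 20
batch_size = 8
learning_rate = 2e-5

# Placeholder value: substitute the actual TREC train-split size.
num_train_examples = 5000
steps_per_epoch = num_train_examples // batch_size
total_steps = steps_per_epoch * num_epochs
warmup_steps = int(0.1 * total_steps)        # 10% warm-up, as stated

# Checkpoint and label count are assumptions (TREC coarse labels have 6 classes).
model = BertForSequenceClassification.from_pretrained(
    "bert-base-uncased", num_labels=6)
optimizer = torch.optim.Adam(model.parameters(), lr=learning_rate)
scheduler = get_polynomial_decay_schedule_with_warmup(
    optimizer, num_warmup_steps=warmup_steps, num_training_steps=total_steps)

# Training-loop outline: one optimizer and scheduler step per batch.
# for epoch in range(num_epochs):
#     for batch in train_loader:
#         loss = model(**batch).loss
#         loss.backward()
#         optimizer.step(); scheduler.step(); optimizer.zero_grad()
```

Repeating this fine-tuning 20 times from the same pre-trained checkpoint (with different data orderings or bootstrap resamples) would produce one ensemble of the kind described in the row above.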