Notice: The reproducibility variables underlying each score are classified using an automated LLM-based pipeline, validated against a manually labeled dataset. LLM-based classification introduces uncertainty and potential bias; scores should be interpreted as estimates. Full accuracy metrics and methodology are described in Coakley et al. (K. L. Coakley, T. Snelleman, H. Hoos, and O. E. Gundersen, "The embrace of open science: An analysis of a decade of AI research and 56 800 conference papers," Under Review, 2026).
On Bias-Variance Alignment in Deep Models
Authors: Lin Chen, Michal Lukasik, Wittawat Jitkrittum, Chong You, Sanjiv Kumar
ICLR 2024
| Reproducibility Variable | Result | LLM Response |
|---|---|---|
| Research Type | Experimental | We present empirical evidence confirming this phenomenon in a variety of deep learning models and datasets. Moreover, we study this phenomenon from two theoretical perspectives: calibration and neural collapse. We first show theoretically that under the assumption that the models are well calibrated, we can observe the bias-variance alignment. Second, starting from the picture provided by the neural collapse theory, we show an approximate correlation between bias and variance. Our main contributions are: (1) We conduct experiments to show that the bias-variance alignment holds for a variety of model architectures and on different datasets. |
| Researcher Affiliation | Industry | Lin Chen, Michal Lukasik, Wittawat Jitkrittum, Chong You, Sanjiv Kumar Google Research, EMAIL |
| Pseudocode | No | The paper does not contain any sections, figures, or blocks explicitly labeled as 'Pseudocode' or 'Algorithm', nor are there any structured, code-like procedural steps presented. |
| Open Source Code | No | The paper does not contain any explicit statements about making its source code available, nor does it provide a link to a code repository for the methodology described. |
| Open Datasets | Yes | ResNet-50 trained on ImageNet (Figure 1 caption); ResNet-56 (on CIFAR-10), ResNet-8 (on CIFAR-10), ResNet-50 (on ImageNet), and ResNet-110 (on CIFAR-100) (Table 3); fine-tune BERT models on the TREC dataset (Section 3.2 and Appendix E.7). |
| Dataset Splits | No | The paper refers to using 'train set' and 'test samples' (e.g., 'bootstrapping of training set', 'over all test samples') and mentions datasets like CIFAR-10 and ImageNet, but it does not explicitly specify the proportions or methodology for train/validation/test splits for model training. |
| Hardware Specification | No | The paper does not provide any specific details regarding the hardware used to run its experiments, such as GPU models, CPU types, or cloud computing instance specifications. |
| Software Dependencies | No | The paper mentions using 'Adam' for training BERT models but does not provide specific version numbers for any software dependencies or libraries required to replicate the experiments. |
| Experiment Setup | Yes | In this experiment, each of the two ensembles consists of 20 BERT models. In each case, each of these models was initialized from the same pre-trained checkpoint, and trained for 20 epochs with learning rate of 2e-5 using Adam. We use a polynomial decay learning rate schedule with the number of warm-up steps set to be 10% of the number of total update steps. Training batch size was set to 8. |
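The Experiment Setup row gives enough hyperparameters to sketch the BERT fine-tuning learning-rate schedule: peak learning rate 2e-5, 20 epochs, batch size 8, and warm-up over 10% of all update steps followed by polynomial decay. The Python below is a minimal sketch under stated assumptions, not the authors' code. It assumes a linear warm-up starting from 0, a decay exponent of 1.0 and a final learning rate of 0 (neither is stated), and one optimizer update per batch. The helper names and the training-set size used in the demo are hypothetical.

```python
import math


def total_update_steps(num_train_examples: int, batch_size: int = 8, epochs: int = 20) -> int:
    """Optimizer updates over all epochs.

    Assumes one update per batch and that the final partial batch is kept.
    """
    return math.ceil(num_train_examples / batch_size) * epochs


def warmup_poly_decay_lr(
    step: int,
    total_steps: int,
    peak_lr: float = 2e-5,
    warmup_frac: float = 0.10,
    power: float = 1.0,   # assumption: decay exponent is not stated in the setup
    end_lr: float = 0.0,  # assumption: final learning rate is not stated either
) -> float:
    """Learning rate at a given update step: linear warm-up, then polynomial decay."""
    warmup_steps = max(1, round(warmup_frac * total_steps))
    if step < warmup_steps:
        # Linear ramp from 0 up to peak_lr over the first 10% of updates.
        return peak_lr * step / warmup_steps
    # Fraction of the post-warm-up phase completed, clamped to [0, 1].
    decay_steps = max(1, total_steps - warmup_steps)
    progress = min(1.0, (step - warmup_steps) / decay_steps)
    return (peak_lr - end_lr) * (1.0 - progress) ** power + end_lr


if __name__ == "__main__":
    # Hypothetical training-set size; substitute the real size of the TREC training split.
    steps = total_update_steps(num_train_examples=5000)
    for s in (0, steps // 20, steps // 10, steps // 2, steps):
        print(f"step {s:>6}: lr = {warmup_poly_decay_lr(s, steps):.2e}")
```

Because the total number of updates depends on the training-set size, the warm-up length is only fixed once that size is known. With the hypothetical 5,000 examples above, 20 epochs at batch size 8 give 12,500 updates, so warm-up would cover the first 1,250.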
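The Research Type row refers to bias and variance measured at the level of individual samples across an ensemble of trained models. As a hedged illustration only, the Python below computes a per-sample squared bias and variance using the standard squared-loss decomposition on predicted class probabilities against a one-hot label. This page does not state which loss or estimator the paper uses, so that choice is an assumption. `sample_bias_variance` and `mean_squared_loss` are hypothetical helper names.

```python
import math
from typing import Sequence, Tuple


def sample_bias_variance(member_probs: Sequence[Sequence[float]], label: int) -> Tuple[float, float]:
    """Squared-loss (bias^2, variance) of one test sample across ensemble members.

    member_probs[i][k] is member i's predicted probability for class k;
    label is the index of the true class, compared as a one-hot vector.
    """
    n_models = len(member_probs)
    n_classes = len(member_probs[0])
    # Ensemble-average prediction for this sample.
    mean = [sum(p[k] for p in member_probs) / n_models for k in range(n_classes)]
    onehot = [1.0 if k == label else 0.0 for k in range(n_classes)]
    # Squared bias: squared distance between the average prediction and the label.
    bias_sq = sum((mean[k] - onehot[k]) ** 2 for k in range(n_classes))
    # Variance: average squared distance of each member's prediction from the mean.
    variance = sum(
        sum((p[k] - mean[k]) ** 2 for k in range(n_classes)) for p in member_probs
    ) / n_models
    return bias_sq, variance


def mean_squared_loss(member_probs: Sequence[Sequence[float]], label: int) -> float:
    """Average over members of the squared error against the one-hot label."""
    n_classes = len(member_probs[0])
    onehot = [1.0 if k == label else 0.0 for k in range(n_classes)]
    losses = [sum((p[k] - onehot[k]) ** 2 for k in range(n_classes)) for p in member_probs]
    return sum(losses) / len(losses)


if __name__ == "__main__":
    # Toy two-member, two-class ensemble for a single sample whose true class is 0.
    probs = [[0.8, 0.2], [0.6, 0.4]]
    b, v = sample_bias_variance(probs, label=0)
    # Sanity check: the mean squared loss decomposes exactly into bias^2 + variance.
    assert math.isclose(mean_squared_loss(probs, label=0), b + v)
    print(f"bias^2 = {b:.4f}, variance = {v:.4f}")
```

Computing this pair for every test sample and plotting one against the other (for example on log-log axes) is one way to inspect whether the two quantities move together. The exact decomposition identity serves as a quick correctness check on the implementation.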