reproducibilityindex.ai

On Bias-Variance Alignment in Deep Models

Authors: Lin Chen, Michal Lukasik, Wittawat Jitkrittum, Chong You, Sanjiv Kumar

ICLR 2024

Reproducibility Variable	Result	LLM Response
Research Type	Experimental	We present empirical evidence confirming this phenomenon in a variety of deep learning models and datasets. Moreover, we study this phenomenon from two theoretical perspectives: calibration and neural collapse. We first show theoretically that under the assumption that the models are well calibrated, we can observe the bias-variance alignment. Second, starting from the picture provided by the neural collapse theory, we show an approximate correlation between bias and variance. Our main contributions are: (1) We conduct experiments to show that the bias-variance alignment holds for a variety of model architectures and on different datasets.
Researcher Affiliation	Industry	Lin Chen, Michal Lukasik, Wittawat Jitkrittum, Chong You, Sanjiv Kumar Google Research, {linche,mlukasik,wittawat,cyou,sanjivk}@google.com
Pseudocode	No	The paper does not contain any sections, figures, or blocks explicitly labeled as 'Pseudocode' or 'Algorithm', nor are there any structured, code-like procedural steps presented.
Open Source Code	No	The paper does not contain any explicit statements about making its source code available, nor does it provide a link to a code repository for the methodology described.
Open Datasets	Yes	ResNet-50 trained on ImageNet (Figure 1 caption); ResNet-56 (on CIFAR-10), ResNet-8 (on CIFAR-10), ResNet-50 (on ImageNet), and ResNet-110 (on CIFAR-100) (Table 3); fine-tune BERT models on the TREC dataset (Section 3.2, and Appendix E.7).
Dataset Splits	No	The paper refers to using 'train set' and 'test samples' (e.g., 'bootstrapping of training set', 'over all test samples') and mentions datasets like CIFAR-10 and ImageNet, but it does not explicitly specify the proportions or methodology for train/validation/test splits for model training.
Hardware Specification	No	The paper does not provide any specific details regarding the hardware used to run its experiments, such as GPU models, CPU types, or cloud computing instance specifications.
Software Dependencies	No	The paper mentions using 'Adam' for training BERT models but does not provide specific version numbers for any software dependencies or libraries required to replicate the experiments.
Experiment Setup	Yes	In this experiment, each of the two ensembles consists of 20 BERT models. In each case, each of these models was initialized from the same pre-trained checkpoint, and trained for 20 epochs with a learning rate of 2e-5 using Adam. We use a polynomial decay learning rate schedule with the number of warm-up steps set to 10% of the total number of update steps. Training batch size was set to 8.
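
The learning rate schedule described under Experiment Setup can be sketched as follows. This is a minimal illustration, not the authors' code: the entry states only the peak rate (2e-5), the 10% warm-up fraction, the 20 epochs, and the batch size of 8. The linear warm-up shape, the decay power of 1.0, the final rate of 0, and the function names are assumptions, and the training-set size used in the example is a placeholder rather than a figure from the paper.

```python
import math


def total_update_steps(num_train_examples: int, batch_size: int = 8, epochs: int = 20) -> int:
    """Number of optimizer updates for the stated batch size and epoch count."""
    steps_per_epoch = math.ceil(num_train_examples / batch_size)
    return steps_per_epoch * epochs


def learning_rate(
    step: int,
    total_steps: int,
    peak_lr: float = 2e-5,         # stated in the setup
    warmup_fraction: float = 0.1,  # stated in the setup (10% of total steps)
    power: float = 1.0,            # assumption: not stated in the entry
    end_lr: float = 0.0,           # assumption: not stated in the entry
) -> float:
    """Warm-up followed by polynomial decay (warm-up shape is assumed linear)."""
    warmup_steps = round(warmup_fraction * total_steps)
    if warmup_steps > 0 and step < warmup_steps:
        # Assumed linear ramp from 0 up to the peak learning rate.
        return peak_lr * step / warmup_steps
    decay_steps = max(1, total_steps - warmup_steps)
    progress = min(1.0, (step - warmup_steps) / decay_steps)
    return (peak_lr - end_lr) * (1.0 - progress) ** power + end_lr


if __name__ == "__main__":
    # Placeholder dataset size, chosen only to make the example concrete.
    steps = total_update_steps(num_train_examples=5000)
    for s in (0, steps // 20, steps // 10, steps // 2, steps):
        print(s, learning_rate(s, steps))
```

With `power=1.0` the decay is linear; a different power or a nonzero final rate would change only the tail of the schedule, not the warm-up length or the peak value given in the setup.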