Initial Guessing Bias: How Untrained Networks Favor Some Classes

Authors: Emanuele Francazi, Aurelien Lucchi, Marco Baity-Jesi

ICML 2024

Reproducibility Variable | Result | LLM Response
Research Type | Experimental | We provide empirical evidence of the emergence of IGB in a broader range of practical scenarios, including real data, and a wide spectrum of architectures (e.g., CNNs, ResNets, Vision Transformers), demonstrating the prevalence of IGB.
Researcher Affiliation | Academia | 1Physics Department, EPFL, Switzerland; 2SIAM Department, Eawag, Switzerland; 3Department of Mathematics and Computer Science, University of Basel, Switzerland. Correspondence to: Emanuele Francazi <emanuele.francazi@epfl.ch>.
Pseudocode | No | No pseudocode or algorithm blocks are explicitly presented in the paper; the methodology is described through mathematical equations and prose.
Open Source Code | Yes | The code used for the experiments presented in this work is available at https://github.com/EmanueleFrancazi/IGB-Algorithms.
Open Datasets | Yes | CIFAR10 (C10): We use CIFAR10 (https://www.cs.toronto.edu/~kriz/cifar.html) (Krizhevsky et al., 2009) as an example of a real multi-class dataset. CIFAR100 (C100): We use CIFAR100 (https://www.cs.toronto.edu/~kriz/cifar.html) (Krizhevsky et al., 2009) as an example of a high-cardinality dataset, i.e., a dataset with a large number of classes. MNIST (E&O): We use MNIST (http://yann.lecun.com/exdb/mnist/) (Deng, 2012) to reproduce binary experiments on real data.
Dataset Splits | No | The paper uses standard datasets such as CIFAR10 and MNIST but does not explicitly state the percentages or sample counts for training, validation, and test splits in its main text.
Hardware Specification | No | The paper does not provide hardware details such as GPU models, CPU types, or memory amounts used for running the experiments.
Software Dependencies | No | The paper does not list version numbers for software dependencies such as Python, PyTorch, or CUDA, which would be needed for full reproducibility.
Experiment Setup | No | The paper refers to the 'settings proposed in their respective repositories' for the dynamics simulations, but it does not explicitly provide hyperparameter values (e.g., learning rate, batch size, number of epochs) or detailed training configurations in the main text.
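The phenomenon the paper studies can be observed directly: at initialization, before any training, a network's argmax predictions over a batch of inputs need not be uniformly spread across classes. The following is a minimal NumPy sketch of how one might measure such a bias; it is not the paper's code, and the toy one-hidden-layer ReLU network, input distribution, and sizes are illustrative assumptions.

```python
import numpy as np

rng = np.random.default_rng(0)

def init_mlp(n_in, n_hidden, n_out, rng):
    # He-style Gaussian initialization for a one-hidden-layer ReLU network
    # (an illustrative choice, not the paper's exact setup).
    W1 = rng.normal(0.0, np.sqrt(2.0 / n_in), size=(n_in, n_hidden))
    W2 = rng.normal(0.0, np.sqrt(2.0 / n_hidden), size=(n_hidden, n_out))
    return W1, W2

def predict(X, W1, W2):
    # Forward pass: hidden ReLU layer, then linear readout, then argmax class.
    h = np.maximum(X @ W1, 0.0)
    return (h @ W2).argmax(axis=1)

# Non-negative (non-centered) inputs, a regime where the paper reports IGB.
X = np.abs(rng.normal(size=(10_000, 32)))
W1, W2 = init_mlp(n_in=32, n_hidden=64, n_out=2, rng=rng)

# Fraction of inputs assigned to each class by the untrained network;
# values far from [0.5, 0.5] indicate an initial guessing bias.
fractions = np.bincount(predict(X, W1, W2), minlength=2) / len(X)
print(fractions)
```

Averaging these fractions over many weight draws (rather than one seed) would distinguish a systematic bias of the architecture/data combination from the fluctuations of a single initialization.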