United We Stand: Using Epoch-Wise Agreement of Ensembles to Combat Overfit

Authors: Uri Stern, Daniel Shwartz, Daphna Weinshall

AAAI 2024

Reproducibility assessment (each entry lists the variable, the result, and the supporting LLM response):
Research Type: Experimental
LLM Response: We begin with the theoretical analysis of a regression model, whose prediction, that the variance among classifiers increases when overfit occurs, is demonstrated empirically in deep networks in common use. Guided by these results, we construct a new ensemble-based prediction method, where the prediction is determined by the class that attains the most consensual prediction throughout the training epochs. Using multiple image and text classification datasets, we show that when regular ensembles suffer from overfit, our method eliminates the harmful reduction in generalization due to overfit, and often even surpasses the performance obtained by early stopping.
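The quoted abstract's central claim, that the variance among classifiers rises when overfit sets in, suggests a simple diagnostic: track how often ensemble members disagree with their own majority vote at each epoch. Below is a minimal sketch of such a measurement; the function name and the (epochs, members, samples) array layout are illustrative assumptions, not the paper's code.

```python
import numpy as np

def disagreement_per_epoch(preds):
    """preds: int array of shape (epochs, members, samples) holding each
    ensemble member's predicted class at every epoch for every test sample.
    Returns one score per epoch: the mean fraction of members whose
    prediction deviates from that epoch's per-sample majority vote."""
    epochs, members, samples = preds.shape
    scores = []
    for e in range(epochs):
        votes = preds[e]                          # (members, samples)
        deviation = []
        for s in range(samples):
            counts = np.bincount(votes[:, s])
            deviation.append(1.0 - counts.max() / members)
        scores.append(float(np.mean(deviation)))
    return scores  # a late-training rise in these scores signals overfit
```
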
Researcher Affiliation: Academia
LLM Response: Uri Stern, Daniel Shwartz, Daphna Weinshall, School of Computer Science and Engineering, The Hebrew University of Jerusalem, Jerusalem 91904, Israel; ustern@gmail.com, Daniel.Shwartz1@mail.huji.ac.il, daphna@mail.huji.ac.il
Pseudocode: Yes
LLM Response: Algorithm 1: Max Agreement Prediction (MAP)
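The entry only names the algorithm. Going by the paper's description, the prediction is the class that attains the most consensual prediction across the ensemble's epoch-wise checkpoints; a hedged sketch of such a voting rule follows (the tie-breaking and the set of epochs used by the actual Algorithm 1 may differ).

```python
import numpy as np

def max_agreement_prediction(preds, num_classes):
    """preds: int array of shape (epochs, members, samples) of predicted
    classes, one entry per epoch checkpoint and ensemble member.
    For each sample, returns the class that accumulated the most votes
    over all checkpoints and members, i.e. the most consensual
    prediction throughout training."""
    epochs, members, samples = preds.shape
    votes = np.zeros((samples, num_classes), dtype=np.int64)
    for row in preds.reshape(epochs * members, samples):
        votes[np.arange(samples), row] += 1   # one vote per checkpoint/member
    return votes.argmax(axis=1)               # ties resolve to the lowest class id
```
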
Open Source Code: Yes
LLM Response: Our code is available at https://github.com/uristern123/United-We-Stand-Using-Epoch-wise-Agreement-of-Ensembles-to-Combat-Overfit.
Open Datasets: Yes
LLM Response: To evaluate our method with different levels of overfit we use image and text classification datasets with injected noisy labels (Cifar10/100, Tiny Imagenet, Imagenet100, MNLI, QNLI, QQP) and datasets with native label noise (Webvision50, Clothing1M, Animal10N).
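"Injected noisy labels" conventionally means replacing a fixed fraction of training labels with a uniformly random different class (symmetric noise); a sketch under that assumption follows, since the entry does not restate the paper's exact noise model or rates.

```python
import numpy as np

def inject_symmetric_noise(labels, num_classes, noise_rate, seed=0):
    """Replace a `noise_rate` fraction of labels with a uniformly random
    *different* class. Returns the noisy labels and the flipped indices."""
    rng = np.random.default_rng(seed)
    noisy = np.asarray(labels).copy()
    flip = rng.choice(len(noisy), size=int(noise_rate * len(noisy)),
                      replace=False)
    for i in flip:
        candidates = [c for c in range(num_classes) if c != noisy[i]]
        noisy[i] = rng.choice(candidates)
    return noisy, flip
```
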
Dataset Splits: No
LLM Response: The paper describes training on a 'training dataset' and testing on a 'test set', but does not explicitly provide details for a separate validation split (e.g., percentages or sample counts for training, validation, and test sets).
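A reproduction would therefore have to define its own validation split; a common PyTorch holdout is sketched below, where the 90/10 ratio and the fixed seed are assumptions rather than values from the paper.

```python
import torch
from torch.utils.data import random_split
from torchvision import datasets, transforms

full_train = datasets.CIFAR10(root="data", train=True, download=True,
                              transform=transforms.ToTensor())
n_val = len(full_train) // 10                    # assumed 10% holdout
train_set, val_set = random_split(
    full_train, [len(full_train) - n_val, n_val],
    generator=torch.Generator().manual_seed(0))  # fixed seed for repeatability
```
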
Hardware Specification: No
LLM Response: The paper does not specify any hardware details such as GPU models, CPU types, or cloud computing resources used for running the experiments.
Software Dependencies: No
LLM Response: The paper mentions general training techniques and components like 'Stochastic Gradient Descent (SGD)', 'data augmentation', 'batch normalization', and 'weight decay', but does not provide specific software names with version numbers (e.g., Python, PyTorch, TensorFlow versions) required for reproducibility.
Experiment Setup: No
LLM Response: The paper mentions that 'Training involves common methods known to reduce overfit, such as data augmentation, batch normalization and weight decay', but it does not provide specific hyperparameters (e.g., learning rate, batch size, number of epochs) or detailed training configurations required for replicating the experiments.
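Since the techniques are named but their settings are not, a reproduction must supply its own configuration; the sketch below shows a typical setup of the kind described (SGD, standard augmentation, batch normalization via the architecture, weight decay), with every numeric value an illustrative assumption rather than a figure from the paper.

```python
import torch
from torchvision import transforms
from torchvision.models import resnet18

# Standard augmentation of the kind the paper alludes to (values assumed).
augment = transforms.Compose([
    transforms.RandomCrop(32, padding=4),
    transforms.RandomHorizontalFlip(),
    transforms.ToTensor(),
])

model = resnet18(num_classes=10)                 # batch normalization is built in
optimizer = torch.optim.SGD(model.parameters(),
                            lr=0.1,              # assumed, not reported
                            momentum=0.9,        # assumed
                            weight_decay=5e-4)   # assumed
scheduler = torch.optim.lr_scheduler.CosineAnnealingLR(optimizer, T_max=200)
```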