PEP: Parameter Ensembling by Perturbation

Authors: Alireza Mehrtash, Purang Abolmaesumi, Polina Golland, Tina Kapur, Demian Wassermann, William Wells

NeurIPS 2020

Reproducibility Variable | Result | LLM Response
Research Type | Experimental | Experiments on ImageNet pre-trained networks, including ResNet, DenseNet, and Inception, showed improved calibration and likelihood, along with a mild improvement in classification accuracy. Experiments on classification benchmarks such as MNIST and CIFAR-10 showed improved calibration and likelihood and revealed a relationship between the PEP effect and overfitting, demonstrating that PEP can be used to probe the level of overfitting that occurred during training.
Researcher Affiliation | Academia | Alireza Mehrtash (1,2), Purang Abolmaesumi (1), Polina Golland (3), Tina Kapur (2), Demian Wassermann (4), William M. Wells III (2,3). 1: ECE Department, University of British Columbia (UBC), Vancouver, BC; 2: Department of Radiology, BWH, Harvard Medical School, Boston, MA; 3: CSAIL, MIT, Boston, MA; 4: INRIA Saclay, Palaiseau, France.
Pseudocode | No | The paper does not contain pseudocode or explicitly labeled algorithm blocks; it uses mathematical formulations to describe the method.
Open Source Code | No | The paper does not provide any link or statement about open-sourcing the code for the described methodology.
Open Datasets | Yes | We evaluated the performance of PEP using large-scale networks that were trained on the ImageNet (ILSVRC2012) [40] dataset. The MNIST handwritten digits [27] and Fashion-MNIST [47] datasets consist of 60,000 training images and 10,000 test images. The CIFAR-10 and CIFAR-100 datasets [24] consist of 50,000 training images and 10,000 test images.
Dataset Splits | Yes | From the 50,000 images, 5,000 were used as a validation set for optimizing σ in PEP and the temperature T in temperature scaling; the remaining 45,000 images were used as the test set. We created validation sets by setting aside 10,000 and 5,000 training images from MNIST (handwritten and fashion) and CIFAR, respectively (a minimal split sketch appears after the table).
Hardware Specification | No | The paper does not provide any specific details about the hardware used for running the experiments (e.g., GPU models, CPU types, or memory).
Software Dependencies | No | The paper mentions the 'Keras library [4]' and the 'Adam update rule [22]' but does not provide specific version numbers for these or other software dependencies.
Experiment Setup | Yes | The search range for σ was 5×10⁻⁵ to 5×10⁻³, the ensemble size was 5 (m = 5), and the number of iterations was 7. On the test set of 45,000 images, PEP was evaluated with the optimized σ and an ensemble size of 10 (m = 10). For optimization, stochastic gradient descent with the Adam update rule [22] was used. Each baseline was trained for 15 epochs. (A minimal sketch of this setup follows the table.)
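The validation splits described in the "Dataset Splits" row are simple holdouts from the training sets. The sketch below shows one way to reproduce them with Keras's bundled dataset loaders; the paper does not specify how the held-out images were chosen, so the random-permutation split and the seed here are assumptions.

```python
# Minimal sketch of the MNIST/CIFAR validation splits described above.
# Assumption (not stated in the paper): validation images are drawn by a
# seeded random permutation of the training set; any deterministic split
# of the stated size would serve the same purpose.
import numpy as np
from tensorflow.keras.datasets import mnist, cifar10

def holdout_split(x, y, n_val, seed=0):
    """Set aside n_val examples from (x, y) as a validation set."""
    rng = np.random.RandomState(seed)
    idx = rng.permutation(len(x))
    val_idx, train_idx = idx[:n_val], idx[n_val:]
    return (x[train_idx], y[train_idx]), (x[val_idx], y[val_idx])

# MNIST: 60,000 training images, 10,000 held out for validation.
(x_train, y_train), (x_test, y_test) = mnist.load_data()
(x_tr, y_tr), (x_val, y_val) = holdout_split(x_train, y_train, n_val=10_000)

# CIFAR-10: 50,000 training images, 5,000 held out for validation.
(cx_train, cy_train), (cx_test, cy_test) = cifar10.load_data()
(cx_tr, cy_tr), (cx_val, cy_val) = holdout_split(cx_train, cy_train, n_val=5_000)
```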
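For the experiment setup itself, the following is a minimal sketch of the PEP idea under the quantities quoted in the last row: Gaussian perturbation of a trained model's weights, an ensemble of m perturbed copies averaged in probability space, and a search over σ in [5×10⁻⁵, 5×10⁻³] on validation negative log-likelihood. All function names and the log-spaced grid search are illustrative assumptions, not the authors' released code (the report notes none is available), and the paper's exact 7-iteration search strategy is not reproduced here.

```python
# Minimal sketch of Parameter Ensembling by Perturbation (PEP), assuming a
# trained Keras classifier `model` that outputs softmax probabilities.
import numpy as np

def pep_predict(model, x, sigma, m=10, seed=0):
    """Average predictions of m copies of `model` whose weights are
    perturbed i.i.d. with N(0, sigma^2) noise."""
    rng = np.random.RandomState(seed)
    original = model.get_weights()
    probs = np.zeros((len(x), model.output_shape[-1]))
    for _ in range(m):
        perturbed = [w + rng.normal(0.0, sigma, size=w.shape) for w in original]
        model.set_weights(perturbed)
        probs += model.predict(x, verbose=0)
    model.set_weights(original)  # restore the unperturbed weights
    return probs / m

def nll(probs, labels, eps=1e-12):
    """Negative log-likelihood of integer labels under predicted probabilities."""
    return -np.mean(np.log(probs[np.arange(len(labels)), labels] + eps))

def search_sigma(model, x_val, y_val, m=5, low=5e-5, high=5e-3, n_grid=7):
    """Pick the sigma with the lowest validation NLL from a log-spaced grid."""
    grid = np.logspace(np.log10(low), np.log10(high), n_grid)
    scores = [nll(pep_predict(model, x_val, s, m=m), y_val) for s in grid]
    return grid[int(np.argmin(scores))]

# Usage (model, x_val, y_val, x_test assumed to exist):
# sigma_star = search_sigma(model, x_val, y_val, m=5)       # m = 5 on validation
# test_probs = pep_predict(model, x_test, sigma_star, m=10)  # m = 10 on the test set
```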