PEP: Parameter Ensembling by Perturbation

Authors: Alireza Mehrtash, Purang Abolmaesumi, Polina Golland, Tina Kapur, Demian Wassermann, William Wells

NeurIPS 2020

Reproducibility Variable | Result | LLM Response
Research Type | Experimental | Experiments on ImageNet pre-trained networks, including ResNet, DenseNet, and Inception, showed improved calibration and likelihood, along with a mild improvement in classification accuracy. Experiments on classification benchmarks such as MNIST and CIFAR-10 showed improved calibration and likelihood and revealed a relationship between the PEP effect and overfitting, demonstrating that PEP can be used to probe the level of overfitting that occurred during training.
Researcher Affiliation | Academia | Alireza Mehrtash (1,2), Purang Abolmaesumi (1), Polina Golland (3), Tina Kapur (2), Demian Wassermann (4), William M. Wells III (2,3). 1: ECE Department, University of British Columbia (UBC), Vancouver, BC; 2: Department of Radiology, BWH, Harvard Medical School, Boston, MA; 3: CSAIL, MIT, Boston, MA; 4: INRIA Saclay, Palaiseau, France.
Pseudocode | No | The paper does not contain pseudocode or explicitly labeled algorithm blocks; it uses mathematical formulations to describe the method.
Open Source Code | No | The paper does not provide any link or statement about open-sourcing the code for the described methodology.
Open Datasets | Yes | We evaluated the performance of PEP using large-scale networks that were trained on the ImageNet (ILSVRC2012) [40] dataset. The MNIST handwritten digits [27] and Fashion-MNIST [47] datasets consist of 60,000 training images and 10,000 test images. The CIFAR-10 and CIFAR-100 datasets [24] consist of 50,000 training images and 10,000 test images.
Dataset Splits | Yes | From the 50,000 images, 5,000 were used as a validation set for optimizing σ in PEP and the temperature T in temperature scaling; the remaining 45,000 images were used as the test set. We created validation sets by setting aside 10,000 and 5,000 training images from MNIST (handwritten and fashion) and CIFAR, respectively (a minimal split sketch appears after the table).
Hardware Specification | No | The paper does not provide any specific details about the hardware used for running the experiments (e.g., GPU models, CPU types, or memory).
Software Dependencies | No | The paper mentions the 'Keras library [4]' and the 'Adam update rule [22]' but does not provide specific version numbers for these or other software dependencies.
Experiment Setup | Yes | The search range for σ was 5×10⁻⁵ to 5×10⁻³, the ensemble size was 5 (m = 5), and the number of iterations was 7. On the test set of 45,000 images, PEP was evaluated with the optimized σ and an ensemble size of 10 (m = 10). For optimization, stochastic gradient descent with the Adam update rule [22] was used. Each baseline was trained for 15 epochs. (A minimal sketch of this setup follows the table.)
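The validation splits described in the "Dataset Splits" row are simple holdouts from the training sets. The sketch below shows one way to reproduce them with Keras's bundled dataset loaders; the paper does not specify how the held-out images were chosen, so the random-permutation split and the seed here are assumptions.

```python
# Minimal sketch of the MNIST/CIFAR validation splits described above.
# Assumption (not stated in the paper): validation images are drawn by a
# seeded random permutation of the training set; any deterministic split
# of the stated size would serve the same purpose.
import numpy as np
from tensorflow.keras.datasets import mnist, cifar10

def holdout_split(x, y, n_val, seed=0):
    """Set aside n_val examples from (x, y) as a validation set."""
    rng = np.random.RandomState(seed)
    idx = rng.permutation(len(x))
    val_idx, train_idx = idx[:n_val], idx[n_val:]
    return (x[train_idx], y[train_idx]), (x[val_idx], y[val_idx])

# MNIST: 60,000 training images, 10,000 held out for validation.
(x_train, y_train), (x_test, y_test) = mnist.load_data()
(x_tr, y_tr), (x_val, y_val) = holdout_split(x_train, y_train, n_val=10_000)

# CIFAR-10: 50,000 training images, 5,000 held out for validation.
(cx_train, cy_train), (cx_test, cy_test) = cifar10.load_data()
(cx_tr, cy_tr), (cx_val, cy_val) = holdout_split(cx_train, cy_train, n_val=5_000)
```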
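For the experiment setup itself, the following is a minimal sketch of the PEP idea under the quantities quoted in the last row: Gaussian perturbation of a trained model's weights, an ensemble of m perturbed copies averaged in probability space, and a search over σ in [5×10⁻⁵, 5×10⁻³] on validation negative log-likelihood. All function names and the log-spaced grid search are illustrative assumptions, not the authors' released code (the report notes none is available), and the paper's exact 7-iteration search strategy is not reproduced here.

```python
# Minimal sketch of Parameter Ensembling by Perturbation (PEP), assuming a
# trained Keras classifier `model` that outputs softmax probabilities.
import numpy as np

def pep_predict(model, x, sigma, m=10, seed=0):
    """Average predictions of m copies of `model` whose weights are
    perturbed i.i.d. with N(0, sigma^2) noise."""
    rng = np.random.RandomState(seed)
    original = model.get_weights()
    probs = np.zeros((len(x), model.output_shape[-1]))
    for _ in range(m):
        perturbed = [w + rng.normal(0.0, sigma, size=w.shape) for w in original]
        model.set_weights(perturbed)
        probs += model.predict(x, verbose=0)
    model.set_weights(original)  # restore the unperturbed weights
    return probs / m

def nll(probs, labels, eps=1e-12):
    """Negative log-likelihood of integer labels under predicted probabilities."""
    return -np.mean(np.log(probs[np.arange(len(labels)), labels] + eps))

def search_sigma(model, x_val, y_val, m=5, low=5e-5, high=5e-3, n_grid=7):
    """Pick the sigma with the lowest validation NLL from a log-spaced grid."""
    grid = np.logspace(np.log10(low), np.log10(high), n_grid)
    scores = [nll(pep_predict(model, x_val, s, m=m), y_val) for s in grid]
    return grid[int(np.argmin(scores))]

# Usage (model, x_val, y_val, x_test assumed to exist):
# sigma_star = search_sigma(model, x_val, y_val, m=5)       # m = 5 on validation
# test_probs = pep_predict(model, x_test, sigma_star, m=10)  # m = 10 on the test set
```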