Optimizing Neural Networks with Gradient Lexicase Selection
Authors: Li Ding, Lee Spector
ICLR 2022
| Reproducibility Variable | Result | LLM Response |
|---|---|---|
| Research Type | Experimental | Experimental results show that the proposed method improves the generalization performance of various popular deep neural network architectures on three image classification benchmarks. The proposed Gradient Lexicase Selection is tested on the task of image classification. |
| Researcher Affiliation | Academia | Li Ding University of Massachusetts Amherst liding@umass.edu Lee Spector Amherst College University of Massachusetts Amherst lspector@amherst.edu |
| Pseudocode | Yes | Algorithm 1: Gradient Lexicase Selection; Algorithm 2: Lexicase selection to select one parent program in genetic programming. *(A simplified sketch of the selection loop follows the table.)* |
| Open Source Code | Yes | We submit our source code as the supplementary material for the review process, which can be used to reproduce the experimental results in this work. We also release our source code on Github: https://github.com/ld-ing/Gradient-Lexicase. |
| Open Datasets | Yes | Three benchmark datasets (CIFAR-10 (Krizhevsky et al., 2009), CIFAR-100 (Krizhevsky et al., 2009), and SVHN (Netzer et al., 2011)) are used for evaluation. |
| Dataset Splits | No | For the selection process, we do not hold out another validation set because 1) if we choose to use a validation set, the validation set should have an adequate size in order to ensure its diversity and generality, which means the training set will be noticeably smaller, and thus the training performance is likely to degrade; 2) since each model instance only gets access to part of the (augmented) training data, the selection performed on the original training data is still effective, since the exact same data was never used in the mutation (training). |
| Hardware Specification | No | The paper mentions 'modern cloud computing facilities' generally, but does not provide specific details on CPU, GPU, or other hardware components used for experiments. |
| Software Dependencies | No | The paper mentions using 'SGD with momentum' and implicitly relies on a standard deep learning framework, but it does not list specific software dependencies with version numbers (e.g., Python, PyTorch, or CUDA versions). |
| Experiment Setup | Yes | The batch size is set to 128 for CIFAR-10 and 64 for CIFAR-100 and SVHN. The initial learning rate is set to 0.1 and tuned by using Cosine Annealing (Loshchilov & Hutter, 2017). We set the total number of epochs as 200 for baseline training and as 200(p+1) for gradient lexicase selection, where p is the size of the population. For both baseline and lexicase, we use SGD with momentum of 0.9. For lexicase, we use the Reset Momentum option that re-initializes the momentum parameters for each epoch, which is explained in detail later in Sec. 5.2. *(A training-configuration sketch follows the table.)* |
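
The Pseudocode row above refers to the paper's algorithm listings. As a rough illustration of the idea, the sketch below organizes one generation of gradient lexicase selection in Python, assuming a 0/1 pass criterion per training case; the helper callables `train_one_epoch` and `evaluate_case` and the data-sharding scheme are illustrative assumptions, not functions from the authors' released code.

```python
import copy
import random

def lexicase_select(candidates, cases, evaluate_case):
    """Filter candidates on one randomly ordered training case at a time.

    `evaluate_case(model, case) -> bool` is an assumed helper returning whether
    the model classifies the case correctly (a 0/1 pass criterion).
    """
    pool = list(candidates)
    ordering = list(cases)
    random.shuffle(ordering)                 # fresh random case ordering per selection event
    for case in ordering:
        if len(pool) == 1:
            break
        survivors = [m for m in pool if evaluate_case(m, case)]
        if survivors:                        # drop candidates that fail a case others solve
            pool = survivors
    return random.choice(pool)               # random tie-break among the remaining candidates

def gradient_lexicase_generation(parent, p, subset_loaders, cases,
                                 train_one_epoch, evaluate_case):
    """One generation: clone the parent p times, train each clone with SGD on its
    own shard of the (augmented) training data, then select the next parent."""
    offspring = [copy.deepcopy(parent) for _ in range(p)]
    for model, loader in zip(offspring, subset_loaders):
        train_one_epoch(model, loader)       # gradient descent acts as the mutation step
    return lexicase_select(offspring, cases, evaluate_case)
```

In practice the selection step would stream batches from the original training set through each candidate rather than iterate case by case in Python, but the filtering logic is the same.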
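
To make the quoted hyperparameters in the Experiment Setup row concrete, here is a minimal PyTorch sketch of the reported optimizer and schedule. The architecture, the example population size, and the way the momentum buffers are cleared are assumptions for illustration, not the authors' exact implementation.

```python
from collections import defaultdict

from torch.optim import SGD
from torch.optim.lr_scheduler import CosineAnnealingLR
from torchvision.models import resnet18

# Values quoted in the Experiment Setup row; population size is an example choice.
population_size = 4                            # p
total_epochs = 200 * (population_size + 1)     # 200(p+1) epochs for gradient lexicase selection
batch_size = 128                               # CIFAR-10 (64 for CIFAR-100 and SVHN)

model = resnet18(num_classes=10)               # placeholder architecture
optimizer = SGD(model.parameters(), lr=0.1, momentum=0.9)
scheduler = CosineAnnealingLR(optimizer, T_max=total_epochs)

for epoch in range(total_epochs):
    # ... one epoch of training (and, for lexicase, mutation-and-selection) goes here ...
    scheduler.step()
    # "Reset Momentum" option: discard SGD's momentum buffers at the end of each epoch
    # by clearing the optimizer state (one simple way to re-initialize momentum).
    optimizer.state = defaultdict(dict)
```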