Optimizing Neural Networks with Gradient Lexicase Selection
Authors: Li Ding, Lee Spector
ICLR 2022
| Reproducibility Variable | Result | LLM Response |
|---|---|---|
| Research Type | Experimental | Experimental results show that the proposed method improves the generalization performance of various popular deep neural network architectures on three image classification benchmarks. The proposed Gradient Lexicase Selection is tested on the task of image classification. |
| Researcher Affiliation | Academia | Li Ding University of Massachusetts Amherst liding@umass.edu Lee Spector Amherst College University of Massachusetts Amherst lspector@amherst.edu |
| Pseudocode | Yes | Algorithm 1: Gradient Lexicase Selection; Algorithm 2: Lexicase selection to select one parent program in genetic programming. *(A simplified sketch of the selection loop follows the table.)* |
| Open Source Code | Yes | We submit our source code as the supplementary material for the review process, which can be used to reproduce the experimental results in this work. We also release our source code on Github: https://github.com/ld-ing/Gradient-Lexicase. |
| Open Datasets | Yes | Three benchmark datasets (CIFAR-10 (Krizhevsky et al., 2009), CIFAR-100 (Krizhevsky et al., 2009), and SVHN (Netzer et al., 2011)) are used for evaluation. |
| Dataset Splits | No | For the selection process, we do not hold out another validation set because 1) if we choose to use a validation set, the validation set should have an adequate size in order to ensure its diversity and generality, which means the training set will be noticeably smaller, and thus the training performance is likely to degrade; 2) since each model instance only gets access to part of the (augmented) training data, the selection performed on the original training data is still effective, since the exact same data was never used in the mutation (training). |
| Hardware Specification | No | The paper mentions 'modern cloud computing facilities' generally, but does not provide specific details on CPU, GPU, or other hardware components used for experiments. |
| Software Dependencies | No | The paper mentions using 'SGD with momentum' and implicitly relies on a standard deep learning framework, but it does not list specific software dependencies with version numbers (e.g., Python, PyTorch, or CUDA versions). |
| Experiment Setup | Yes | The batch size is set to 128 for CIFAR-10 and 64 for CIFAR-100 and SVHN. The initial learning rate is set to 0.1 and tuned by using Cosine Annealing (Loshchilov & Hutter, 2017). We set the total number of epochs as 200 for baseline training and as 200(p+1) for gradient lexicase selection, where p is the size of the population. For both baseline and lexicase, we use SGD with momentum of 0.9. For lexicase, we use the Reset Momentum option that re-initializes the momentum parameters for each epoch, which is explained in detail later in Sec. 5.2. *(A training-configuration sketch follows the table.)* |
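
The Pseudocode row above refers to the paper's algorithm listings. As a rough illustration of the idea, the sketch below organizes one generation of gradient lexicase selection in Python, assuming a 0/1 pass criterion per training case; the helper callables `train_one_epoch` and `evaluate_case` and the data-sharding scheme are illustrative assumptions, not functions from the authors' released code.

```python
import copy
import random

def lexicase_select(candidates, cases, evaluate_case):
    """Filter candidates on one randomly ordered training case at a time.

    `evaluate_case(model, case) -> bool` is an assumed helper returning whether
    the model classifies the case correctly (a 0/1 pass criterion).
    """
    pool = list(candidates)
    ordering = list(cases)
    random.shuffle(ordering)                 # fresh random case ordering per selection event
    for case in ordering:
        if len(pool) == 1:
            break
        survivors = [m for m in pool if evaluate_case(m, case)]
        if survivors:                        # drop candidates that fail a case others solve
            pool = survivors
    return random.choice(pool)               # random tie-break among the remaining candidates

def gradient_lexicase_generation(parent, p, subset_loaders, cases,
                                 train_one_epoch, evaluate_case):
    """One generation: clone the parent p times, train each clone with SGD on its
    own shard of the (augmented) training data, then select the next parent."""
    offspring = [copy.deepcopy(parent) for _ in range(p)]
    for model, loader in zip(offspring, subset_loaders):
        train_one_epoch(model, loader)       # gradient descent acts as the mutation step
    return lexicase_select(offspring, cases, evaluate_case)
```

In practice the selection step would stream batches from the original training set through each candidate rather than iterate case by case in Python, but the filtering logic is the same.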
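
To make the quoted hyperparameters in the Experiment Setup row concrete, here is a minimal PyTorch sketch of the reported optimizer and schedule. The architecture, the example population size, and the way the momentum buffers are cleared are assumptions for illustration, not the authors' exact implementation.

```python
from collections import defaultdict

from torch.optim import SGD
from torch.optim.lr_scheduler import CosineAnnealingLR
from torchvision.models import resnet18

# Values quoted in the Experiment Setup row; population size is an example choice.
population_size = 4                            # p
total_epochs = 200 * (population_size + 1)     # 200(p+1) epochs for gradient lexicase selection
batch_size = 128                               # CIFAR-10 (64 for CIFAR-100 and SVHN)

model = resnet18(num_classes=10)               # placeholder architecture
optimizer = SGD(model.parameters(), lr=0.1, momentum=0.9)
scheduler = CosineAnnealingLR(optimizer, T_max=total_epochs)

for epoch in range(total_epochs):
    # ... one epoch of training (and, for lexicase, mutation-and-selection) goes here ...
    scheduler.step()
    # "Reset Momentum" option: discard SGD's momentum buffers at the end of each epoch
    # by clearing the optimizer state (one simple way to re-initialize momentum).
    optimizer.state = defaultdict(dict)
```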