Efficient Activation Function Optimization through Surrogate Modeling

Authors: Garrett Bingham, Risto Miikkulainen

NeurIPS 2023

Reproducibility Variable | Result | LLM Response
Research Type | Experimental | "First, the benchmark datasets Act-Bench-CNN, Act-Bench-ResNet, and Act-Bench-ViT were created by training convolutional, residual, and vision transformer architectures from scratch with 2,913 systematically generated activation functions." and "In the third step, this surrogate was evaluated experimentally, first by verifying that it can discover known good functions in the benchmark datasets efficiently and reliably, and second by demonstrating that it can discover improved activation functions in new tasks involving different datasets, search spaces, and architectures."
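
For context, a minimal sketch of the benchmark-creation loop described in the excerpt above: train the same architecture once per candidate activation function and record its accuracy. The helpers and the three-function list below are illustrative stubs, not the authors' code; the real pipeline covers 2,913 generated functions.

```python
# Illustrative stubs only; the authors' actual training code lives in their repositories.
def build_model(architecture, activation):
    """Stub: would construct the given architecture with the given activation."""
    return (architecture, activation)

def train_and_evaluate(model):
    """Stub: would train the model from scratch and return validation accuracy."""
    return 0.0

candidates = ["relu", "swish", "elu"]        # stand-ins for the 2,913 generated functions
benchmark = {}
for arch in ["CNN", "ResNet", "ViT"]:        # Act-Bench-CNN / -ResNet / -ViT
    for fn in candidates:
        benchmark[(arch, fn)] = train_and_evaluate(build_model(arch, fn))
```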
Researcher Affiliation | Collaboration | "Garrett Bingham, The University of Texas at Austin and Cognizant AI Labs, San Francisco, CA 94105, garrett@gjb.ai" and "Risto Miikkulainen, The University of Texas at Austin and Cognizant AI Labs, San Francisco, CA 94105, risto@cs.utexas.edu"
Pseudocode | No | No pseudocode or algorithm blocks are provided in the paper.
Open Source Code | Yes | AQuaSurF code is available at https://github.com/cognizant-ai-labs/aquasurf
Open Datasets | Yes | "First, the benchmark datasets Act-Bench-CNN, Act-Bench-ResNet, and Act-Bench-ViT were created by training convolutional, residual, and vision transformer architectures from scratch with 2,913 systematically generated activation functions." and "The benchmark collections are made available at https://github.com/cognizant-ai-labs/act-bench" and "All-CNN-C on CIFAR-10, ResNet-56 on CIFAR-10, and MobileViTv2-0.5 on Imagenette [22, 24, 31, 41, 51]."
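
A quick way to explore the released benchmarks might look like the sketch below. The file name and column names are hypothetical; consult the act-bench repository for the actual data layout.

```python
# Hypothetical file and column names; see https://github.com/cognizant-ai-labs/act-bench
# for the real layout of the released benchmark data.
import pandas as pd

bench = pd.read_csv("act_bench_cnn.csv")                       # hypothetical filename
top = bench.sort_values("accuracy", ascending=False).head(10)  # hypothetical 'accuracy' column
print(top[["function", "accuracy"]])                           # hypothetical 'function' column
```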
Dataset Splits | Yes | "For CIFAR-10 and CIFAR-100, balanced validation sets were created by sampling 5,000 images from the training set." and "Full training details and hyperparameters are listed in Tables 5 and 6."
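
The balanced validation split quoted above (5,000 images sampled from the training set) can be reproduced with class-stratified sampling. A minimal sketch, assuming 500 images per CIFAR-10 class:

```python
import numpy as np

def balanced_val_indices(labels, per_class=500, seed=0):
    """Sample per_class indices from each class (10 x 500 = 5,000 for CIFAR-10)."""
    rng = np.random.default_rng(seed)
    picks = [rng.choice(np.flatnonzero(labels == c), size=per_class, replace=False)
             for c in np.unique(labels)]
    return np.concatenate(picks)

# Usage with placeholder CIFAR-10 training labels (50,000 images, 5,000 per class):
labels = np.repeat(np.arange(10), 5000)
val_idx = balanced_val_indices(labels)
assert len(val_idx) == 5000
```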
Hardware Specification | Yes | "The experiments in this paper were implemented using an AWS g5.48xlarge instance with eight NVIDIA A10G GPUs."
Software Dependencies | No | "The algorithms were used out of the box with default hyperparameters from the scikit-learn package [47]."
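
The excerpt confirms default scikit-learn hyperparameters but does not name the estimators. In the sketch below, KNeighborsRegressor is an assumed stand-in, used only to show what "out of the box" usage of a performance surrogate looks like.

```python
import numpy as np
from sklearn.neighbors import KNeighborsRegressor  # assumed estimator, for illustration

rng = np.random.default_rng(0)
X = rng.normal(size=(100, 8))           # placeholder features for 100 activation functions
y = rng.uniform(0.80, 0.95, size=100)   # placeholder benchmark accuracies

surrogate = KNeighborsRegressor()       # out of the box: default hyperparameters, as quoted
surrogate.fit(X, y)
print(surrogate.predict(X[:3]))         # predicted accuracies for the first three functions
```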
Experiment Setup | Yes | "Full training details and hyperparameters are listed in Tables 5 and 6."