Efficient Activation Function Optimization through Surrogate Modeling
Authors: Garrett Bingham, Risto Miikkulainen
NeurIPS 2023
| Reproducibility Variable | Result | LLM Response |
|---|---|---|
| Research Type | Experimental | "First, the benchmark datasets Act-Bench-CNN, Act-Bench-ResNet, and Act-Bench-ViT were created by training convolutional, residual, and vision transformer architectures from scratch with 2,913 systematically generated activation functions." and "In the third step, this surrogate was evaluated experimentally, first by verifying that it can discover known good functions in the benchmark datasets efficiently and reliably, and second by demonstrating that it can discover improved activation functions in new tasks involving different datasets, search spaces, and architectures." |
| Researcher Affiliation | Collaboration | "Garrett Bingham, The University of Texas at Austin and Cognizant AI Labs, San Francisco, CA 94105, garrett@gjb.ai" and "Risto Miikkulainen, The University of Texas at Austin and Cognizant AI Labs, San Francisco, CA 94105, risto@cs.utexas.edu" |
| Pseudocode | No | No pseudocode or algorithm blocks are provided in the paper. |
| Open Source Code | Yes | AQuaSurF code is available at https://github.com/cognizant-ai-labs/aquasurf |
| Open Datasets | Yes | "First, the benchmark datasets Act-Bench-CNN, Act-Bench-ResNet, and Act-Bench-ViT were created by training convolutional, residual, and vision transformer architectures from scratch with 2,913 systematically generated activation functions." and "The benchmark collections are made available at https://github.com/cognizant-ai-labs/act-bench" and "All-CNN-C on CIFAR-10, ResNet-56 on CIFAR-10, and MobileViTv2-0.5 on Imagenette [22, 24, 31, 41, 51]." |
| Dataset Splits | Yes | "For CIFAR-10 and CIFAR-100, balanced validation sets were created by sampling 5,000 images from the training set." and "Full training details and hyperparameters are listed in Tables 5 and 6." |
| Hardware Specification | Yes | "The experiments in this paper were implemented using an AWS g5.48xlarge instance with eight NVIDIA A10G GPUs." |
| Software Dependencies | No | "The algorithms were used out of the box with default hyperparameters from the scikit-learn package [47]." (scikit-learn is named, but no version numbers or a full dependency list are specified.) |
| Experiment Setup | Yes | "Full training details and hyperparameters are listed in Tables 5 and 6." |
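The balanced validation split quoted in the Dataset Splits row (sampling 5,000 images from the CIFAR-10/100 training set, with equal counts per class) can be sketched as follows. This is an illustrative reconstruction, not the authors' code; the function name `balanced_validation_split` and the synthetic label array are assumptions for the example.

```python
import numpy as np

def balanced_validation_split(labels, n_val, n_classes, seed=0):
    """Sample a class-balanced validation set of n_val indices
    (n_val // n_classes per class); remaining indices form the
    training set. Illustrative sketch, not the paper's code."""
    rng = np.random.default_rng(seed)
    per_class = n_val // n_classes
    val_parts = []
    for c in range(n_classes):
        class_idx = np.flatnonzero(labels == c)
        val_parts.append(rng.choice(class_idx, size=per_class, replace=False))
    val_idx = np.concatenate(val_parts)
    train_idx = np.setdiff1d(np.arange(len(labels)), val_idx)
    return train_idx, val_idx

# Example with CIFAR-10-sized synthetic labels: 50,000 images, 10 classes.
labels = np.repeat(np.arange(10), 5000)
train_idx, val_idx = balanced_validation_split(labels, n_val=5000, n_classes=10)
```

With these sizes, the split yields 500 validation images per class (5,000 total) and 45,000 disjoint training images, matching the balance described in the quote.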