EvoPrompting: Language Models for Code-Level Neural Architecture Search
Authors: Angelica Chen, David Dohan, David So
NeurIPS 2023
| Reproducibility Variable | Result | LLM Response |
|---|---|---|
| Research Type | Experimental | We first demonstrate that EvoPrompting is effective on the computationally efficient MNIST-1D dataset, where EvoPrompting produces convolutional architecture variants that outperform both those designed by human experts and naive few-shot prompting in terms of accuracy and model size. We then apply our method to searching for graph neural networks on the CLRS Algorithmic Reasoning Benchmark, where EvoPrompting is able to design novel architectures that outperform current state-of-the-art models on 21 out of 30 algorithmic reasoning tasks while maintaining similar model size. |
| Researcher Affiliation | Collaboration | Angelica Chen, New York University (angelica.chen@nyu.edu); David M. Dohan, OpenAI (david@ddohan.com); David R. So, Jane Street (david.r.so.ai@gmail.com) |
| Pseudocode | Yes | Algorithm 1: Complete meta-learning evolutionary algorithm using pθ as a crossover and mutation operator. Algorithm 2: The crossover and mutation algorithm, CrossMut(pθt, P, m, k, n), where Uniform(P) denotes the uniform distribution over the set P. Algorithm 3: The algorithm for filtering and scoring child models, FilterAndEval(C, T, D, α). (A simplified sketch of this loop appears after the table.) |
| Open Source Code | Yes | We refer the reader to Appendix A.2 for the source code of these seed models. ... See Appx. A.4 for the full code of each discovered model. ... Below we list the Python source code of five of the newly discovered GNNs. ... Below we provide the source code for the nine seed models used in the CLRS model search. |
| Open Datasets | Yes | We evaluate our meta-learning algorithm on two datasets: MNIST-1D (Greydanus, 2020) and the CLRS algorithmic reasoning benchmark (Veličković et al., 2022). |
| Dataset Splits | Yes | Since there is no validation dataset, we randomly set aside 500 examples from the training dataset to use as the validation dataset. ... and evaluated models using validation accuracy. (A minimal split sketch appears after the table.) |
| Hardware Specification | Yes | on a single NVIDIA Tesla P100 GPU |
| Software Dependencies | No | The paper mentions software libraries such as Flax, Haiku, and JAX, but does not provide version numbers for these dependencies. |
| Experiment Setup | Yes | Throughout the model search we use the AdamW optimizer (Loshchilov & Hutter, 2019) to train each child model on a single NVIDIA Tesla P100 GPU for 8000 steps, with learning rate 0.01 and batch size 128. ... Between each round, the model is prompt-tuned (Lester et al., 2021) for 5 epochs with a soft prompt length of 16, batch size of 16, and learning rate of 0.1 ... Unless stated otherwise, we run 10 rounds of evolution with 10 prompts per round and 16 samples generated per prompt. (A training-configuration sketch appears after the table.) |
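
The three algorithms quoted in the Pseudocode row compose one search loop: the LM serves as the crossover/mutation operator (CrossMut), child programs are trained and filtered (FilterAndEval), and the LM is prompt-tuned on the fittest children between rounds. Below is a minimal, hedged Python sketch of that loop, not the authors' implementation: `lm_sample`, `lm_prompt_tune`, and `train_and_eval` are hypothetical stand-ins for the paper's LM sampling, soft-prompt-tuning, and training steps, and the defaults mirror the quoted setup (10 rounds, 10 prompts per round, 16 samples per prompt).

```python
import random

def evoprompting(lm_sample, lm_prompt_tune, train_and_eval, seed_models,
                 num_rounds=10, prompts_per_round=10, samples_per_prompt=16,
                 parents_per_prompt=2, top_m=10):
    """Sketch of Algorithms 1-3. All callables are caller-supplied stand-ins:
      lm_sample(prompt, n)     -> list of n candidate model source strings
      lm_prompt_tune(programs) -> soft-prompt-tunes the LM on strong children
      train_and_eval(code)     -> fitness (lower is better), or None on failure
    """
    # Score the seed models to form the initial population.
    population = [(code, train_and_eval(code)) for code in seed_models]
    for _ in range(num_rounds):
        children = []
        # CrossMut: sample parents uniformly from the top-m models, join
        # their source code into a prompt, and draw child programs from the LM.
        top = sorted(population, key=lambda x: x[1])[:top_m]
        for _ in range(prompts_per_round):
            parents = random.sample(top, parents_per_prompt)
            prompt = "\n\n".join(code for code, _ in parents)
            children.extend(lm_sample(prompt, samples_per_prompt))
        # FilterAndEval: discard children that fail to train; score the rest
        # (the paper's fitness also penalizes model size).
        scored = [(c, f) for c in children
                  if (f := train_and_eval(c)) is not None]
        population = sorted(population + scored, key=lambda x: x[1])[:top_m]
        # Adapt the LM toward the fittest children before the next round.
        lm_prompt_tune([c for c, _ in sorted(scored, key=lambda x: x[1])[:top_m]])
    return min(population, key=lambda x: x[1])
```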
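
The Dataset Splits row describes a simple random hold-out. A minimal sketch, assuming `train_x` and `train_y` are the full MNIST-1D training arrays (names are ours, not the paper's):

```python
import numpy as np

# Hold out 500 randomly chosen training examples as a validation set,
# since MNIST-1D ships without one.
rng = np.random.default_rng(0)
idx = rng.permutation(len(train_x))
val_x, val_y = train_x[idx[:500]], train_y[idx[:500]]
train_x, train_y = train_x[idx[500:]], train_y[idx[500:]]
```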
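
The Experiment Setup row pins down the child-model training configuration. A minimal JAX/Optax sketch of that configuration follows; only the optimizer, learning rate, batch size, and step count come from the quoted text, while `params`, `loss_fn`, and `data_iter` are hypothetical placeholders (the paper does not publish this training harness).

```python
import jax
import optax

# From the quoted setup: AdamW, learning rate 0.01, batch size 128,
# 8000 training steps on a single GPU.
NUM_STEPS = 8_000
BATCH_SIZE = 128  # the data iterator is assumed to yield batches of this size

optimizer = optax.adamw(learning_rate=0.01)

def fit(params, loss_fn, data_iter):
    """Train one child model; loss_fn(params, batch) -> scalar loss."""
    opt_state = optimizer.init(params)

    @jax.jit
    def step(params, opt_state, batch):
        grads = jax.grad(loss_fn)(params, batch)
        updates, opt_state = optimizer.update(grads, opt_state, params)
        return optax.apply_updates(params, updates), opt_state

    for _ in range(NUM_STEPS):
        params, opt_state = step(params, opt_state, next(data_iter))
    return params
```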