EvoPrompting: Language Models for Code-Level Neural Architecture Search
Authors: Angelica Chen, David Dohan, David So
NeurIPS 2023
| Reproducibility Variable | Result | LLM Response |
|---|---|---|
| Research Type | Experimental | We first demonstrate that EvoPrompting is effective on the computationally efficient MNIST-1D dataset, where EvoPrompting produces convolutional architecture variants that outperform both those designed by human experts and naive few-shot prompting in terms of accuracy and model size. We then apply our method to searching for graph neural networks on the CLRS Algorithmic Reasoning Benchmark, where EvoPrompting is able to design novel architectures that outperform current state-of-the-art models on 21 out of 30 algorithmic reasoning tasks while maintaining similar model size. |
| Researcher Affiliation | Collaboration | Angelica Chen, New York University (angelica.chen@nyu.edu); David M. Dohan, OpenAI (david@ddohan.com); David R. So, Jane Street (david.r.so.ai@gmail.com) |
| Pseudocode | Yes | Algorithm 1: Complete meta-learning evolutionary algorithm using pθ as a crossover and mutation operator. Algorithm 2: The crossover and mutation algorithm, CrossMut(pθt, P, m, k, n), where Uniform(P) denotes the uniform distribution over the set P. Algorithm 3: The algorithm for filtering and scoring child models, FilterAndEval(C, T, D, α). (A simplified sketch of this loop appears after the table.) |
| Open Source Code | Yes | We refer the reader to Appendix A.2 for the source code of these seed models. ... See Appx. A.4 for the full code of each discovered model. ... Below we list the Python source code of five of the newly discovered GNNs. ... Below we provide the source code for the nine seed models used in the CLRS model search. |
| Open Datasets | Yes | We evaluate our meta-learning algorithm on two datasets: MNIST-1D (Greydanus, 2020) and the CLRS algorithmic reasoning benchmark (Veličković et al., 2022). |
| Dataset Splits | Yes | Since there is no validation dataset, we randomly set aside 500 examples from the training dataset to use as the validation dataset. ... and evaluated models using validation accuracy. (A minimal split sketch appears after the table.) |
| Hardware Specification | Yes | on a single NVIDIA Tesla P100 GPU |
| Software Dependencies | No | The paper mentions software libraries such as Flax, Haiku, and JAX, but does not provide version numbers for these dependencies. |
| Experiment Setup | Yes | Throughout the model search we use the AdamW optimizer (Loshchilov & Hutter, 2019) to train each child model on a single NVIDIA Tesla P100 GPU for 8000 steps, with learning rate 0.01 and batch size 128. ... Between each round, the model is prompt-tuned (Lester et al., 2021) for 5 epochs with a soft prompt length of 16, batch size of 16, and learning rate of 0.1 ... Unless stated otherwise, we run 10 rounds of evolution with 10 prompts per round and 16 samples generated per prompt. (A training-configuration sketch appears after the table.) |
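
The three algorithms quoted in the Pseudocode row compose one search loop: the LM serves as the crossover/mutation operator (CrossMut), child programs are trained and filtered (FilterAndEval), and the LM is prompt-tuned on the fittest children between rounds. Below is a minimal, hedged Python sketch of that loop, not the authors' implementation: `lm_sample`, `lm_prompt_tune`, and `train_and_eval` are hypothetical stand-ins for the paper's LM sampling, soft-prompt-tuning, and training steps, and the defaults mirror the quoted setup (10 rounds, 10 prompts per round, 16 samples per prompt).

```python
import random

def evoprompting(lm_sample, lm_prompt_tune, train_and_eval, seed_models,
                 num_rounds=10, prompts_per_round=10, samples_per_prompt=16,
                 parents_per_prompt=2, top_m=10):
    """Sketch of Algorithms 1-3. All callables are caller-supplied stand-ins:
      lm_sample(prompt, n)     -> list of n candidate model source strings
      lm_prompt_tune(programs) -> soft-prompt-tunes the LM on strong children
      train_and_eval(code)     -> fitness (lower is better), or None on failure
    """
    # Score the seed models to form the initial population.
    population = [(code, train_and_eval(code)) for code in seed_models]
    for _ in range(num_rounds):
        children = []
        # CrossMut: sample parents uniformly from the top-m models, join
        # their source code into a prompt, and draw child programs from the LM.
        top = sorted(population, key=lambda x: x[1])[:top_m]
        for _ in range(prompts_per_round):
            parents = random.sample(top, parents_per_prompt)
            prompt = "\n\n".join(code for code, _ in parents)
            children.extend(lm_sample(prompt, samples_per_prompt))
        # FilterAndEval: discard children that fail to train; score the rest
        # (the paper's fitness also penalizes model size).
        scored = [(c, f) for c in children
                  if (f := train_and_eval(c)) is not None]
        population = sorted(population + scored, key=lambda x: x[1])[:top_m]
        # Adapt the LM toward the fittest children before the next round.
        lm_prompt_tune([c for c, _ in sorted(scored, key=lambda x: x[1])[:top_m]])
    return min(population, key=lambda x: x[1])
```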
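
The Dataset Splits row describes a simple random hold-out. A minimal sketch, assuming `train_x` and `train_y` are the full MNIST-1D training arrays (names are ours, not the paper's):

```python
import numpy as np

# Hold out 500 randomly chosen training examples as a validation set,
# since MNIST-1D ships without one.
rng = np.random.default_rng(0)
idx = rng.permutation(len(train_x))
val_x, val_y = train_x[idx[:500]], train_y[idx[:500]]
train_x, train_y = train_x[idx[500:]], train_y[idx[500:]]
```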
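
The Experiment Setup row pins down the child-model training configuration. A minimal JAX/Optax sketch of that configuration follows; only the optimizer, learning rate, batch size, and step count come from the quoted text, while `params`, `loss_fn`, and `data_iter` are hypothetical placeholders (the paper does not publish this training harness).

```python
import jax
import optax

# From the quoted setup: AdamW, learning rate 0.01, batch size 128,
# 8000 training steps on a single GPU.
NUM_STEPS = 8_000
BATCH_SIZE = 128  # the data iterator is assumed to yield batches of this size

optimizer = optax.adamw(learning_rate=0.01)

def fit(params, loss_fn, data_iter):
    """Train one child model; loss_fn(params, batch) -> scalar loss."""
    opt_state = optimizer.init(params)

    @jax.jit
    def step(params, opt_state, batch):
        grads = jax.grad(loss_fn)(params, batch)
        updates, opt_state = optimizer.update(grads, opt_state, params)
        return optax.apply_updates(params, updates), opt_state

    for _ in range(NUM_STEPS):
        params, opt_state = step(params, opt_state, next(data_iter))
    return params
```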