Learning Transformer Programs
Authors: Dan Friedman, Alexander Wettig, Danqi Chen
NeurIPS 2023
| Reproducibility Variable | Result | LLM Response |
|---|---|---|
| Research Type | Experimental | To validate our approach, we learn Transformer Programs for a variety of problems, including an in-context learning task, a suite of algorithmic problems (e.g. sorting, recognizing Dyck-languages), and NLP tasks including named entity recognition and text classification. The Transformer Programs can automatically find reasonable solutions, performing on par with standard Transformers of comparable size; and, more importantly, they are easy to interpret. |
| Researcher Affiliation | Academia | Dan Friedman, Alexander Wettig, Danqi Chen; Department of Computer Science & Princeton Language and Intelligence, Princeton University; {dfriedman,awettig,danqic}@cs.princeton.edu |
| Pseudocode | No | The paper includes Python code snippets for the learned Transformer Programs, but it does not provide pseudocode or algorithm blocks for the method of learning Transformer Programs itself. |
| Open Source Code | Yes | Our code is available at https://github.com/princeton-nlp/TransformerPrograms, along with a number of example Transformer Programs. |
| Open Datasets | Yes | We validate our approach by learning Transformer Programs for a variety of problems, including an in-context learning task; the set of algorithmic problems introduced by Weiss et al. [2021]; and NLP benchmarks for named entity recognition and text classification. CoNLL-2003 Named Entity Recognition task [Sang and De Meulder, 2003] using the distribution from Hugging Face Datasets [Lhoest et al., 2021]. |
| Dataset Splits | Yes | For each RASP task, we sample 20,000 inputs without replacement and partition them into train, validation, and test sets containing 16,000/2,000/2,000 instances respectively. We use the standard train/validation/test split and evaluate the results using a Python implementation of the standard CoNLL evaluation script [Nakayama, 2018]. A minimal sketch of this RASP split appears after the table. |
| Hardware Specification | Yes | Each model takes between five and fifteen minutes to train on an Nvidia RTX 2080 GPU, depending on the number of layers. |
| Software Dependencies | No | The paper mentions implementing models in PyTorch [Paszke et al., 2019], but does not provide a specific version number for PyTorch or other software dependencies. |
| Experiment Setup | Yes | We train each model for 250 epochs with a batch size of 512 and a learning rate of 0.05, annealing the Gumbel temperature geometrically from 3.0 to 0.01, decreasing the temperature at each training step. A sketch of this annealing schedule also appears after the table. |
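
As a rough illustration of the Dataset Splits row, the Python sketch below partitions 20,000 sampled RASP inputs into the reported 16,000/2,000/2,000 train/validation/test sets. The function name, shuffling seed, and list-based representation are illustrative assumptions, not the authors' implementation; sampling the unique inputs themselves is task-specific and omitted.

```python
import random

def split_rasp_inputs(inputs, seed=0):
    """Partition 20,000 unique RASP task inputs into train/val/test splits.

    A minimal sketch of the 16,000/2,000/2,000 split described in the paper;
    not the authors' code.
    """
    assert len(inputs) == 20_000, "the paper samples 20,000 inputs without replacement"
    rng = random.Random(seed)
    shuffled = list(inputs)
    rng.shuffle(shuffled)
    return shuffled[:16_000], shuffled[16_000:18_000], shuffled[18_000:]
```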
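
The geometric temperature schedule quoted in the Experiment Setup row can be written out explicitly. The sketch below is one plausible reading of that description rather than the authors' code: the step count is derived from the 16,000-example RASP training split and the reported batch size of 512, and the schedule follows tau_t = tau_start * (tau_end / tau_start)^(t / T).

```python
import math

# Hyperparameters reported in the paper (Experiment Setup row above).
EPOCHS = 250
BATCH_SIZE = 512
TAU_START, TAU_END = 3.0, 0.01

def gumbel_temperature(step, total_steps, tau_start=TAU_START, tau_end=TAU_END):
    """Geometrically anneal the Gumbel-softmax temperature from tau_start to tau_end.

    One possible realization of "annealing the Gumbel temperature geometrically
    ... decreasing the temperature at each training step"; the authors' exact
    decay formula may differ.
    """
    frac = step / max(total_steps - 1, 1)
    return tau_start * (tau_end / tau_start) ** frac

# Example: 250 epochs over the 16,000-example RASP training split.
steps_per_epoch = math.ceil(16_000 / BATCH_SIZE)          # 32 optimizer steps per epoch
total_steps = EPOCHS * steps_per_epoch                    # 8,000 steps in total
print(gumbel_temperature(0, total_steps))                 # 3.0
print(gumbel_temperature(total_steps - 1, total_steps))   # ~0.01
```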